<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nandeshwar.info</title>
	<atom:link href="http://nandeshwar.info/feed/" rel="self" type="application/rss+xml" />
	<link>https://nandeshwar.info/</link>
	<description></description>
	<lastBuildDate>Mon, 13 Dec 2021 07:27:52 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	
	<item>
		<title>An Introduction To Dynamic Array Formulas In Excel</title>
		<link>https://nandeshwar.info/excel-2/dynamic-array-formulas-excel/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Mon, 13 Dec 2021 07:27:49 +0000</pubDate>
				<category><![CDATA[Excel]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3902</guid>

					<description><![CDATA[<p>Although they can take a little time to master, once you understand them, they can save you vast amounts of time, especially as Office 365 has now made them more straightforward to use. In this article, you will learn about Excel&#8217;s dynamic array formulas and how to use them with the help of simple examples. [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/excel-2/dynamic-array-formulas-excel/">An Introduction To Dynamic Array Formulas In Excel</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>Although dynamic array formulas can take a little time to master, once you understand them, they can save you vast amounts of time, especially as Office 365 has now made them more straightforward to use.</p>



<p>In this article, you will learn about Excel&#8217;s dynamic array formulas and how to use them with the help of simple examples.</p>



<h2><strong>What Is An Array?</strong></h2>



<p>What precisely is an array? In Excel terms,&nbsp;<strong>an array</strong>&nbsp;is a collection of values, whether they are in rows, columns, or both.&nbsp;</p>



<p>An&nbsp;<strong>array formula</strong>&nbsp;allows us to do multiple calculations on one or more values that are stored in our array.&nbsp;</p>



<p>These formulas update automatically when the referenced cells change, and the range that holds the results resizes to fit. That is why they are called&nbsp;<strong>dynamic arrays</strong>.&nbsp;</p>



<p><strong>Dynamic arrays</strong>&nbsp;perform automatic calculations, and they return values to one cell or multiple cells based on our formula.&nbsp;</p>



<p>To do this, we need to input the formula, usually in just one cell.</p>



<p class="has-background" style="background-color:#ff993f">Note: This is a guest post by Ben Richardson. Ben is an ex-banker and venture capitalist. He now runs Acuity Training, one of the UK&#8217;s leading providers of <a href="https://www.acuitytraining.co.uk/microsoft-training-courses/excel/">Excel training courses</a>.</p>



<h2><strong>Short History Of Dynamic Arrays In Excel</strong></h2>



<p>In previous versions of Excel, array formulas were complex to set up.&nbsp;</p>



<p>You would create an array formula using the keystroke combination&nbsp;<strong>CTRL + SHIFT + ENTER</strong>, which is why these are often referred to as&nbsp;<strong>CSE formulas</strong>.</p>



<p>With the introduction of dynamic array formulas in September 2018, Microsoft gave Office 365 users a more accessible way to create them.&nbsp;</p>



<p>When we create such formulas, they are automatically populated or spilled into the corresponding adjacent cells. If Excel does not identify them for some reason, we can still use&nbsp;<strong>CTRL + SHIFT + ENTER</strong>.&nbsp;</p>



<p>When you write a formula, Excel can now recognize whether it can return multiple values.</p>



<h2><strong>Using Dynamic Arrays In An Example</strong></h2>



<p>Suppose that we have salaries for an average company for four years:</p>



<figure class="wp-block-image size-full"><img width="781" height="95" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture1.png" alt="" class="wp-image-3903" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture1.png 781w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture1-300x36.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture1-768x93.png 768w" sizes="(max-width: 781px) 100vw, 781px" /></figure>



<p>We can now tell Excel that the first two numbers (<strong>B2:C2</strong>)&nbsp;are part of an array. To do this, we will place the following formula in&nbsp;<strong>cell B5</strong>:</p>



<p><strong>=B2:C2</strong></p>



<p>Before you press&nbsp;<strong>ENTER</strong>, look at the formula bar.</p>



<p>If your formula is enclosed in curly brackets, it is a sign that Excel identifies it as an array formula.</p>



<figure class="wp-block-image size-full"><img loading="lazy" width="543" height="89" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture2.png" alt="" class="wp-image-3904" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture2.png 543w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture2-300x49.png 300w" sizes="(max-width: 543px) 100vw, 543px" /></figure>



<p>Once you hit&nbsp;<strong>ENTER</strong>, the array will be &#8220;spilled&#8221; horizontally.&nbsp;</p>



<p>This means that Excel effectively duplicates the formula for each cell in the array.</p>



<p>A quick way of knowing that this is now an array is that we have the cells formatted with <strong>blue borders</strong>:</p>



<figure class="wp-block-image size-full"><img loading="lazy" width="504" height="93" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture3.png" alt="" class="wp-image-3905" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture3.png 504w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture3-300x55.png 300w" sizes="(max-width: 504px) 100vw, 504px" /></figure>



<p>You can see that there is nothing scary about the arrays, right?&nbsp;</p>



<p>Now, suppose that we have the number of employees in the company for every year as well:</p>



<figure class="wp-block-image size-full"><img loading="lazy" width="785" height="116" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture4.png" alt="" class="wp-image-3906" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture4.png 785w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture4-300x44.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture4-768x113.png 768w" sizes="(max-width: 785px) 100vw, 785px" /></figure>



<p>Our goal now is to find out the average salary for every year. You could input the following formula in&nbsp;<strong>cell B4</strong>:</p>



<p><strong>=B2/B3</strong></p>



<p>Then drag this formula to the end of our table.&nbsp;</p>



<p>You would have the following results:</p>



<figure class="wp-block-image size-full"><img loading="lazy" width="816" height="164" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture5.png" alt="" class="wp-image-3907" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture5.png 816w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture5-300x60.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture5-768x154.png 768w" sizes="(max-width: 816px) 100vw, 816px" /></figure>



<p>To avoid so many steps in a process, we can use a&nbsp;<strong>Dynamic Array</strong>.&nbsp;</p>



<p>To do this, we will enter the formula in&nbsp;<strong>cell G2</strong>:</p>



<p><strong>=B2:E2/B3:E3</strong></p>



<p>Excel will automatically recognize the pattern and will do the calculations for us:</p>



<figure class="wp-block-image size-full"><img loading="lazy" width="979" height="193" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture6.png" alt="" class="wp-image-3908" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture6.png 979w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture6-300x59.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture6-768x151.png 768w" sizes="(max-width: 979px) 100vw, 979px" /></figure>



<p>You can see that cells&nbsp;<strong>H2:J2</strong>&nbsp;now have a blue outline, meaning that the formula spilled across multiple cells.</p>
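<p>For readers who also work in R, the element-wise division that Excel performs here can be sketched in a few lines. This is only an analogy, not how Excel works internally, and the figures below are hypothetical rather than the values in the screenshots.</p>

```r
# Hypothetical figures, for illustration only (not the values in the screenshots).
total_salaries <- c(120000, 150000, 180000, 210000)  # plays the role of B2:E2
employees      <- c(100, 120, 150, 200)              # plays the role of B3:E3

# One expression divides each element by its counterpart,
# much like =B2:E2/B3:E3 filling four cells from a single formula.
average_salary <- total_salaries / employees
print(average_salary)
```

<p>As with the spilled range in Excel, changing any input value and re-running the expression updates every result at once.</p>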



<p>Of course, if we change any value in our original table, the resulting numbers in the array will also change.&nbsp;</p>



<p>You may still need to correct the formatting of the results.</p>



<p>Excel automatically uses the&nbsp;<strong>General formatting&nbsp;</strong>style for numbers.</p>



<p>The example above worked on multiple cells.&nbsp;</p>



<p>We can also use dynamic arrays with&nbsp;<strong>single-cell</strong>&nbsp;array formulas.&nbsp;</p>



<p>For example, suppose you want to know the smallest difference between&nbsp;<strong>$1,000</strong>&nbsp;and the average salaries. In that case, we could use the following formula:</p>



<p><strong>=MIN(1000-B4:E4)</strong></p>



<p>We would get <strong>$37.57</strong> as the result:</p>



<figure class="wp-block-image size-full"><img loading="lazy" width="872" height="406" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture7.png" alt="" class="wp-image-3909" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture7.png 872w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture7-300x140.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Picture7-768x358.png 768w" sizes="(max-width: 872px) 100vw, 872px" /></figure>



<p>Before dynamic array formulas, the usual process would have been to subtract every number in the fourth row from&nbsp;<strong>1,000</strong>&nbsp;and then find the minimum of these values with the&nbsp;<strong>MIN function</strong>, a more cumbersome task.</p>
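<p>The single-cell pattern, where an array is computed in the middle of a formula and then collapsed to one value, can be sketched in R as well. The average salaries below are hypothetical and chosen only to keep the arithmetic easy to follow.</p>

```r
# Hypothetical average salaries, standing in for B4:E4.
average_salary <- c(900, 950, 975, 925)

# Step 1: the intermediate array, like 1000-B4:E4 inside the formula.
differences <- 1000 - average_salary

# Step 2: collapse the array to a single value, like MIN(...).
smallest_difference <- min(differences)
print(smallest_difference)
```

<p>The intermediate array never has to be written anywhere, which is the same convenience the single-cell array formula gives you in Excel.</p>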



<h2><strong>Dynamic Array Functions</strong></h2>






<p>With the development of dynamic arrays and their full integration into Excel with&nbsp;<strong>Office 365,</strong>&nbsp;Microsoft has recently added dynamic array functions to Excel.</p>



<p>These functions are:</p>



<p><strong>UNIQUE</strong>&nbsp;– we can use it to extract unique values from our range.</p>



<p><strong>FILTER&nbsp;</strong>– used to filter our data based on any criteria we specify.</p>



<p><strong>SORT</strong>&nbsp;&#8211; sorts our range by the desired column.</p>



<p><strong>SORTBY</strong>&nbsp;&#8211; sorts our range by another array or range.</p>



<p><strong>RANDARRAY&nbsp;</strong>&#8211; used to present an array of random numbers.</p>



<p><strong>SEQUENCE</strong>&nbsp;&#8211; used to present a list of sequential numbers.</p>
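<p>If you know R, each of these functions has a rough base R counterpart, which may help in picturing what they return. The sketch below is an analogy under that assumption, and the department and salary data are made up for illustration.</p>

```r
# Hypothetical data, for illustration only.
dept   <- c("Sales", "IT", "Sales", "HR", "IT")
salary <- c(52000, 61000, 48000, 45000, 67000)

distinct_depts  <- unique(dept)          # like UNIQUE: distinct values
it_salaries     <- salary[dept == "IT"]  # like FILTER: keep items that meet a condition
sorted_salaries <- sort(salary)          # like SORT: ascending order
depts_by_salary <- dept[order(salary)]   # like SORTBY: order one vector by another
random_numbers  <- runif(3)              # like RANDARRAY: random numbers between 0 and 1
one_to_five     <- seq_len(5)            # like SEQUENCE: 1, 2, 3, 4, 5

print(distinct_depts)
print(depts_by_salary)
```

<p>In both tools, each call returns a whole array of results from a single expression.</p>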



<h2><strong>Conclusion</strong></h2>



<p>Next time you are laboriously copying a formula across a table in Excel, pause and check whether you can get the same results with a dynamic array formula.</p>



<p>Dynamic arrays can take a little getting used to, but you&#8217;ll never go back once you do &#8212; they are big timesavers and worth the time to master.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Machine Learning vs. Deep Learning &#8211; What is the Difference?</title>
		<link>https://nandeshwar.info/data-science-2/machine-learning-vs-deep-learning-what-is-the-difference/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Sat, 18 Sep 2021 19:02:28 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3888</guid>

					<description><![CDATA[<p>Machine Learning vs. Deep Learning Artificial Intelligence (AI) has two mechanisms of learning: Machine Learning and Deep Learning. As these two phrases are often interchangeable, let&#8217;s discover the differences between them in this article. Note: This is a guest post by author Anita Basa, an enthusiastic Digital Marketer and content writer working at Tekslate.com. I write [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/machine-learning-vs-deep-learning-what-is-the-difference/">Machine Learning vs. Deep Learning &#8211; What is the Difference?</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2><strong>Machine Learning vs. Deep Learning</strong></h2>



<p><a href="https://nandeshwar.info/data-science-2/ways-artificial-intelligence-will-disrupt-nonprofit-fundraising/">Artificial Intelligence</a> (AI) has two mechanisms of learning: Machine Learning and <a href="https://nandeshwar.info/data-science-2/deep-learning-tensorflow-r-tutorial/">Deep Learning</a>. As these two terms are often used interchangeably, let&#8217;s discover the differences between them in this article.</p>



<p class="has-background" style="background-color:#ff993f">Note: This is a guest post by Anita Basa, an enthusiastic digital marketer and content writer working at <a target="_blank" href="https://tekslate.com/" rel="noreferrer noopener"><strong>Tekslate.com</strong></a>. She writes articles on trending IT-related topics such as Microsoft Dynamics CRM, Oracle, Salesforce, Cloud Technologies, Business Tools, and Software. You can find her on LinkedIn: <a target="_blank" href="https://www.linkedin.com/in/anita-basa-051b4a125/" rel="noreferrer noopener"><strong>Anita Basa</strong></a></p>



<h3><strong>Machine Learning</strong></h3>



<p><a href="https://nandeshwar.info/fundraising-analytics-managers/">Machine learning</a> uses data to train models that produce accurate results. Its focus is on creating software that can access information and <a href="https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/">learn from the data</a>. Many machine learning algorithms support the classification or prediction of outputs given some input data.</p>



<h3><strong>Deep Learning</strong></h3>



<p>Deep learning is a subset of machine learning in which artificial neural networks are the primary algorithms for training models. Researchers modeled artificial neural networks after the human brain, connecting the different data inputs (aka nodes) through synapses. While typical neural networks have only a few layers between the input and the output, deep learning models can have hundreds of layers or more.</p>



<h3><strong>Kinds of Deep Learning Algorithms</strong></h3>



<p>Let&#8217;s see some common deep learning algorithms.</p>



<h4><strong>Recurrent Neural Networks:</strong></h4>



<p>Recurrent Neural Networks use a critical component that is not available in simpler algorithms: memory. The network can remember past data and decisions and consider them when evaluating current data, which introduces the power of context.</p>



<p>Researchers use Recurrent Neural Networks for NLP (<a href="https://nandeshwar.info/data-science-2/natural-language-generation-with-r-python/">natural language processing</a>) work because the computer understands a text better if it can access memory of tone and context.</p>



<h4><strong>Convolutional Neural Networks:</strong></h4>



<p>Researchers use Convolutional Neural Networks to work with images. The term convolutional refers to the technique of sliding a weight-based filter across every region of an image, helping the computer recognize and react to elements in the image itself.</p>



<p>The technique is beneficial when you must scan a high-resolution image for a particular product or feature. The specialized field that studies image data is called computer vision, and it is a growing field.</p>
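<p>To make the idea of a weight-based filter concrete, here is a minimal base R sketch. The function name convolve2d, the tiny image, and the filter weights are all assumptions made for illustration. In a real Convolutional Neural Network the weights are learned during training rather than set by hand, and like most deep learning libraries this sketch slides the filter without flipping it.</p>

```r
# A 5x5 grayscale "image": dark (0) on the left, bright (1) on the right.
image <- matrix(c(0, 0, 1, 1, 1), nrow = 5, ncol = 5, byrow = TRUE)

# A hand-set 3x3 filter that responds to dark-to-bright vertical edges.
kernel <- matrix(c(-1, 0, 1), nrow = 3, ncol = 3, byrow = TRUE)

# Hypothetical helper: slide the filter over every 3x3 patch of the image
# and record the weighted sum at each position.
convolve2d <- function(img, k) {
  out_rows <- nrow(img) - nrow(k) + 1
  out_cols <- ncol(img) - ncol(k) + 1
  out <- matrix(0, nrow = out_rows, ncol = out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      patch <- img[i:(i + nrow(k) - 1), j:(j + ncol(k) - 1)]
      out[i, j] <- sum(patch * k)
    }
  }
  out
}

feature_map <- convolve2d(image, kernel)
print(feature_map)
```

<p>The resulting feature map is large where a patch contains the dark-to-bright edge and zero where the patch is uniformly bright, which is how a filter helps the network react to elements in the image.</p>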



<h2><strong>Difference between Deep Learning and Machine Learning</strong></h2>



<p>Now that we have covered the basics of machine learning and deep learning, here are some crucial points of comparison in <strong>machine learning vs. deep learning</strong>.</p>



<h3><strong>Feature Engineering</strong></h3>



<p><a href="https://www.tmwr.org/recipes.html">Feature Engineering</a> is a process of using domain knowledge to build features or inputs for training models. Data scientists use feature engineering to transform existing data to increase the accuracy of their learning models. It is a time-consuming process and requires knowledge of the data and techniques.</p>



<p>In&nbsp;<strong>Machine Learning</strong>, an expert must usually identify most of the applied features and then hand-code them according to the domain and data type. For instance, the features can be shape, texture, pixel value, orientation, and position. The performance of machine learning algorithms depends on how accurately the features are identified and extracted.</p>



<p>In <strong>Deep Learning</strong>, the algorithms try to learn high-level features from the data themselves. This lessens the work of creating a new feature extractor for every problem. </p>



<h3><strong>Data Dependency</strong></h3>



<p>A principal distinction between deep learning and machine learning is how their performance changes as the volume of data increases.</p>



<p>Deep learning does not perform well when the amount of data is small, because it requires a considerable quantity of data to learn the patterns accurately.&nbsp;</p>



<p>Machine learning methods, with their hand-crafted rules, handle this situation quickly and efficiently. </p>



<h3><strong>Interpretability</strong></h3>



<p>Interpretability is another point of comparison between machine learning and deep learning.&nbsp;</p>



<p>Suppose we use deep learning to score essays automatically. Its scoring performance is impressive and near human level. But there is a difficulty: it does not disclose the behind-the-scenes details or answer questions such as why it gave a particular score. Mathematically, you can discover which nodes of the deep neural network were activated. Still, we don&#8217;t know what those neurons were supposed to model or what the layers of neurons are doing collectively.&nbsp;</p>



<p>Machine learning algorithms such as decision trees give you crisp rules explaining why they chose a particular result, making the reasoning easy to interpret. For this reason, algorithms like decision trees and <a href="https://nandeshwar.info/data-mining-2/linear-regression-in-excel/">logistic regression</a> are used in heavily regulated businesses.</p>



<h3><strong>Hardware Dependency</strong></h3>



<p>Deep learning methods depend heavily on high-end machines, whereas traditional machine learning methods can run on low-end machines. This is because deep learning algorithms rely on GPUs, which are integral to how they work.</p>



<p>Deep learning methods inherently perform a vast number of matrix multiplication operations. These operations can be efficiently optimized using a GPU because GPUs are built for this purpose.</p>
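<p>To see where the matrix multiplications come from, here is a minimal base R sketch of one layer of a neural network processing a small batch of inputs. The sizes, random weights, and bias values are assumptions chosen for illustration, and this runs on the CPU. Real models repeat this step across many much larger layers, which is the workload a GPU speeds up.</p>

```r
set.seed(1)  # make the random numbers reproducible

inputs  <- matrix(runif(4 * 3), nrow = 4)  # 4 examples, 3 features each
weights <- matrix(runif(3 * 2), nrow = 3)  # 3 inputs feeding 2 hidden units
bias    <- c(0.1, -0.1)                    # one bias per hidden unit

# One matrix multiplication computes every example and every unit at once;
# sweep() then adds each unit's bias to its column.
pre_activation <- sweep(inputs %*% weights, 2, bias, "+")

# ReLU activation: negative values become zero.
activation <- pmax(pre_activation, 0)
print(dim(activation))
```

<p>The output has one row per example and one column per hidden unit, so a single multiplication replaces a loop over every example and every unit.</p>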



<h3><strong>Execution Time </strong></h3>



<p>Usually, deep learning algorithms take longer to train because they have many parameters to learn. Training a state-of-the-art deep learning algorithm such as ResNet entirely from scratch takes about two weeks. Machine learning algorithms, in contrast, take much less time to train, ranging from a few minutes to a few hours.&nbsp;</p>



<p>The testing or validation time, however, is the opposite. Deep learning algorithms take less time to run at test time. Compare that with k-nearest neighbors, a type of machine learning algorithm, whose test time increases as the data size increases. This does not apply to all machine learning algorithms, as some have short test times too.  </p>
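<p>A small base R sketch shows why k-nearest neighbors slows down at test time as the data grows: every prediction measures the distance to every stored training point. The function name knn_predict and the toy data are hypothetical, and practical implementations use smarter data structures to reduce this cost.</p>

```r
# Hypothetical helper: classify one new point by majority vote
# among its k nearest training points.
knn_predict <- function(train_x, train_y, new_point, k = 3) {
  # Distance from the new point to EVERY training row:
  # this is the step whose cost grows with the size of the data.
  distances <- sqrt(rowSums(sweep(train_x, 2, new_point, "-")^2))
  nearest <- order(distances)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Toy training data: two well-separated clusters.
train_x <- matrix(c(1, 1,  1, 2,  2, 1,
                    8, 8,  8, 9,  9, 8), ncol = 2, byrow = TRUE)
train_y <- c("a", "a", "a", "b", "b", "b")

print(knn_predict(train_x, train_y, c(2, 2)))
print(knn_predict(train_x, train_y, c(8.5, 8.5)))
```

<p>There is no training step at all here; the work is deferred to prediction time, which is the opposite of the deep learning pattern described above.</p>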



<h3><strong>Problem-Solving Approach</strong></h3>



<p>When solving a problem using machine learning algorithms, it is advisable to break the problem into parts, solve each part separately, and combine the results. In contrast, deep learning advocates solving the problem end to end.</p>



<p>For example, suppose you have a multiple-object detection task: you need to identify what the objects are and where they appear in the photo.</p>



<p>In the machine learning method, you would split the problem into object detection and object recognition. First, you use a bounding-box detection method like GrabCut to skim through the image and find all candidate objects. Then, on each identified object, you use an object recognition method like SVM with HOG to recognize the relevant objects.</p>



<p>In the deep learning method, you would run the process end to end. For example, with a YOLO network (a kind of deep learning method), you pass in a photo, and it gives you the location and the name of each object.&nbsp;</p>



<h2><strong>Conclusion</strong></h2>



<p>Deep learning is an advanced form of machine learning that relies on enormous quantities of often unstructured data. Thus, deep learning can handle a broader range of problems with greater ease and efficiency. Technological breakthroughs like Google&#8217;s DeepMind are the epitome of present-day AI, made possible by deep learning and neural networks.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Create PowerPoint Presentations with R and RMarkdown</title>
		<link>https://nandeshwar.info/data-science-2/create-powerpoint-presentations-r-and-rmarkdown/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Mon, 24 May 2021 00:37:22 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[powerpoint]]></category>
		<category><![CDATA[presentations]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rmarkdown]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?page_id=3751</guid>

					<description><![CDATA[<p>In this post, you will learn how to create PowerPoint presentations using R and RMarkdown. There are other presentation options in RMarkdown and RStudio such as ioslides, slidy, and beamer, but for this post, we will focus on PowerPoint presentations.&#160; We will see three ways of creating PowerPoint presentations: Using the default RMarkdown option Using [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/create-powerpoint-presentations-r-and-rmarkdown/">Create PowerPoint Presentations with R and RMarkdown</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In this post, you will learn how to create PowerPoint presentations using R and RMarkdown. There are other presentation options in RMarkdown and RStudio such as ioslides, slidy, and <a href="https://nandeshwar.info/data-science-2/automated-reports-and-dashboards-in-r/">beamer</a>, but for this post, we will focus on PowerPoint presentations.&nbsp;</p>



<figure class="wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
https://youtu.be/L0o4R0tuodI
</div><figcaption>Watch step-by-step instructions on YouTube</figcaption></figure>



<h4>We will see three ways of creating PowerPoint presentations:</h4>



<ol><li>Using the default RMarkdown option</li><li>Using a custom PowerPoint template</li><li>Using officedown/officer R libraries for further customization</li></ol>



<p>Before we jump into creating these presentations, I would like to share a quick overview of PowerPoint and presentation design best practices from two books: Presentation Zen and Strategic Storytelling.</p>



<h2>Presentation Zen</h2>



<p>Garr Reynolds, the author of <a href="https://amzn.to/3hOFYIc" target="_blank" rel="noreferrer noopener">Presentation Zen</a>, recommends the use of full-bleed background images with one sentence, a word, or statistic. He also recommends using the rule of thirds to place the point of focus in the intersection of lines dividing the slide into three equal sections horizontally and vertically. We will see whether we can make that happen. He also suggests using simple charts to make your point. Although these types of presentations are better for keynotes, we will try to apply some of these principles.</p>



<h2>Strategic Storytelling</h2>



<p>Dave McKinsey, the author of <a href="https://amzn.to/3hOFPo8" target="_blank" rel="noreferrer noopener">Strategic Storytelling</a>, provides examples from top management consulting companies such as McKinsey and Accenture. He explains why these <a href="https://nandeshwar.info/guide-to-improve-your-speaking-instantly/">business presentations</a> work. Some of the main points from this book are:</p>



<ol><li>Use the Minto principle</li><li>Use the slide title to make your point or ask a question</li><li>Use the body content to support the point made in the title</li></ol>



<p>Let’s talk about the Minto principle, also called the Minto Pyramid.</p>



<p><a target="_blank" href="https://www.mckinsey.com/alumni/news-and-insights/global-news/alumni-news/barbara-minto-mece-i-invented-it-so-i-get-to-say-how-to-pronounce-it" rel="noreferrer noopener">Barbara Minto</a>&nbsp;was McKinsey’s first female MBA professional. She invented the Minto pyramid to help consultants make effective business presentations. It provides a good structure for organizing your presentation content. You start with your main idea or recommendation. Then you provide the supporting arguments with data and facts. Each supporting argument is further supported by more arguments and data. If you look at any consulting company’s presentations, you will see they have many charts, call-outs, and notes with references.&nbsp;</p>



<p><strong>With these presentation design best practices, let&#8217;s dive into creating PowerPoint presentations with R and RMarkdown.</strong></p>



<p>For this post, let&#8217;s say that we are studying the streaming services market. Our client is a company trying to enter this market.</p>



<p>Here’s how we will <a href="https://nandeshwar.info/leadership/develop-great-content-presentation/">structure the presentation</a>:</p>



<ul><li>Slide 2: Our main recommendation</li><li>Slide 3: Survey of the market: share prices</li><li>Slide 4: Survey of the market: market share</li><li>Slide 5: Opportunities</li><li>Slide 6: Side-by-side comparison of Netflix and Disney+</li><li>Slide 7: Conclusion</li></ul>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" width="823" height="445" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-example.png" alt="PowerPoint presentation created in R example" class="wp-image-3759" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-example.png 823w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-example-300x162.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-example-768x415.png 768w" sizes="(max-width: 823px) 100vw, 823px" /></figure></div>



<p><strong>Let’s get started.</strong></p>



<p>First, let&#8217;s make sure you can build the example PowerPoint presentation in RStudio.</p>



<p>To start a new markdown document, click on File, New File, R Markdown, then Presentation, and finally PowerPoint.&nbsp;</p>



<p>The title of my presentation is &#8220;Streaming Services Market Study&#8221; and the author is &#8220;Best Brians Consulting&#8221; (it is Brians, not brains, after the movie Life of Brian).</p>



<p>Make sure that this document compiles and gives you a nice PowerPoint presentation.&nbsp;</p>



<p>Click on the &#8220;knit&#8221; button. Save your file. If everything works out, you should have a new PowerPoint presentation.&nbsp;</p>
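<p>If you prefer the console to the knit button, the same step can be sketched in R. The file name streaming-services.Rmd is a hypothetical choice for this example. The code writes a minimal document whose header requests PowerPoint output, then renders it only if the rmarkdown package and a recent enough Pandoc are available.</p>

```r
# Write a minimal R Markdown file whose YAML header requests PowerPoint output.
# The file name is a hypothetical choice for this sketch.
rmd_path <- file.path(tempdir(), "streaming-services.Rmd")
writeLines(c(
  "---",
  'title: "Streaming Services Market Study"',
  'author: "Best Brians Consulting"',
  "output: powerpoint_presentation",
  "---",
  "",
  "## Recommendations",
  "",
  "- This is a tough market with many established companies"
), rmd_path)

# Render from the console only when rmarkdown and Pandoc 2.0.5+ are available.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available("2.0.5")
if (can_render) {
  pptx_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(pptx_path)  # location of the generated .pptx file
}
```

<p>Rendering this way should produce the same kind of file as the knit button, which can be handy when you want to rebuild the deck from a script.</p>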



<p>Once that works, edit the RMarkdown document with the new content for this streaming services presentation.</p>



<p>First, let’s change the knitr options to cache the data and suppress warnings and messages.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-1" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, cache = TRUE, warning = FALSE, message = FALSE)
```</code></div><small class="shcb-language" id="shcb-language-1"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>The first slide has recommendations.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-2" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"><span class="hljs-comment">## Recommendations</span>
- This is a tough market with many established companies
- You will need a unique strategy to set yourself apart
- We recommend you enter <span class="hljs-keyword">in</span> a narrow market and not compete with the big companies</code></div><small class="shcb-language" id="shcb-language-2"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Our next slide has data with the share prices of the companies in this space.</p>



<p>Following the &#8220;Strategic Storytelling&#8221; book example, the title of our slide is: Companies such as Netflix have cemented their roles as streaming entertainment providers</p>



<p>This R code pulls the historical share prices of Netflix, Disney, AT&amp;T, and Comcast, and shows a <a href="https://nandeshwar.info/data-visualization/nyt-wapo-data-visualization-r/">ggplot multiple line plot</a>. It could be made prettier and more valuable with annotations, but for this exercise, it is good enough.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-3" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-comment">## Companies such as Netflix have cemented their roles as streaming entertainment providers</span>

```{r sharepricesplot, fig.asp=0.6, dpi=300}
library(tidyquant)
library(lubridate)
library(ggplot2)
library(ggthemes)
library(dplyr)

my_tickers &lt;- c("NFLX", "DIS", "T", "CMCSA")

stock_prices &lt;- tq_get(x = my_tickers,
                       from = Sys.Date() %m-% years(5),
                       to = Sys.Date(),
                       get = "stock.prices")

g &lt;- ggplot(stock_prices,
            aes(x = date, 
                y = close, 
                group = symbol, 
                color = symbol,
                label = symbol)) +
  geom_line()


label_data &lt;- tribble(
  ~date,       ~symbol, ~close,
  as.Date("2019-01-02"), 'NFLX', 220,
  as.Date("2019-01-02"), 'DIS', 140,
  as.Date("2019-01-02"), 'CMCSA', 48,
  as.Date("2021-01-02"), 'T', 40,
  
)

g + 
  theme_wsj() +
  geom_text(data = label_data, aes(x = date, y = close, label = symbol, color = symbol)) +
  ggtitle(label = "Daily Closing Share Prices") +
  theme(legend.position = "none", 
        title = element_text(size = rel(1.3), family = "Arial"),
        plot.title.position = "plot") 

```</code></div><small class="shcb-language" id="shcb-language-3"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" width="1024" height="614" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-share-prices-ggplot-multiple-line-1024x614.png" alt="A multiple line plot with share prices for a PowerPoint presentation created using R and RMarkdown" class="wp-image-3761" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-share-prices-ggplot-multiple-line-1024x614.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-share-prices-ggplot-multiple-line-300x180.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-share-prices-ggplot-multiple-line-768x461.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-share-prices-ggplot-multiple-line.png 1500w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>Our next slide is the market share data.</p>



<p>This <a href="https://nandeshwar.info/data-visualization/waffle-chart-vs-dot-plot-vs-pie-charts/">R code creates a treemap</a> of market share data using ggplot and treemapify. This chart could also be made better, but this will do for now.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-4" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-comment">## The market is packed with many streaming options</span>

```{r marketshare, fig.asp=0.6, dpi=300}
market_share &lt;- tribble(
  ~company, ~share,
  "Netflix", 0.2,
  "Amazon Prime", 0.16,
  "Hulu", 0.13,
  "Max", 0.12,
  "Disney", 0.11,
  "Apple", 0.05,
  "Peacock", 0.05,
  "ESPN", 0.04,
  "Starz", 0.03,
  "Showtime", 0.03,
  "Paramount", 0.03,
  "YouTube TV", 0.01,
  "Sling TV", 0.01,
  "BritBox", 0.01,
  "Other", 0.02,
)

library(treemapify)
library(scales)

ggplot(market_share, aes(area = share, fill = company, label = paste(company, percent(share), sep = "\n"))) +
  geom_treemap() +
  geom_treemap_text(color = "black", 
                    place = "center",
                    grow = F) +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Market Share of Streaming Services", 
       caption = "Source: The Wrap. https://www.thewrap.com/netflix-streaming-us-market-share-chart") + 
  theme(legend.position = "none", 
        plot.title = element_text(face = "bold"),
        plot.caption = element_text(hjust = 0),
        plot.title.position = "plot", 
        plot.caption.position = "plot")
```</code></div><small class="shcb-language" id="shcb-language-4"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="614" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-market-share-1-1024x614.png" alt="A tree map created using R and Rmarkdown for PowerPoint" class="wp-image-3764" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-market-share-1-1024x614.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-market-share-1-300x180.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-market-share-1-768x461.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-market-share-1.png 1500w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>The next slide is a chart from a website showing opportunities. I used include_graphics from knitr to add the image and added a caption with the fig.cap chunk option.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-5" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-comment">## ... but it is possible to pull some customers away</span>

```{r streamingservicescomparision, fig.cap="Although Netflix still is the market leader, it dropped from 29% to 20% from 2020 to 2021. Source: https://www.thewrap.com/netflix-streaming-us-market-share-chart"}
knitr::include_graphics(path = "https://www.thewrap.com/wp-content/uploads/2021/03/040221-U.S.-Streaming-Market-Share-2020-versus-2021.png")
```</code></div><small class="shcb-language" id="shcb-language-5"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Next up is a case study of Disney+ vs. Netflix. I am using a two-column layout using a special Pandoc syntax. Within each column, I have a column header and a bulleted list. I also added the source in the first column.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-6" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-comment">## A case in point: Disney+</span>

:::::: {.columns}
::: {.column}
Netflix

- 200M Subscribers
- Started <span class="hljs-number">13</span> years ago
- Projected to get 300M subscribers

Source: cnet
:::

::: {.column}
Disney+

- 95M Subscribers
- Started <span class="hljs-number">15</span> months ago
- Projected to reach 250M subscribers
:::

::::::</code></div><small class="shcb-language" id="shcb-language-6"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Lastly, we have the conclusion slide with one sentence.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-7" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-comment">## Conclusion</span>

This is a tough market to <span class="hljs-keyword">break</span> into, but you can carve a niche out. 

Don<span class="hljs-string">'t go head-to-head with the big companies.</span></code></div><small class="shcb-language" id="shcb-language-7"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>After hitting knit, you will see that the PowerPoint presentation has the line plot and the treemap we generated using ggplot.</p>



<p>It also has an external image from a URL.</p>



<p>It created the two-column layout as well.</p>



<p>But you will notice some problems too.</p>



<ul><li>Page numbers are missing.</li><li>The aspect ratio is not 16 by 9.</li><li>The slide titles are aligned in the center and the long title doesn&#8217;t fit neatly.</li><li>Both the charts are taking the same smallish space.</li><li>The conclusion slide looks bland.</li></ul>



<p>We can resolve some of these issues by providing our custom PowerPoint template.&nbsp;</p>



<ol><li>Create a new PowerPoint presentation.&nbsp;</li><li>Go to View and click on Slide Master.</li><li>Then edit the layouts.&nbsp;</li></ol>



<p>Here you will see all the default layout options.</p>



<p><a target="_blank" href="https://support.rstudio.com/hc/en-us/articles/360004672913-Rendering-PowerPoint-Presentations-with-RStudio#templates" rel="noreferrer noopener">RMarkdown and Pandoc&nbsp;</a>currently support only these four layouts:</p>



<ul><li>title</li><li>title and content</li><li>section header, and</li><li>two-column layout</li></ul>



<p>That means we can customize only these four layouts.&nbsp;</p>



<ol><li>Edit the master slide first.</li><li>Click on the “click to edit” text box.</li><li>Change the font type to your liking. I selected “Lato” for the title.</li><li>Next, I selected “Corbel” for the body or content area.</li><li>Delete all the existing footer boxes.</li><li>Add custom text boxes in the footer for the date and page numbers.</li><li>Insert the date and slide number from the Insert menu.</li><li>Change the font type and size.</li><li>I added a logo in the center of the slide. I created this simple logo using Logomakr.com.</li></ol>



<p>Let&#8217;s also change the background color of the slides to something light. I chose #edf6f9 as the background color.</p>



<ol><li>Go to Background Styles.&nbsp;</li><li>Format Background.&nbsp;</li><li>Click on Color and then more colors.</li></ol>



<p>Let&#8217;s also change the color of the slide title. I chose #006d77.</p>



<p>Save the PowerPoint template. Add a reference in the YAML in your RMarkdown to this PowerPoint template like this:</p>


<pre class="wp-block-code" aria-describedby="shcb-language-8" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">output:
  powerpoint_presentation:
    reference_doc: template.pptx</code></div><small class="shcb-language" id="shcb-language-8"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Hit the &#8220;knit&#8221; button to see the results.</p>



<p>You can change the font type of the plots to match your presentation font type by using the &#8220;showtext&#8221; library. But I leave that for you.</p>
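<p>As a starting point, here is a minimal sketch of that idea. It assumes the showtext package is installed, that you have an internet connection to download &#8220;Lato&#8221; from Google Fonts, and that &#8220;Lato&#8221; matches the font in your template. The family name &#8220;lato&#8221; and the sample plot are only illustrative.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">library(ggplot2)
library(showtext)

# Download the font from Google Fonts and register it under the family name "lato"
# (assumption: "Lato" is the font used in your PowerPoint template)
font_add_google(name = "Lato", family = "lato")

# Let showtext render the text for graphics devices opened from here on
showtext_auto()
# Match the dpi used for the plots so the text is scaled consistently
showtext_opts(dpi = 300)

font_demo_plot &lt;- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Plot title set in Lato") +
  theme_minimal(base_family = "lato")</code></div></pre>

<p>Any ggplot theme that accepts base_family, or an element_text with family = "lato", should pick up the registered font.</p>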



<p>If you have a simple presentation, this can work very well. You pull all your data in one place, create your charts, and add recommendations. You can edit the PowerPoint later, of course, but for <a href="https://nandeshwar.info/data-science-2/tableau-vs-r/">repeatability</a>, we want to keep as many things in RMarkdown as possible.</p>



<p>One disadvantage of this Pandoc-based approach is that we are limited to only four layouts. If you want a layout with one title and one subtitle, for example, it won&#8217;t work. Another disadvantage is that an image or a table can&#8217;t share a slide with other content. If you add content to a slide that already has a chart or an image, it will get pushed to a new slide.</p>



<p>You can get around this problem using patchwork or similar libraries to place graphs, tables, and text side by side. But that may not give you the best resolution. We also can&#8217;t add free-standing text boxes or call-outs.</p>
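<p>For illustration, here is a small sketch of the patchwork workaround. It assumes the patchwork package is installed and uses the built-in mtcars data instead of the streaming data, so the plots themselves are placeholders.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">library(ggplot2)
library(patchwork)

left_plot &lt;- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Weight vs. mileage")

right_plot &lt;- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  ggtitle("Cars by cylinder count")

# The | operator places the two plots next to each other in a single figure
combined_plot &lt;- left_plot | right_plot
combined_plot</code></div></pre>

<p>Because the result is one figure, RMarkdown treats it as a single piece of content and keeps it on one slide.</p>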



<p>There&#8217;s an option to add background images in HTML presentations, but it doesn&#8217;t work for PowerPoint.</p>



<p>Let&#8217;s explore the officer and officedown libraries for further customization. You can use officedown in the YAML as officedown::rpptx_document and provide a template, but I don&#8217;t know the exact advantages of doing so.</p>



<p>Instead, we will create an R file and build the PowerPoint presentation one slide at a time.</p>



<p>Let&#8217;s load the officer and magrittr libraries.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-9" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-keyword">library</span>(officer)
<span class="hljs-keyword">library</span>(magrittr)</code></div><small class="shcb-language" id="shcb-language-9"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Next, we will load our template file with the read_pptx function into a variable called my_ppt.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-10" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- read_pptx(path = <span class="hljs-string">"template.pptx"</span>)</code></div><small class="shcb-language" id="shcb-language-10"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>With the layout_summary function, we can check all the layouts available to us.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-11" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">officer::layout_summary(my_ppt)</code></div><small class="shcb-language" id="shcb-language-11"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
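<p>If you don&#8217;t have a template file yet, you can try the same call on the blank template that ships with officer. This is a sketch under that assumption; your own template will list different layouts.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">library(officer)

# read_pptx() without a path loads the blank template bundled with officer
default_ppt &lt;- read_pptx()

# One row per layout, with the layout name and the master it belongs to
layouts &lt;- layout_summary(default_ppt)
layouts</code></div></pre>

<p>The layout and master names in this table are the values you pass to add_slide later.</p>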


<p>Add a new layout to your PowerPoint template with one text box and increase the font size to 80. Rename this layout to &#8216;one big number&#8217;.</p>



<p>Save the template and read the file again.</p>



<p>Now when you run layout_summary you will see that this new layout is available to us.</p>



<p>The layout_properties function lets us see the available content areas we can change.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-12" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">layout_properties(my_ppt, layout =<span class="hljs-string">"Title Slide"</span>)</code></div><small class="shcb-language" id="shcb-language-12"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>This is an important function. Notice the placeholder type (`ph type`) values in its output. We need to use the value of `ph type` to tell officer where to put the content.</p>
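<p>To focus on the placeholder types, you can keep just a couple of columns of that output. The sketch below assumes officer&#8217;s bundled blank template and that the output has columns named type and ph_label, as it does in the versions I have used.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">library(officer)

default_ppt &lt;- read_pptx()

title_props &lt;- layout_properties(default_ppt,
                                 layout = "Title Slide",
                                 master = "Office Theme")

# Keep only the placeholder type and its label
title_props[, c("type", "ph_label")]</code></div></pre>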



<p>It will be easier with an example.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-13" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>my_ppt &lt;- add_slide(my_ppt, 
</span></span><span class='shcb-loc'><span>                    layout = <span class="hljs-string">"Title Slide"</span>, 
</span></span><span class='shcb-loc'><span>                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
</span></span><span class='shcb-loc'><span>  ph_with(value = <span class="hljs-string">"Streaming Services Market Study"</span>, 
</span></span><span class='shcb-loc'><span>          location = ph_location_type(type = <span class="hljs-string">"ctrTitle"</span>)) %&gt;%
</span></span><span class='shcb-loc'><span>  ph_with(value = <span class="hljs-string">"Best Brians Consulting"</span>, 
</span></span><span class='shcb-loc'><span>          location = ph_location_type(type = <span class="hljs-string">"subTitle"</span>))
</span></span></code></div><small class="shcb-language" id="shcb-language-13"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Let&#8217;s create our first slide with the presentation title slide.</p>



<p>To add a slide, we use the add_slide function. To this function, we specify the layout and the master (theme) to use.</p>



<p>To add content we use the ph_with function. The value argument of this function specifies the content.</p>



<p>And now the important part: where do we want the content to go? To specify the location, we use the ph_location_type function and pass the type we saw in ph type.</p>



<p>For the opening title slide, the box is called ctrTitle.</p>



<p>Let&#8217;s add our title.</p>



<p>Also, add the author to the placeholder with the ph type subTitle.</p>



<p>Let&#8217;s save this new presentation using the print command.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-14" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">print(my_ppt, target = <span class="hljs-string">"officer-presentation.pptx"</span>) </code></div><small class="shcb-language" id="shcb-language-14"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Open the PowerPoint and check whether you can see the title and subtitle.</p>
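<p>You can also check the slide from R without opening PowerPoint. The sketch below rebuilds the title slide on officer&#8217;s bundled blank template (an assumption, so it runs without template.pptx) and inspects it with slide_summary, which I expect to return a text column with the content of each placeholder.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">library(officer)

# Rebuild the title slide on the blank template bundled with officer
check_ppt &lt;- read_pptx()
check_ppt &lt;- add_slide(check_ppt,
                       layout = "Title Slide",
                       master = "Office Theme")
check_ppt &lt;- ph_with(check_ppt,
                     value = "Streaming Services Market Study",
                     location = ph_location_type(type = "ctrTitle"))
check_ppt &lt;- ph_with(check_ppt,
                     value = "Best Brians Consulting",
                     location = ph_location_type(type = "subTitle"))

# List the shapes on the current slide along with their text
slide_contents &lt;- slide_summary(check_ppt)
slide_contents[, c("type", "text")]</code></div></pre>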



<p>Let&#8217;s add the second slide with recommendations. But first, check the available shape types using the layout_properties function for the Title and Content layout. You will see that we have title and body types available.</p>



<p>Let&#8217;s add &#8216;recommendations&#8217; as the title and pass the bullet points for this slide as a vector to the body location.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-15" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- add_slide(my_ppt, 
                    layout = <span class="hljs-string">"Title and Content"</span>, 
                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
  ph_with(value = <span class="hljs-string">"Recommendations"</span>, 
          location = ph_location_type(type = <span class="hljs-string">"title"</span>)) %&gt;%
  ph_with(value = c(<span class="hljs-string">"This is a tough market with many established companies"</span>,
                    <span class="hljs-string">"You will need a unique strategy to set yourself apart"</span>,
                    <span class="hljs-string">"We recommend you enter in a narrow market and not compete with the big companies"</span>), 
          location = ph_location_type(type = <span class="hljs-string">"body"</span>))</code></div><small class="shcb-language" id="shcb-language-15"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Save the document again and see what it looks like.</p>



<p>Next, add a slide with the share prices chart, but we need to create this chart first. This code will save the chart in a ggplot object called share_prices_plot.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-16" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines"><span class="hljs-keyword">library</span>(tidyquant)
<span class="hljs-keyword">library</span>(lubridate)
<span class="hljs-keyword">library</span>(ggplot2)
<span class="hljs-keyword">library</span>(ggthemes)
<span class="hljs-keyword">library</span>(dplyr)

my_tickers &lt;- c(<span class="hljs-string">"NFLX"</span>, <span class="hljs-string">"DIS"</span>, <span class="hljs-string">"T"</span>, <span class="hljs-string">"CMCSA"</span>)

stock_prices &lt;- tq_get(x = my_tickers,
                       from = Sys.Date() %m-% years(<span class="hljs-number">5</span>),
                       to = Sys.Date(),
                       get = <span class="hljs-string">"stock.prices"</span>)

g &lt;- ggplot(stock_prices,
            aes(x = date, 
                y = close, 
                group = symbol, 
                color = symbol,
                label = symbol)) +
  geom_line()


label_data &lt;- tribble(
  ~date,       ~symbol, ~close,
  as.Date(<span class="hljs-string">"2019-01-02"</span>), <span class="hljs-string">'NFLX'</span>, <span class="hljs-number">220</span>,
  as.Date(<span class="hljs-string">"2019-01-02"</span>), <span class="hljs-string">'DIS'</span>, <span class="hljs-number">140</span>,
  as.Date(<span class="hljs-string">"2019-01-02"</span>), <span class="hljs-string">'CMCSA'</span>, <span class="hljs-number">48</span>,
  as.Date(<span class="hljs-string">"2021-01-02"</span>), <span class="hljs-string">'T'</span>, <span class="hljs-number">40</span>,
  
)

share_prices_plot &lt;- g + 
  theme_wsj() +
  geom_text(data = label_data, aes(x = date, y = close, label = symbol, color = symbol)) +
  ggtitle(label = <span class="hljs-string">"Daily Closing Share Prices"</span>) +
  theme(legend.position = <span class="hljs-string">"none"</span>, 
        title = element_text(size = rel(<span class="hljs-number">1.3</span>), family = <span class="hljs-string">"Arial"</span>),
        plot.title.position = <span class="hljs-string">"plot"</span>) </code></div><small class="shcb-language" id="shcb-language-16"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>We will again use the title and body locations to add the title and this plot. The only difference is that we can also set the resolution of the chart with the res argument.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-17" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- add_slide(my_ppt, 
                    layout = <span class="hljs-string">"Title and Content"</span>, 
                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
  ph_with(value = <span class="hljs-string">"Companies such as Netflix have cemented their roles as streaming entertainment providers"</span>, 
          location = ph_location_type(type = <span class="hljs-string">"title"</span>)) %&gt;%
  ph_with(value =  share_prices_plot, 
          location = ph_location_type(type = <span class="hljs-string">"body"</span>),
          res = <span class="hljs-number">300</span>)</code></div><small class="shcb-language" id="shcb-language-17"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Next, let&#8217;s add the treemap with the market share.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-18" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">market_share &lt;- tribble(
  ~company, ~share,
  <span class="hljs-string">"Netflix"</span>, <span class="hljs-number">0.2</span>,
  <span class="hljs-string">"Amazon Prime"</span>, <span class="hljs-number">0.16</span>,
  <span class="hljs-string">"Hulu"</span>, <span class="hljs-number">0.13</span>,
  <span class="hljs-string">"Max"</span>, <span class="hljs-number">0.12</span>,
  <span class="hljs-string">"Disney"</span>, <span class="hljs-number">0.11</span>,
  <span class="hljs-string">"Apple"</span>, <span class="hljs-number">0.05</span>,
  <span class="hljs-string">"Peacock"</span>, <span class="hljs-number">0.05</span>,
  <span class="hljs-string">"ESPN"</span>, <span class="hljs-number">0.04</span>,
  <span class="hljs-string">"Starz"</span>, <span class="hljs-number">0.03</span>,
  <span class="hljs-string">"Showtime"</span>, <span class="hljs-number">0.03</span>,
  <span class="hljs-string">"Paramount"</span>, <span class="hljs-number">0.03</span>,
  <span class="hljs-string">"YouTube TV"</span>, <span class="hljs-number">0.01</span>,
  <span class="hljs-string">"Sling TV"</span>, <span class="hljs-number">0.01</span>,
  <span class="hljs-string">"BritBox"</span>, <span class="hljs-number">0.01</span>,
  <span class="hljs-string">"Other"</span>, <span class="hljs-number">0.02</span>,
)

<span class="hljs-keyword">library</span>(treemapify)
<span class="hljs-keyword">library</span>(scales)

market_share_tree_map &lt;- ggplot(market_share, aes(area = share, fill = company, label = paste(company, percent(share), sep = <span class="hljs-string">"\n"</span>))) +
  geom_treemap() +
  geom_treemap_text(color = <span class="hljs-string">"black"</span>, 
                    place = <span class="hljs-string">"center"</span>,
                    grow = <span class="hljs-literal">F</span>) +
  scale_fill_brewer(palette = <span class="hljs-string">"Set3"</span>) +
  labs(title = <span class="hljs-string">"Market Share of Streaming Services"</span>, 
       caption = <span class="hljs-string">"Source: The Wrap. https://www.thewrap.com/netflix-streaming-us-market-share-chart"</span>) + 
  theme(legend.position = <span class="hljs-string">"none"</span>, 
        plot.title = element_text(face = <span class="hljs-string">"bold"</span>),
        plot.caption = element_text(hjust = <span class="hljs-number">0</span>),
        plot.title.position = <span class="hljs-string">"plot"</span>, 
        plot.caption.position = <span class="hljs-string">"plot"</span>)</code></div><small class="shcb-language" id="shcb-language-18"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>

<pre class="wp-block-code" aria-describedby="shcb-language-19" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- add_slide(my_ppt, 
                    layout = <span class="hljs-string">"Title and Content"</span>, 
                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
  ph_with(value = <span class="hljs-string">"The market is packed with many streaming options"</span>, 
          location = ph_location_type(type = <span class="hljs-string">"title"</span>)) %&gt;%
  ph_with(value =  market_share_tree_map, 
          location = ph_location_type(type = <span class="hljs-string">"body"</span>),
          res = <span class="hljs-number">300</span>)</code></div><small class="shcb-language" id="shcb-language-19"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>With the default RMarkdown/Pandoc option, we can place an image and text side-by-side, but let&#8217;s try that with the officer package and the &#8216;Picture with Caption&#8217; layout.</p>



<p>We can see we have body, title, and pic locations available for us with the &#8216;Picture with Caption&#8217; layout.</p>



<p>Let&#8217;s add the year-over-year market share comparison chart from The Wrap in the pic location and some text in the body location.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-20" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">yoy_market_comp_img &lt;- file.path(<span class="hljs-string">"wrap-yoy-market-share.png"</span>)

my_ppt &lt;- add_slide(my_ppt, 
                    layout = <span class="hljs-string">"Picture with Caption"</span>, 
                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
  ph_with(value = <span class="hljs-string">"... but it is possible to pull some customers away"</span>, 
          location = ph_location_type(type = <span class="hljs-string">"title"</span>)) %&gt;%
  ph_with(value =  external_img(yoy_market_comp_img), 
          location = ph_location_type(type = <span class="hljs-string">"pic"</span>)) %&gt;%
  ph_with(value =  <span class="hljs-string">"Although Netflix still is the market leader, it dropped from 29% to 20% from 2020 to 2021. Source: https://www.thewrap.com/netflix-streaming-us-market-share-chart"</span>, 
          location = ph_location_type(type = <span class="hljs-string">"body"</span>))</code></div><small class="shcb-language" id="shcb-language-20"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Let&#8217;s add this slide and save the file again.</p>



<p>I don&#8217;t like the location of the source text. I want to move it close to the footer.</p>



<p>To do so, I need to first create a style or properties I want to apply to this text.</p>



<p>We will use the fp_text function to make the text color light gray and set the font size to 8.</p>



<p>We will name this variable source_box_style.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-21" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">source_box_style &lt;- fp_text(bold = <span class="hljs-literal">FALSE</span>, color = <span class="hljs-string">"gray50"</span>, font.size = <span class="hljs-number">8</span>)</code></div><small class="shcb-language" id="shcb-language-21"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>To add a text box at any location, we need to provide the text box location using the left, top, and other coordinates to the ph_location function.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-22" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">source_box_location &lt;- ph_location(
  left = <span class="hljs-number">1</span>,
  top = <span class="hljs-number">6.6</span>,
  height = <span class="hljs-number">0.3</span>,
  width = <span class="hljs-number">11</span>,
  newlabel = <span class="hljs-string">"source_box"</span>
)</code></div><small class="shcb-language" id="shcb-language-22"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
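<p>These coordinates are in inches, so it helps to know the slide dimensions before choosing them. Here is a small sketch using slide_size. It assumes officer&#8217;s bundled blank template, so the numbers will differ from a widescreen template like the one used in this post.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">library(officer)

default_ppt &lt;- read_pptx()
default_ppt &lt;- add_slide(default_ppt,
                         layout = "Title and Content",
                         master = "Office Theme")

# Width and height of the slide, in inches
size &lt;- slide_size(default_ppt)
size$width
size$height</code></div></pre>

<p>A top value slightly smaller than the slide height, such as the 6.6 used above, places the box just above the bottom edge, near the footer.</p>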


<p>To get the text box to show up on the slide, we need to use the fpar function, which creates a paragraph object for PowerPoint. Inside fpar, we will wrap our source text in the ftext function, which applies the style we created earlier.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-23" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- add_slide(my_ppt, 
                    layout = <span class="hljs-string">"Picture with Caption"</span>, 
                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
  ph_with(value = <span class="hljs-string">"... but it is possible to pull some customers away"</span>, 
          location = ph_location_type(type = <span class="hljs-string">"title"</span>)) %&gt;%
  ph_with(value =  external_img(yoy_market_comp_img), 
          location = ph_location_type(type = <span class="hljs-string">"pic"</span>)) %&gt;%
  ph_with(value =  <span class="hljs-string">"Although Netflix still is the market leader, it dropped from 29% to 20% from 2020 to 2021."</span>, 
          location = ph_location_type(type = <span class="hljs-string">"body"</span>)) %&gt;% 
  ph_with(value =  fpar(ftext(text = <span class="hljs-string">"Source: https://www.thewrap.com/netflix-streaming-us-market-share-chart"</span>, 
                         prop = source_box_style)),
          location = source_box_location)</code></div><small class="shcb-language" id="shcb-language-23"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Here are the steps again to create a text box:</p>



<ol><li>Define the style or properties with fp_text</li><li>Decide on the coordinates and pass them to the ph_location function</li><li>Format the text with that style using the ftext function, and</li><li>finally, wrap the ftext call in the fpar function to create a paragraph object</li></ol>
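
<p>The four steps above can be folded into one small helper. This is only a sketch: add_text_box and its arguments are hypothetical names, not officer functions, and the demo uses officer&#8217;s built-in default template rather than the template from this post.</p>

<pre class="wp-block-code"><code class="language-r">library(officer)

# Hypothetical helper (not part of officer): adds styled text at the given
# coordinates, which ph_location reads in inches.
add_text_box &lt;- function(ppt, text, style, left, top, width, height,
                         label = "text_box") {
  # Step 2: where the box goes
  box_location &lt;- ph_location(left = left, top = top, width = width,
                              height = height, newlabel = label)
  # Steps 3 and 4: format the text with ftext, then wrap it in fpar
  box_paragraph &lt;- fpar(ftext(text = text, prop = style))
  ph_with(ppt, value = box_paragraph, location = box_location)
}

# Step 1: the style
note_style &lt;- fp_text(color = "gray50", font.size = 8)

# Demo on officer's default template (an assumption for illustration)
demo_ppt &lt;- read_pptx()
demo_ppt &lt;- add_slide(demo_ppt, layout = "Title and Content",
                      master = "Office Theme")
demo_ppt &lt;- add_text_box(demo_ppt, "Source: example text", note_style,
                         left = 1, top = 6.6, width = 8, height = 0.3)</code></pre>

<p>With a helper like this, the source box and the callout box differ only in the style and the numbers passed in.</p>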



<p>You can find the location coordinates by creating a box in PowerPoint and adjusting it. To make it easier, I saved the location object in a variable called source_box_location.&nbsp;</p>
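
<p>One thing to watch: ph_location expects inches, while PowerPoint may show sizes in centimetres, depending on your regional settings. If that is your case, a small conversion could look like the sketch below. The helper name cm_to_in and the measurements are made up for illustration.</p>

<pre class="wp-block-code"><code class="language-r"># Hypothetical helper: convert centimetres (as shown in PowerPoint) to inches
cm_to_in &lt;- function(cm) cm / 2.54

# Made-up measurements read off a text box in PowerPoint, in centimetres
box_cm &lt;- c(left = 2.54, top = 16.76, width = 27.94, height = 0.76)

# Rounded values in inches, ready to pass to ph_location
box_in &lt;- round(cm_to_in(box_cm), 2)
box_in</code></pre>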



<p>Save the PowerPoint presentation again.&nbsp;</p>



<p>Using this same principle, let&#8217;s create a callout box and add it to the share prices slide. I created callout_box_style and callout_box_location.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-24" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">callout_box_style &lt;- fp_text(bold = <span class="hljs-literal">TRUE</span>, color = <span class="hljs-string">"black"</span>, font.size = <span class="hljs-number">16</span>, font.family = <span class="hljs-string">"Corbel"</span>)

callout_box_location &lt;- ph_location(
  left = <span class="hljs-number">2.6</span>,
  top = <span class="hljs-number">3</span>,
  height = <span class="hljs-number">2</span>,
  width = <span class="hljs-number">2</span>,
  newlabel = <span class="hljs-string">"call_out_box"</span>
)</code></div><small class="shcb-language" id="shcb-language-24"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>To select the slide with share prices, we need to use the on_slide function and provide the slide number.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-25" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- on_slide(my_ppt, index = <span class="hljs-number">4</span>) %&gt;%
  ph_with(value =  fpar(ftext(text = <span class="hljs-string">"Netflix had some stumbles but it is going up steadily"</span>, 
                              prop = callout_box_style)),
          location = callout_box_location)</code></div><small class="shcb-language" id="shcb-language-25"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
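
<p>Slide numbers shift as you add or remove slides, so it may be worth confirming that the index exists before selecting it. Here is a minimal sketch on a throwaway deck built from officer&#8217;s default template; demo_ppt and target_index are illustrative names.</p>

<pre class="wp-block-code"><code class="language-r">library(officer)

# Throwaway three-slide deck (default template, for illustration only)
demo_ppt &lt;- read_pptx()
for (i in 1:3) {
  demo_ppt &lt;- add_slide(demo_ppt, layout = "Title and Content",
                        master = "Office Theme")
}

n_slides &lt;- length(demo_ppt)   # number of slides in the deck
target_index &lt;- 2

# Stop early if the requested slide does not exist
stopifnot(target_index &lt;= n_slides)
demo_ppt &lt;- on_slide(demo_ppt, index = target_index)</code></pre>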


<p>Make this change and then save the presentation again.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-26" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">print(my_ppt, target = <span class="hljs-string">"officer-presentation.pptx"</span>)</code></div><small class="shcb-language" id="shcb-language-26"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Of course, it would be easier to draw the text box later in PowerPoint, but this gives us an additional tool to place text boxes with dynamic content at our chosen locations.&nbsp;</p>



<p>For the final addition, I would like to add a background image that bleeds into the slide and add some text on it. We will use the one big number layout we created earlier.&nbsp;</p>



<p>Let&#8217;s check the layout properties. We see the property name as &#8220;one big number.&#8221;</p>
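
<p>As a rough sketch of that check, layout_summary lists the layouts in a deck and layout_properties shows the placeholders of one layout. The example below runs on officer&#8217;s default template, so it uses the &#8220;Title and Content&#8221; layout; with the template from this post you would pass &#8220;one big number&#8221; instead.</p>

<pre class="wp-block-code"><code class="language-r">library(officer)

# Default template, used here only so the example runs on its own
demo_ppt &lt;- read_pptx()

# All layouts and the masters they belong to
layout_summary(demo_ppt)

# Placeholders available in one layout
props &lt;- layout_properties(demo_ppt, layout = "Title and Content",
                           master = "Office Theme")
props[, c("type", "ph_label")]</code></pre>

<p>The type column holds the values that ph_location_type accepts, such as &#8220;title&#8221; and &#8220;body&#8221;.</p>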



<p>I downloaded a photo from Unsplash and named it tv-3.jpg. I am storing its path in the bg_image_path variable.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-27" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">bg_image_path &lt;- file.path(<span class="hljs-string">"tv-3.jpg"</span>)</code></div><small class="shcb-language" id="shcb-language-27"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Next, let&#8217;s add a slide using the add_slide function and &#8220;one big number&#8221; as the layout.&nbsp;</p>



<p>Then we will use the ph_with function to place the external image, passing 13.38 as both the width and the height. We will use the ph_location_fullsize function to return the proper dimensions from the slide layout.&nbsp;</p>



<p>Lastly, we will use another ph_with function to add the text &#8220;31% drop&#8221; at the text placeholder location.&nbsp;</p>


<pre class="wp-block-code" aria-describedby="shcb-language-28" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">my_ppt &lt;- add_slide(my_ppt, 
                    layout = <span class="hljs-string">"one big number"</span>, 
                    master = <span class="hljs-string">"Office Theme"</span>) %&gt;%
  ph_with(value =  external_img(bg_image_path, width = <span class="hljs-number">13.38</span>, height = <span class="hljs-number">13.38</span>), 
         location = ph_location_fullsize(),
          use_loc_size = <span class="hljs-literal">FALSE</span>) %&gt;% 
  ph_with(value = <span class="hljs-string">"31% drop"</span>,
          location = ph_location_type(type = <span class="hljs-string">"body"</span>))

print(my_ppt, target = <span class="hljs-string">"officer-presentation.pptx"</span>)</code></div><small class="shcb-language" id="shcb-language-28"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
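
<p>The 13.38 above is typed in by hand. As a possible variation, and not what the code above does, the slide dimensions can be read from the deck with slide_size, which returns the width and height in inches. The sketch below uses officer&#8217;s default template, so the numbers will differ from those of the template in this post.</p>

<pre class="wp-block-code"><code class="language-r">library(officer)

# Default template, used here only so the example runs on its own
demo_ppt &lt;- read_pptx()

# Slide dimensions in inches
dims &lt;- slide_size(demo_ppt)
dims$width
dims$height

# These values could then be passed to external_img, for example:
# external_img(bg_image_path, width = dims$width, height = dims$height)</code></pre>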


<div class="wp-block-image is-style-default"><figure class="aligncenter size-large"><img loading="lazy" width="1024" height="586" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-pandoc-template-officer-1024x586.gif" alt="powerpoint presentation r rmarkdown a quick demo" class="wp-image-3765" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-pandoc-template-officer-1024x586.gif 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-pandoc-template-officer-300x172.gif 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/powerpoint-presentation-r-rmarkdown-pandoc-template-officer-768x439.gif 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div>



<p>These steps should give you a good-looking <strong>PowerPoint presentation using RMarkdown</strong>. Here&#8217;s the summary of our process:</p>



<ul><li>We created a PowerPoint presentation using the default knitr options in RMarkdown. It worked well but didn&#8217;t offer customization.&nbsp;</li><li>Therefore, we created a PowerPoint template to make the presentation look better, and it did!</li><li>But for more control, we looked at the officer R package and created our layouts, and placed objects where we wanted them.</li></ul>



<p><strong>I hope this helps you when you are trying to create PowerPoint presentations using R and RMarkdown. Let me know what you think by leaving a comment.&nbsp;</strong></p>
<span class="tve-leads-two-step-trigger tl-2step-trigger-2626"></span><span class="tve-leads-two-step-trigger tl-2step-trigger-0"></span><p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/create-powerpoint-presentations-r-and-rmarkdown/">Create PowerPoint Presentations with R and RMarkdown</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Best Books on Data Visualization</title>
		<link>https://nandeshwar.info/books/best-data-visualization-books/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Tue, 01 Sep 2020 05:57:41 +0000</pubDate>
				<category><![CDATA[Books]]></category>
		<category><![CDATA[data visualization]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3342</guid>

					<description><![CDATA[<p>Best books on data visualization is a bold claim, but my best doesn't have to be someone else's best. 🙂 I have benefitted and learned from these books, and you would enjoy reading them too. Although it is an old book, its concepts are still relevant today. William Cleveland, a professor of Statistics and Computer [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/books/best-data-visualization-books/">Best Books on Data Visualization</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="thrv_wrapper thrv_text_element">
<p>Best books on <a href="https://ds4fr.nandeshwar.info/data-visualization-1.html" target="_blank" class="tve-froala fr-basic" style="outline: none;">data visualization</a> is a bold claim, but my best doesn't have to be someone else's best. <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /> I have benefitted and learned from these books, and you would enjoy reading them too.</p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/0963488406?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Visualizing Data" target="_blank" rel="nofollow"  data-aawp-product-id="0963488406" data-aawp-product-title="Visualizing Data" data-aawp-click-tracking="true">Visualizing Data</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1076;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="0963488406" data-aawp-product-title="Visualizing Data" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/0963488406?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Visualizing Data" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/41T1OwLJAML._SL160_.jpg" alt="Visualizing Data" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>Although it is an old book, its concepts are still relevant today. <a href="http://ml.stat.purdue.edu/stat695t/" target="_blank" class="tve-froala" style="outline: none;">William Cleveland</a>, a professor of Statistics and Computer Science at Purdue University, provides great examples of using <a href="https://nandeshwar.info/data-visualization/waffle-chart-vs-dot-plot-vs-pie-charts/" target="_blank" class="tve-froala" style="outline: none;">simple charts</a> and exercising caution with aspect ratios. It is written for an academic audience, but it is hardly intimidating to practitioners. The best part: you can find the <a href="https://stats.idre.ucla.edu/other/examples/vizdata/" target="_blank" class="tve-froala" style="outline: none;">R code used for creating the charts</a> from this book.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/0961392142?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Visual Display of Quantitative Information, 2nd Ed." target="_blank" rel="nofollow"  data-aawp-product-id="0961392142" data-aawp-product-title="The Visual Display of Quantitative Information 2nd Ed." data-aawp-click-tracking="true">The Visual Display of Quantitative Information, 2nd Ed.</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="0961392142" data-aawp-product-title="The Visual Display of Quantitative Information 2nd Ed." data-aawp-click-tracking="true" href="https://www.amazon.com/dp/0961392142?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Visual Display of Quantitative Information, 2nd Ed." rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/41lXBygkVwL._SL160_.jpg" alt="The Visual Display of Quantitative Information, 2nd Ed." /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>A <a href="https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/" target="_blank">data visualization</a> book list would be incomplete without Tufte's books. <a href="https://www.edwardtufte.com/bboard" target="_blank" class="tve-froala" style="outline: none;">Edward Tufte</a>, with his beautiful design of the book and charts, has led the charge against bad <a href="https://nandeshwar.info/data-visualization/improving-data-visualizations-giving-usa-report/" target="_blank" class="tve-froala" style="outline: none;">data visualization a.k.a. chart junk</a>.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/0970601972?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Show Me the Numbers: Designing Tables and Graphs to Enlighten" target="_blank" rel="nofollow"  data-aawp-product-id="0970601972" data-aawp-product-title="Show Me the Numbers  Designing Tables and Graphs to Enlighten" data-aawp-click-tracking="true">Show Me the Numbers: Designing Tables and Graphs to Enlighten</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="0970601972" data-aawp-product-title="Show Me the Numbers  Designing Tables and Graphs to Enlighten" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/0970601972?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Show Me the Numbers: Designing Tables and Graphs to Enlighten" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/41aMPkpTLvL._SL160_.jpg" alt="Show Me the Numbers: Designing Tables and Graphs to Enlighten" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element tve-froala fr-box">
<p><span data-preserver-spaces="true">When I first started reading about </span><a href="https://nandeshwar.info/data-science-2/automated-reports-and-dashboards-in-r/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">dashboards</span></a><span data-preserver-spaces="true"> and good chart design, I bumped into Stephen Few's book</span><a href="https://amzn.to/36MZ4qH" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">&nbsp;Information Dashboard Design</span></a><span data-preserver-spaces="true">. This book changed how I looked at charts and tables. Show Me the Numbers is a great place to start if you haven't read any of Few's works.&nbsp;</span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/1119002257?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Storytelling with Data: A Data Visualization Guide for Business Professionals" target="_blank" rel="nofollow"  data-aawp-product-id="1119002257" data-aawp-product-title="Storytelling with Data  A Data Visualization Guide for Business Professionals" data-aawp-click-tracking="true">Storytelling with Data: A Data Visualization Guide for Business Professionals</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="1119002257" data-aawp-product-title="Storytelling with Data  A Data Visualization Guide for Business Professionals" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/1119002257?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Storytelling with Data: A Data Visualization Guide for Business Professionals" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/41OonY0kRWL._SL160_.jpg" alt="Storytelling with Data: A Data Visualization Guide for Business Professionals" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p><a href="https://www.storytellingwithdata.com/" target="_blank">Cole Knaflic's</a> Storytelling with Data is a great addition to the collection of data visualization books. It is beginner-friendly and does not assume any programming knowledge. Most, if not all, of the charts in the book can be created in <a href="https://nandeshwar.info/excel-advanced-tips-tricks/" target="_blank" class="tve-froala" style="outline: none;">Excel</a>. The results are pleasing to the eye and clutter-free.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/1633690709?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations" target="_blank" rel="nofollow"  data-aawp-product-id="1633690709" data-aawp-product-title="Good Charts  The HBR Guide to Making Smarter More Persuasive Data Visualizations" data-aawp-click-tracking="true">Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="1633690709" data-aawp-product-title="Good Charts  The HBR Guide to Making Smarter More Persuasive Data Visualizations" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/1633690709?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/413wxDrgtcL._SL160_.jpg" alt="Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>The charts in this book have more of an infographic-y feel to them -- but that's not a negative. I have advocated for simplicity and clarity and criticized "interesting" things, but often you need to splash the charts with some design elements to gain attention. This book provides great examples of such charts.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/0393347281?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,..." target="_blank" rel="nofollow"  data-aawp-product-id="0393347281" data-aawp-product-title="The Wall Street Journal Guide to Information Graphics  The Dos and Don ts of Presenting Data Facts and Figures" data-aawp-click-tracking="true">The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,...</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="0393347281" data-aawp-product-title="The Wall Street Journal Guide to Information Graphics  The Dos and Don ts of Presenting Data Facts and Figures" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/0393347281?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,..." rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/41GGSRB6EVL._SL160_.jpg" alt="The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,..." /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p><a href="http://www.donawong.com/" target="_blank">Dona Wong</a>, a former graphics editor at the Wall Street Journal and Tufte's student, gives concise advice on colors, chart types, and clarity. This book is my first recommendation for anyone interested in data visualization.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/1119282713?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios" target="_blank" rel="nofollow"  data-aawp-product-id="1119282713" data-aawp-product-title="The Big Book of Dashboards  Visualizing Your Data Using Real-World Business Scenarios" data-aawp-click-tracking="true">The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="1119282713" data-aawp-product-title="The Big Book of Dashboards  Visualizing Your Data Using Real-World Business Scenarios" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/1119282713?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/518yHZZNm4L._SL160_.jpg" alt="The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>The Big Book of Dashboards provides a great introduction to key data visualization principles and then presents valuable examples of dashboards that you may see in real life.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/1492031089?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures" target="_blank" rel="nofollow"  data-aawp-product-id="1492031089" data-aawp-product-title="Fundamentals of Data Visualization  A Primer on Making Informative and Compelling Figures" data-aawp-click-tracking="true">Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="1492031089" data-aawp-product-title="Fundamentals of Data Visualization  A Primer on Making Informative and Compelling Figures" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/1492031089?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/517DybM0hSL._SL160_.jpg" alt="Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>In Fundamentals of Data Visualization, Wilke shows countless examples that differentiate a good chart from a bad chart. This book is special because Wilke created his own plotting style, available via an R package.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/1616897147?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Observe Collect Draw!: A Visual Journal" target="_blank" rel="nofollow"  data-aawp-product-id="1616897147" data-aawp-product-title="Observe Collect Draw!  A Visual Journal" data-aawp-click-tracking="true">Observe Collect Draw!: A Visual Journal</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="1616897147" data-aawp-product-title="Observe Collect Draw!  A Visual Journal" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/1616897147?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Observe Collect Draw!: A Visual Journal" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/51g3DzKEe2L._SL160_.jpg" alt="Observe Collect Draw!: A Visual Journal" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>Observe Collect Draw! is a special book. One of my design professors recommended it to me. While it is not strictly a data visualization book of the Tufte or Few variety, it provides great examples of how even simple data can be viewed with the help of a graphic. This book is great for ideation.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/1324001569?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="How Charts Lie: Getting Smarter about Visual Information" target="_blank" rel="nofollow"  data-aawp-product-id="1324001569" data-aawp-product-title="How Charts Lie  Getting Smarter about Visual Information" data-aawp-click-tracking="true">How Charts Lie: Getting Smarter about Visual Information</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="1324001569" data-aawp-product-title="How Charts Lie  Getting Smarter about Visual Information" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/1324001569?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="How Charts Lie: Getting Smarter about Visual Information" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/31tkHADVxpL._SL160_.jpg" alt="How Charts Lie: Getting Smarter about Visual Information" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>Alberto Cairo offers many examples of how chart creators knowingly or unknowingly "lie." He shows how they do this: by using faulty axis starting points, misleading aspect ratios, and outright wrong data. This book will help you see how charts can manipulate emotions, and you will learn to question or avoid such charts.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box">
<div class="tve-content-box-background" data-css="tve-u-16f15c45369"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<h3><a class="aawp-link" href="https://www.amazon.com/dp/0692057846?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Data Science for Fundraising: Build Data-Driven Solutions Using R" target="_blank" rel="nofollow"  data-aawp-product-id="0692057846" data-aawp-product-title="Data Science for Fundraising  Build Data-Driven Solutions Using R" data-aawp-click-tracking="true">Data Science for Fundraising: Build Data-Driven Solutions Using R</a></h3>
</div>
</div>
<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:1078;" data-css="tve-u-16f15bc11a9">
<div class="tcb-flex-row tcb-resized tcb--cols--2" data-css="tve-u-16f15bb53a4">
<div class="tcb-flex-col c-33" data-css="tve-u-16f15b15bbf" style="">
<div class="tcb-col">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16f15b692bd">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered"><a  data-aawp-product-id="0692057846" data-aawp-product-title="Data Science for Fundraising  Build Data-Driven Solutions Using R" data-aawp-click-tracking="true" href="https://www.amazon.com/dp/0692057846?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Data Science for Fundraising: Build Data-Driven Solutions Using R" rel="nofollow" target="_blank"><img src="https://m.media-amazon.com/images/I/41uX3EUlr7L._SL160_.jpg" alt="Data Science for Fundraising: Build Data-Driven Solutions Using R" /></a>
</div>
</div>
</div>
</div>
<div class="tcb-flex-col c-66" data-css="tve-u-16f15b15bc2" style="">
<div class="tcb-col">
<div class="thrv_wrapper thrv_text_element">
<p>Yes, I co-wrote this book. Yes, I am biased. The <a href="https://ds4fr.nandeshwar.info/data-visualization-1.html" target="_blank" class="tve-froala" style="outline: none;">data visualization chapter</a> in this book is so detailed and long that it could have become a separate book. I wrote about principles of effective data visualization, summarizing my <a href="https://nandeshwar.info/books/best-books-learn-r-programming/" target="_blank" class="tve-froala" style="outline: none;">learnings from the books</a> before me. I advised which charts to use. And I provided detailed R code to create your own effective graphics.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><strong>Here's the list of books on data visualization again for your quick review.</strong></p>
</div>
<div class="thrv_wrapper tve_wp_shortcode">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<div class="aawp">

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="0963488406" data-aawp-product-title="Visualizing Data" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/0963488406?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Visualizing Data" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/41T1OwLJAML._SL160_.jpg" alt="Visualizing Data"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/0963488406?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Visualizing Data" rel="nofollow" target="_blank">
            Visualizing Data        </a>
        <div class="aawp-product__description">
            <ul><li>Hardcover Book</li><li>Cleveland, William S. (Author)</li><li>English (Publication Language)</li><li>360 Pages - 06/21/1993 (Publication Date) - Hobart Pr (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;69.00</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0963488406?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="0961392142" data-aawp-product-title="The Visual Display of Quantitative Information 2nd Ed." data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/0961392142?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Visual Display of Quantitative Information, 2nd Ed." rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/41lXBygkVwL._SL160_.jpg" alt="The Visual Display of Quantitative Information, 2nd Ed."  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/0961392142?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Visual Display of Quantitative Information, 2nd Ed." rel="nofollow" target="_blank">
            The Visual Display of Quantitative Information, 2nd Ed.        </a>
        <div class="aawp-product__description">
            <ul><li>Hardcover Book</li><li>Edward R. Tufte (Author)</li><li>English (Publication Language)</li><li>197 Pages - 01/14/1997 (Publication Date) - Graphics Pr (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;22.54</span>
            
                    </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0961392142?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="0970601972" data-aawp-product-title="Show Me the Numbers  Designing Tables and Graphs to Enlighten" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/0970601972?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Show Me the Numbers: Designing Tables and Graphs to Enlighten" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/41aMPkpTLvL._SL160_.jpg" alt="Show Me the Numbers: Designing Tables and Graphs to Enlighten"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/0970601972?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Show Me the Numbers: Designing Tables and Graphs to Enlighten" rel="nofollow" target="_blank">
            Show Me the Numbers: Designing Tables and Graphs to Enlighten        </a>
        <div class="aawp-product__description">
            <ul><li>Hardcover Book</li><li>Few, Stephen (Author)</li><li>English (Publication Language)</li><li>371 Pages - 06/01/2012 (Publication Date) - Analytics Press (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;20.24</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0970601972?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="1119002257" data-aawp-product-title="Storytelling with Data  A Data Visualization Guide for Business Professionals" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/1119002257?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Storytelling with Data: A Data Visualization Guide for Business Professionals" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/41OonY0kRWL._SL160_.jpg" alt="Storytelling with Data: A Data Visualization Guide for Business Professionals"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/1119002257?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Storytelling with Data: A Data Visualization Guide for Business Professionals" rel="nofollow" target="_blank">
            Storytelling with Data: A Data Visualization Guide for Business Professionals        </a>
        <div class="aawp-product__description">
            <ul><li>Nussbaumer Knaflic, Cole (Author)</li><li>English (Publication Language)</li><li>Wiley (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;26.99</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1119002257?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="1633690709" data-aawp-product-title="Good Charts  The HBR Guide to Making Smarter More Persuasive Data Visualizations" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/1633690709?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/413wxDrgtcL._SL160_.jpg" alt="Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/1633690709?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations" rel="nofollow" target="_blank">
            Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations        </a>
        <div class="aawp-product__description">
            <ul><li>Berinato, Scott (Author)</li><li>English (Publication Language)</li><li>264 Pages - 05/17/2016 (Publication Date) - Harvard Business Review Press (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;30.21</span>
            
                    </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1633690709?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="0393347281" data-aawp-product-title="The Wall Street Journal Guide to Information Graphics  The Dos and Don ts of Presenting Data Facts and Figures" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/0393347281?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,..." rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/41GGSRB6EVL._SL160_.jpg" alt="The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,..."  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/0393347281?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,..." rel="nofollow" target="_blank">
            The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data,...        </a>
        <div class="aawp-product__description">
            <ul><li>Wong, Dona M. (Author)</li><li>English (Publication Language)</li><li>160 Pages - 12/16/2013 (Publication Date) - W. W. Norton & Company (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;19.44</span>
            
                    </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0393347281?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="1119282713" data-aawp-product-title="The Big Book of Dashboards  Visualizing Your Data Using Real-World Business Scenarios" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/1119282713?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/518yHZZNm4L._SL160_.jpg" alt="The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/1119282713?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios" rel="nofollow" target="_blank">
            The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios        </a>
        <div class="aawp-product__description">
            <ul><li>Wexler, Steve (Author)</li><li>English (Publication Language)</li><li>448 Pages - 04/24/2017 (Publication Date) - Wiley (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;22.91</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1119282713?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="1492031089" data-aawp-product-title="Fundamentals of Data Visualization  A Primer on Making Informative and Compelling Figures" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/1492031089?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/517DybM0hSL._SL160_.jpg" alt="Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/1492031089?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures" rel="nofollow" target="_blank">
            Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures        </a>
        <div class="aawp-product__description">
            <ul><li>Wilke, Claus O. (Author)</li><li>English (Publication Language)</li><li>387 Pages - 05/14/2019 (Publication Date) - O'Reilly Media (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;49.29</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1492031089?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="1616897147" data-aawp-product-title="Observe Collect Draw!  A Visual Journal" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/1616897147?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Observe Collect Draw!: A Visual Journal" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/51g3DzKEe2L._SL160_.jpg" alt="Observe Collect Draw!: A Visual Journal"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/1616897147?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Observe Collect Draw!: A Visual Journal" rel="nofollow" target="_blank">
            Observe Collect Draw!: A Visual Journal        </a>
        <div class="aawp-product__description">
            <ul><li>Lupi, Giorgia (Author)</li><li>English (Publication Language)</li><li>160 Pages - 09/25/2018 (Publication Date) - Princeton Architectural Press (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;18.93</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1616897147?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal aawp-product--ribbon aawp-product--sale"  data-aawp-product-id="1324001569" data-aawp-product-title="How Charts Lie  Getting Smarter about Visual Information" data-aawp-click-tracking="true">

    <span class="aawp-product__ribbon aawp-product__ribbon--sale">Sale</span>
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/1324001569?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="How Charts Lie: Getting Smarter about Visual Information" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/31tkHADVxpL._SL160_.jpg" alt="How Charts Lie: Getting Smarter about Visual Information"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/1324001569?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="How Charts Lie: Getting Smarter about Visual Information" rel="nofollow" target="_blank">
            How Charts Lie: Getting Smarter about Visual Information        </a>
        <div class="aawp-product__description">
            <ul><li>Hardcover Book</li><li>Cairo, Alberto (Author)</li><li>English (Publication Language)</li><li>256 Pages - 10/15/2019 (Publication Date) - W. W. Norton & Company (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
                                                        
                            <span class="aawp-product__price aawp-product__price--current">&#36;19.95</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1324001569?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

            
            
<div class="aawp-product aawp-product--horizontal"  data-aawp-product-id="0692057846" data-aawp-product-title="Data Science for Fundraising  Build Data-Driven Solutions Using R" data-aawp-click-tracking="true">

    
    <div class="aawp-product__thumb">
        <a class="aawp-product__image-link"
           href="https://www.amazon.com/dp/0692057846?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Data Science for Fundraising: Build Data-Driven Solutions Using R" rel="nofollow" target="_blank">
            <img class="aawp-product__image" src="https://m.media-amazon.com/images/I/41uX3EUlr7L._SL160_.jpg" alt="Data Science for Fundraising: Build Data-Driven Solutions Using R"  />
        </a>

            </div>

    <div class="aawp-product__content">
        <a class="aawp-product__title" href="https://www.amazon.com/dp/0692057846?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Data Science for Fundraising: Build Data-Driven Solutions Using R" rel="nofollow" target="_blank">
            Data Science for Fundraising: Build Data-Driven Solutions Using R        </a>
        <div class="aawp-product__description">
            <ul><li>Nandeshwar, Ashutosh R (Author)</li><li>English (Publication Language)</li><li>568 Pages - 02/14/2018 (Publication Date) - Data Insight Partners LLC (Publisher)</li></ul>        </div>
    </div>

    <div class="aawp-product__footer">

        <div class="aawp-product__pricing">
            
                            <span class="aawp-product__price aawp-product__price--current">&#36;53.99</span>
            
            <a class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a>        </div>

                <a class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0692057846?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a>
            </div>

</div>

    
</div>

</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p style="" data-css="tve-u-17448124bdb">Disclaimer: Amazon affiliate links on this page.</p>
</div>
<div class="tcb_flag" style="display: none"></div>
<span class="tve-leads-two-step-trigger tl-2step-trigger-2626"></span><span class="tve-leads-two-step-trigger tl-2step-trigger-0"></span><p>The post <a rel="nofollow" href="https://nandeshwar.info/books/best-data-visualization-books/">Best Books on Data Visualization</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Pie Chart vs. Bar Chart</title>
		<link>https://nandeshwar.info/data-visualization/pie-chart-vs-bar-chart/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Sun, 30 Aug 2020 00:47:33 +0000</pubDate>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[bar charts]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[pie charts]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3632</guid>

					<description><![CDATA[<p>Introduction The argument of pie charts vs.&#160;bar charts is almost 100 years old, going back to Walter Eells’ paper titled “The Relative Merits of Circles and Bars for Representing Component Parts.” While pie charts are more common in business presentations, bar charts are finding increased use. There are advantages and disadvantages to both. In this [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-visualization/pie-chart-vs-bar-chart/">Pie Chart vs. Bar Chart</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2>Introduction</h2>



<p>The debate over pie charts vs.&nbsp;bar charts is almost 100 years old, going back to <a href="https://eagereyes.org/blog/2015/ye-olde-pie-chart-debate" target="_blank" rel="noreferrer noopener">Walter Eells’</a> paper titled “The Relative Merits of Circles and Bars for Representing Component Parts.” While <a href="https://ds4fr.nandeshwar.info/data-visualization-1.html#pie-charts">pie charts</a> are more common in business <a href="https://nandeshwar.info/guide-to-improve-your-speaking-instantly/">presentations</a>, bar charts are finding increased use. Both have advantages and disadvantages. In this post, you will learn about them and see some alternatives.</p>



<h2>Pie Charts</h2>



<h3>When to use a pie chart?</h3>



<p>Analysts create pie charts to show the distribution of proportions, such as the percent of votes different candidates get in an election, or the proportion of revenue streams for a company. This is the simplest way to show “parts of the whole.”</p>



<div class="figure" style="text-align: center"><span id="fig:simple-pie-chart"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-pie-chart.png" alt="simple pie chart" class="wp-image-3658" width="400" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-pie-chart.png 861w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-pie-chart-300x245.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-pie-chart-768x627.png 768w" sizes="(max-width: 861px) 100vw, 861px" />
<p class="caption">
Figure 1: Simple pie chart
</p>
</div>



<p>Pie charts become <a href="https://nandeshwar.info/data-visualization/improving-data-visualizations-giving-usa-report/">confusing or unclear</a> when there are many proportions, and hence many slices. They are especially challenging when the slices are small. To solve this problem, creators use colors and legends to differentiate slices, but end up worsening the problem.</p>



<p>For example, compare charts I, II, and III below.</p>



<div class="figure" style="text-align: center"><span id="fig:pie-chart-lots-of-slices-colors-legends-combined"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/pie-chart-lots-of-slices-colors-legends-combined.png" alt="Three different pie charts with colors and legends">
<p class="caption">
Figure 2: Three different pie charts with colors and legends
</p>
</div>



<p>In chart I, it is hard to differentiate between G and F. Chart II solves the differentiation problem, but we still don’t know the value of G or F. Chart III tries to help with a legend, which may be needed if the labels are long, but it causes another problem: readers have to look up the key every time.</p>



<p>A potential solution is adding direct labels with data to each slice. While labels help clarify and offer precision, they defeat the purpose of creating a chart.</p>



<div class="figure" style="text-align: center"><span id="fig:pie-chart-with-lots-of-slices-colors-data-labels"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/pie-chart-with-lots-of-slices-colors-data-labels.png" alt="Pie charts with data labels" width="90%">
<p class="caption">
Figure 3: Pie charts with data labels
</p>
</div>



<p>We can create <a href="https://nandeshwar.info/data-visualization/wall-street-journal-data-visualization-r/">better pie charts</a> by collapsing some slices to reduce their number, or by highlighting the important ones.</p>



<p>Let’s see these one at a time.</p>



<h3>Reduced number of slices</h3>



<p>We will put all the small ones in the “other” category.</p>



<div class="figure" style="text-align: center"><span id="fig:pie-chart-slices-combined-slices"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/pie-chart-slices-combined-slices.png" alt="Pie chart with collapsed categories" width="80%">
<p class="caption">
Figure 4: Pie chart with collapsed categories
</p>
</div>
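<p>Collapsing small slices can be sketched in a few lines of base R. This is a minimal illustration, not the code behind Figure 4: the <code>shares</code> values and the 10% <code>threshold</code> are assumptions made up for the example.</p>

```r
# Hypothetical shares for seven categories (illustrative data, not from the post)
shares <- c(A = 0.30, B = 0.25, C = 0.20, D = 0.10, E = 0.07, F = 0.05, G = 0.03)

# Assumed cutoff: any slice strictly below 10% is folded into "Other"
threshold <- 0.10
is_small <- shares < threshold

# Keep the large slices as they are and add one combined "Other" slice
collapsed <- c(shares[!is_small], Other = sum(shares[is_small]))

pie(collapsed, col = "grey80", border = "white")
```

<p>The right cutoff depends on the story you are telling; if one of the small categories is the point of the chart, keep it out of “Other.”</p>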



<h3>Highlight an important slice</h3>



<p>We will keep the color of all slices the same, except for the one we want to highlight. Highlighting can be used to make a point along with an annotation.</p>



<div class="figure" style="text-align: center"><span id="fig:pie-chart-slices-highlight-colors-label"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/pie-chart-slices-highlight-colors-label.png" alt="Pie chart with a slice highlighted" width="80%">
<p class="caption">
Figure 5: Pie chart with a slice highlighted
</p>
</div>
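<p>One way to sketch the highlighting idea in base R is to build the color vector from the slice names. The data, the highlighted category, and the accent color below are all hypothetical choices for illustration.</p>

```r
# Hypothetical shares (illustrative data, not from the post)
shares <- c(A = 0.30, B = 0.25, C = 0.20, D = 0.15, E = 0.10)

# Assumed category to emphasize
highlight <- "C"

# Every slice is grey except the highlighted one, which gets an accent color
slice_cols <- ifelse(names(shares) == highlight, "#fd8d3c", "grey80")

pie(shares, col = slice_cols, border = "white")
```

<p>Because only one slice carries color, no legend is needed; an annotation next to the highlighted slice can carry the message.</p>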



<h3>Advantages of a pie chart</h3>



<ul><li>You can see the portion of a category compared to the other portions</li><li>It offers an easy way to compare large and small slices</li><li>It is intuitive</li></ul>



<h3>Disadvantages of a pie chart</h3>



<ul><li>It makes comparing similarly sized slices harder</li><li>With too many proportions or slices, it is harder to see the differences</li><li>When colors or legends are used to differentiate slices, understanding is harder</li><li>Too many labels crowd the chart, and the main point is lost</li></ul>



<h2>Bar charts</h2>



<h3>When to use bar charts?</h3>



<p>Analysts create bar charts for a variety of uses, such as showing absolute or proportional values: for example, total sales by product, or each product’s share of total sales. Sometimes analysts create bar charts to show year-over-year numbers.</p>



<p>Although bar charts don’t show a part of the whole intuitively, they make comparisons of each proportion easy.</p>



<p>See the example below. You can see that the categories C and D are of the same size, and category G is the smallest.</p>



<div class="figure" style="text-align: center"><span id="fig:simple-bar-chart"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-bar-chart.png" alt="Simple bar chart" width="80%">
<p class="caption">
Figure 6: Simple bar chart
</p>
</div>



<p>This chart can be made better by arranging (or sorting) the bars by their lengths.</p>



<div class="figure" style="text-align: center"><span id="fig:simple-bar-chart-ordered"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-bar-chart-ordered.png" alt="Ordered bar chart" width="80%">
<p class="caption">
Figure 7: Ordered bar chart
</p>
</div>
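<p>Ordering the bars is a one-line change in base R. The sketch below uses made-up values and is not the code behind Figure 7.</p>

```r
# Hypothetical category values (illustrative data, not from the post)
values <- c(A = 12, B = 30, C = 18, D = 18, E = 9)

# Sort from largest to smallest so bar length decreases left to right
ordered_values <- sort(values, decreasing = TRUE)

barplot(ordered_values, col = "grey60", border = NA)
```

<p>Sorting keeps the names attached to the values, so the axis labels follow the bars automatically.</p>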



<p>We can make this chart aesthetically pleasing by placing invisible or white gridlines.</p>



<div class="figure" style="text-align: center"><span id="fig:simple-bar-chart-ordered-white-gridlines"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-bar-chart-ordered-white-gridlines.png" alt="Ordered bar chart with white gridlines" width="80%">
<p class="caption">
Figure 8: Ordered bar chart with white gridlines
</p>
</div>



<p>Bar charts (the vertical ones) also have problems when the labels are long. Often they are at an angle, and you need to turn your head to read them.</p>



<div class="figure" style="text-align: center"><span id="fig:simple-bar-chart-ordered-white-gridlines-long-label"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-bar-chart-ordered-white-gridlines-long-label.png" alt="Bar charts with long labels" width="80%">
<p class="caption">
Figure 9: Bar charts with long labels
</p>
</div>



<p>You can solve this problem by moving the labels to the y-axis:</p>



<div class="figure" style="text-align: center"><span id="fig:simple-bar-chart-ordered-white-gridlines-long-label-yaxis"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-bar-chart-ordered-white-gridlines-long-label-yaxis.png" alt="Bar charts with long labels flipped on the y-axis" width="80%">
<p class="caption">
Figure 10: Bar charts with long labels flipped on the y-axis
</p>
</div>
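<p>In base R, flipping the bars so that long labels read horizontally might look like the sketch below. The labels, values, and margin sizes are assumptions for illustration only.</p>

```r
# Hypothetical values with long labels (illustrative data, not from the post)
values <- c(
  "A fairly long category label" = 30,
  "Another long category label"  = 18,
  "Short label"                  = 9
)

# Horizontal bars are drawn bottom-up, so sort ascending to put the longest bar on top
sorted_values <- sort(values)

# Widen the left margin (assumed size) to make room for the labels, then restore it
old_par <- par(mar = c(4, 12, 1, 1))
barplot(sorted_values, horiz = TRUE, las = 1, col = "grey60", border = NA)
par(old_par)
```

<p>Here <code>las = 1</code> keeps the axis labels horizontal, so readers don’t need to tilt their heads.</p>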



<p>Although bar charts make comparison easier than pie charts do, knowing the size of the smaller bars is just as hard as knowing the size of a small slice in a pie chart. But at least bar charts have gridlines to help us.</p>



<p>Just as with pie charts, we can try to overcome this problem by adding labels to the bars.</p>



<div class="figure" style="text-align: center"><span id="fig:simple-bar-chart-ordered-white-gridlines-long-label-yaxis-labels"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-bar-chart-ordered-white-gridlines-long-label-yaxis-labels.png" alt="Bar charts with long labels flipped on the y-axis with some data labels" width="90%">
<p class="caption">
Figure 11: Bar charts with long labels flipped on the y-axis with some data labels
</p>
</div>



<h3>Advantages of bar charts</h3>



<ul><li>Comparison among categories is easier</li><li>Gridlines help guide the reader</li><li>Length of the bar makes understanding easier</li><li>Color coding and legends are not required (for single bars)</li><li>Direct labeling is easier</li></ul>



<h3>Disadvantages of bar charts</h3>



<ul><li>Smaller bars don’t offer precision</li><li>Hard to “see” the proportion, that is the part of the whole</li></ul>



<h2>When to use a pie chart vs. bar chart</h2>



<p><strong>Use pie charts when:</strong></p>



<ul><li>The number of categories is small</li><li>Readers can differentiate slices (unless you are making a point)</li><li>You don&#8217;t need to rely on many colors or labels to explain the proportions</li><li>The total adds up to 100%</li></ul>



<p><strong>Use bar charts when:</strong></p>



<ul><li>You have many categories (but not too many)</li><li>You need to compare numbers side-by-side (caution: more than two bars side-by-side are hard for readers)</li></ul>



<h2>Waffle Charts</h2>



<p><a href="https://nandeshwar.info/data-visualization/waffle-chart-vs-dot-plot-vs-pie-charts/">Waffle charts</a> combine bar charts and pie charts into one chart and compound their problems. While you can see the proportions, precision is lost, especially when the proportions don’t fit in one square. And, you still have to decode the color and legend to understand the chart. Also, longer labels will create problems with legend positioning. These charts could be useful with fewer categories, or if analysts want to make a point.</p>



<div class="figure" style="text-align: center"><span id="fig:simple-waffel-chart"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-waffel-chart.png" alt="A waffle chart" width="90%">
<p class="caption">
Figure 12: A waffle chart
</p>
</div>



<h2>Dot charts</h2>



<p>Dot charts or dot plots, popularized by William Cleveland, are similar to bar charts, except that each bar is replaced with a dot placed where the bar would end.</p>



<p>It’s easier to see an example:</p>



<div class="figure" style="text-align: center"><span id="fig:simple-dot-chart"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/simple-dot-chart.png" alt="A simple dot chart" width="80%">
<p class="caption">
Figure 13: A simple dot chart
</p>
</div>
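<p>Base R ships with a <code>dotchart()</code> function, so a simple dot chart can be sketched as follows. The values are made up for the example, and this is not the code behind Figure 13.</p>

```r
# Hypothetical category values (illustrative data, not from the post)
values <- c(A = 12, B = 30, C = 18, D = 24, E = 9)

# dotchart() draws the first element at the bottom, so sort ascending
# to place the largest value at the top of the chart
sorted_values <- sort(values)

dotchart(sorted_values, pch = 19, xlim = c(0, max(sorted_values)))
```

<p>Starting the x-axis at zero is a choice made here to keep the chart comparable to a bar chart; dot charts, unlike bars, can also be read on a truncated axis.</p>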



<p>Dot charts use the best qualities of a bar chart, such as ordered data and labels on the y-axis, and they reduce the need to rely on color for decoding the data. They exceed the usefulness of a bar chart by taking up less space.</p>



<blockquote class="wp-block-quote"><p>Cleveland said this about area and perception, “Using area to encode quantitative information is a poor graphical method. Effects that can be readily perceived in other visualizations are often lost in an encoding by area.”</p></blockquote>



<h3>Advantages of dot charts</h3>



<ul><li>Comparison is easier</li><li>Take little space</li><li>Differences among categories are noticeable</li><li>Color or legend coding not required (for single categories)</li></ul>



<h3>Disadvantages of dot charts</h3>



<ul><li>Some precision is lost</li><li>They might be new to viewers who may find the charts dull</li></ul>



<hr class="wp-block-separator is-style-wide"/>



<h2>Head to head: pie charts vs.&nbsp;bar charts</h2>



<p>Let’s look at how pie charts compare against bar charts with real data. These data are from “Reveal from The Center for Investigative Reporting and The Center for Employment Equity,” <a href="https://www.revealnews.org/topic/silicon-valley-diversity/">a study on diversity</a> in Silicon Valley.</p>



<p><strong>First up, let’s look at the distribution of IT professionals by role.</strong></p>



<div class="figure" style="text-align: center"><span id="fig:job-IT-cat-pie-bar-combined"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-IT-cat-pie-bar-combined.png" alt="Bar chart vs. pie chart comparing job categories of employees in Silicon Valley companies" width="90%">
<p class="caption">
Figure 14: Bar chart vs.&nbsp;pie chart comparing job categories of employees in Silicon Valley companies
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:job-IT-cat-pie-bar-combined-reduced-height"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-IT-cat-pie-bar-combined-reduced-height-1024x205.png" alt="Bar chart vs. pie chart comparing job categories of employees in Silicon Valley companies with reduced height" width="90%">
<p class="caption">
Figure 15: Bar chart vs.&nbsp;pie chart comparing job categories of employees in Silicon Valley companies with reduced height
</p>
</div>



<h3>What do we see</h3>



<ul><li>Both the pie and bar chart in Figure <a href="#fig:job-IT-cat-pie-bar-combined">14</a> clearly show that professionals account for more than 50% of the workforce.</li><li>It is hard to see the percentage of executives.</li><li>Both charts clearly show that there are more “other workers” than there are managers.</li><li>It is hard to see the percentage of “other workers” in the pie chart, but you can see that it is slightly below 30% in the bar chart.</li><li>When we reduce the height of both plots by 50% in Figure <a href="#fig:job-IT-cat-pie-bar-combined-reduced-height">15</a>, the bar chart is still readable and gives you the same information as Figure <a href="#fig:job-IT-cat-pie-bar-combined">14</a>.</li></ul>



<p><strong>Verdict</strong>: slight advantage bar charts. If we add data labels to both the charts, they will be at the same level.</p>



<hr class="wp-block-separator is-style-wide"/>



<p><strong>Now, let’s look at the distribution of ethnicities/races.</strong></p>



<p>Here&#8217;s another head to head: pie charts vs. bar charts round II.</p>



<div class="figure" style="text-align: center"><span id="fig:IT-race-pie-bar-combined"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/IT-race-pie-bar-combined.png" alt="Bar chart vs. pie chart comparing ethnicities/races of employees in Silicon Valley companies" width="90%">
<p class="caption">
Figure 15: Bar chart vs.&nbsp;pie chart comparing ethnicities/races of employees in Silicon Valley companies
</p>
</div>



<h3>What do we see</h3>



<ul><li>Both the pie and bar chart in Figure <a href="#fig:IT-race-pie-bar-combined">15</a> clearly show that white employees account for more than 50% of the workforce.</li><li>Both charts show that Asian employees account for slightly over 25% of the workforce.</li><li>The bar chart shows that there are almost twice as many Latinx employees as Black employees. We can see the same ratio between white and Asian employees in the bar chart.</li><li>It is hard to see the percentage of “other” and Black employees.</li></ul>



<blockquote class="wp-block-quote"><p><strong>Verdict</strong>: slight advantage bar charts. With data labels, the pie chart may come out ahead.</p></blockquote>



<p>What if we want to see the job category and race/ethnicity at the same time? We can use stacked bar charts or side-by-side bar charts, but what about the <a href="https://nandeshwar.info/data-visualization/waffle-chart-vs-dot-plot-vs-pie-charts/">pie charts</a>?</p>



<p>We can combine ethnicity/race with job category and create more slices. We can also use a different hue for each ethnicity/race and place related slices close to each other. Here is round III of pie charts vs. bar charts.</p>



<hr class="wp-block-separator is-style-wide"/>



<p><strong>Job category and race/ethnicity</strong></p>



<div class="figure" style="text-align: center"><span id="fig:pie-chart-race-job-cat-combined"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/pie-chart-race-job-cat-combined.png" alt="A pie chart showing ethnicities/races and job categories of employees in Silicon Valley companies" width="60%">
<p class="caption">
Figure 16: A pie chart showing ethnicities/races and job categories of employees in Silicon Valley companies
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:bar-chart-race-job-cat-combined"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/bar-chart-race-job-cat-combined.png" alt="A bar chart showing ethnicities/races and job categories of employees in Silicon Valley companies" width="80%">
<p class="caption">
Figure 17: A bar chart showing ethnicities/races and job categories of employees in Silicon Valley companies
</p>
</div>



<h3>What do we see</h3>



<ul><li>Many labels in the pie chart (Figure <a href="#fig:pie-chart-race-job-cat-combined">16</a>) are unreadable</li><li>We see white “professionals” account for the largest share of the IT workforce, followed by Asian “professionals”. This can be seen clearly in both the charts (Figure <a href="#fig:pie-chart-race-job-cat-combined">16</a> and <a href="#fig:bar-chart-race-job-cat-combined">17</a>)</li><li>In the bar chart (Figure <a href="#fig:bar-chart-race-job-cat-combined">17</a>), you can see that white managers make up the largest share of managers.</li><li>In the bar chart, Latinx, Black, and other executives appear to be absent, even though the data do contain a small percentage of executives from these groups.</li><li>Although you can’t say it with precision, in the bar chart, you can see that the ratio of Asian managers to Asian professionals is smaller than the ratio of white managers to white professionals. [It is 0.20 compared to 0.37]</li></ul>



<blockquote class="wp-block-quote"><p><strong>Verdict</strong>: meh! Except for one or two large patterns, you have to search into the charts to make meaningful observations.</p></blockquote>



<p>We can try facets, or “small multiples,” to see whether these charts are any better. Let’s see panels of ethnicity/race and job category distribution. This is round IV: panel pie charts vs. panel bar charts.</p>



<h2>Small multiples</h2>



<h3>Panels of ethnicity/race with job category distribution</h3>



<div class="figure" style="text-align: center"><span id="fig:job-cat-race-panel-pie"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-cat-race-panel-pie.png" alt="Pie chart panels showing job categories of employees in Silicon Valley companies by ethnicity/race" width="90%">
<p class="caption">
Figure 18: Pie chart panels showing job categories of employees in Silicon Valley companies by ethnicity/race
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:job-cat-race-panel-bar"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-cat-race-panel-bar.png" alt="Bar chart panels showing job categories of employees in Silicon Valley companies by ethnicity/race" width="90%">
<p class="caption">
Figure 19: Bar chart panels showing job categories of employees in Silicon Valley companies by ethnicity/race
</p>
</div>



<h3>What do we see</h3>



<ul><li>You have to jump from one chart to another to make comparisons, more so with pie charts</li><li>In the pie charts (Figure <a href="#fig:job-cat-race-panel-pie">18</a>), you see:
<ul>
<li>A large percentage of Asian employees are “professionals”</li>
<li>More than half of the Black and Latinx employees fall in the “Other workers” category</li>
</ul></li><li>In the bar charts (Figure <a href="#fig:job-cat-race-panel-bar">19</a>), you see:
<ul>
<li>The ratio of managers to “professionals” looks smaller compared to the other ethnicities/races</li>
<li>The ratio of “other workers” to “professionals” looks bigger for Black and Latinx employees compared to the other ethnicities/races</li>
</ul></li></ul>



<blockquote class="wp-block-quote"><p><strong>Verdict</strong>: tie. Each chart offers a different understanding. The pie charts show underrepresentation or over-representation more intuitively, but comparing across panels is harder with them than with the bar charts.</p></blockquote>



<hr class="wp-block-separator is-style-wide"/>



<h3>Panels of job category by ethnicity/race distribution</h3>



<p>Let’s see panels of job category by ethnicity/race distribution. Round V of pie charts vs. bar charts.</p>



<div class="figure" style="text-align: center"><span id="fig:job-cat-race-panel-pie-v2"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-cat-race-panel-pie-v2.png" alt="Pie chart panels showing ethnicity/race of employees in Silicon Valley companies by job category" width="90%">
<p class="caption">
Figure 20: Pie chart panels showing ethnicity/race of employees in Silicon Valley companies by job category
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:job-cat-race-panel-bar-v2"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-cat-race-panel-bar-v2.png" alt="Bar chart panels showing ethnicity/race of employees in Silicon Valley companies by job category" width="90%">
<p class="caption">
Figure 21: Bar chart panels showing ethnicity/race of employees in Silicon Valley companies by job category
</p>
</div>



<h3>What do we see</h3>



<ul><li>In the pie charts (Figure <a href="#fig:job-cat-race-panel-pie-v2">20</a>), you can see:
<ul>
<li>white employees account for a large portion of the executives</li>
<li>Asian employees have a large percentage of “professionals”</li>
<li>Latinx employees have a higher representation in the “other workers” categories compared to the other job categories.</li>
</ul></li><li>In the bar charts (Figure <a href="#fig:job-cat-race-panel-bar-v2">21</a>), you can see:
<ul>
<li>Black employees are underrepresented in all of the categories, except for the “other workers” category.</li>
</ul></li></ul>



<blockquote class="wp-block-quote"><p><strong>Verdict</strong>: pie charts are slightly ahead, maybe because the 73% and 52% of white employees in the executive and “professionals” categories, respectively, jumped out at me. The data labels are harder to see for smaller slices, and if we added labels to the bar charts, it could be a tie.</p></blockquote>



<hr class="wp-block-separator is-style-wide"/>



<h2>Dot charts</h2>



<div class="figure" style="text-align: center"><span id="fig:race-within-job-cat-dot-plot"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/race-within-job-cat-dot-plot.png" alt="A dot chart of race/ethnicity distribution within job category" width="80%">
<p class="caption">
Figure 22: A dot chart of race/ethnicity distribution within job category
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:job-cat-within-race-dot-plot"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-cat-within-race-dot-plot.png" alt="A dot chart of job category distribution within race/ethnicity" width="80%">
<p class="caption">
Figure 23: A dot chart of job category distribution within race/ethnicity
</p>
</div>



<h3>What do we see</h3>



<ul><li>In the dot chart of Figure <a href="#fig:race-within-job-cat-dot-plot">22</a>, in which we compare ethnicity/race within the job categories, we see:
<ul>
<li>since we have more width available compared to the bar charts, the gap between white executives (72%) and Asian executives (21%) looks larger than shown in the bar charts</li>
<li>The comparison among groups is easier within the panel and with other panels. You don’t need to read the y-axis labels every time.</li>
<li>You can see the wide gap between white/Asian “professionals” and Latinx/Black/Other “professionals.”</li>
</ul></li><li>In the dot chart of Figure <a href="#fig:job-cat-within-race-dot-plot">23</a>, in which we compare job categories within the ethnicities/races, we see:
<ul>
<li>there are many more Asian “professionals” than there are executives or managers</li>
<li>there are many more Black/Latinx “other workers” than there are “professionals,” executives, or managers</li>
</ul></li></ul>



<p>The dot charts offer more information in a small space compared to the bar or pie charts. The comparison is easier, though some precision is lost. As with bar charts, you don’t have to use color to distinguish categories. They also look very clean.</p>



<hr class="wp-block-separator is-style-wide"/>



<p><strong>What if we want to compare genders within the job categories and ethnicities/races?</strong></p>



<div class="wp-block-image"><figure class="aligncenter"><img src="https://media.giphy.com/media/glmRyiSI3v5E4/giphy.gif" alt=""/></figure></div>



<p>I doubt we will get good, easily explainable graphs. Simple bar charts and pie charts are out. We can try dot charts with two dots for the genders in the data, and side-by-side bar charts.</p>



<p>Here are three attempts:</p>



<ul><li>Figure <a href="#fig:job-categories-ethnicity-race-distribution-gender-bar-chart">24</a> shows gender representation within a job category by race. When you add the values of all purple bars or all green bars in a job category, you will get a total of 100%.</li><li>Figure <a href="#fig:job-categories-ethnicity-race-distribution-gender-dot-chart">25</a> is the same as Figure <a href="#fig:job-categories-ethnicity-race-distribution-gender-bar-chart">24</a>, but in dot chart form.</li><li>Figure <a href="#fig:gender-within-job-cat-dot-plot">26</a> shows the distribution of all genders and ethnicities/races within a job category. When you add all the values within a job category, you will get a total of 100%.</li></ul>



<p>While Figure <a href="#fig:job-categories-ethnicity-race-distribution-gender-bar-chart">24</a> and <a href="#fig:job-categories-ethnicity-race-distribution-gender-dot-chart">25</a> show how each gender compares among ethnicities/races within a job category, Figure <a href="#fig:gender-within-job-cat-dot-plot">26</a> shows how genders from all ethnicities/races compare with each other within a job category.</p>



<div class="figure" style="text-align: center"><span id="fig:job-categories-ethnicity-race-distribution-gender-bar-chart"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-categories-ethnicity-race-distribution-gender-bar-chart.png" alt="Side-by-side bar charts of job categories and ethnicity/race distribution by gender" width="80%">
<p class="caption">
Figure 24: Side-by-side bar charts of job categories and ethnicity/race distribution by gender
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:job-categories-ethnicity-race-distribution-gender-dot-chart"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/job-categories-ethnicity-race-distribution-gender-dot-chart.png" alt="Dot charts of job categories and ethnicity/race distribution by gender" width="80%">
<p class="caption">
Figure 25: Dot charts of job categories and ethnicity/race distribution by gender
</p>
</div>



<div class="figure" style="text-align: center"><span id="fig:gender-within-job-cat-dot-plot"></span>
<img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/gender-within-job-cat-dot-plot.png" alt="Dot chart of gender and ethnicity/race distribution within job category" width="80%">
<p class="caption">
Figure 26: Dot chart of gender and ethnicity/race distribution within job category
</p>
</div>



<h2>A note on dot charts</h2>



<p>As we saw in the previous examples, pie charts aren&#8217;t suitable for multi-level comparison. Although bar charts are good alternatives, dot charts offer more flexibility and convey similar information without a loss of perception or understanding. Dot charts also don&#8217;t require color-coding, because we can use different symbols instead. (Patterned bar charts look ugly. I know because I have created many of them before.)</p>



<h2>Conclusion</h2>



<p>I was surprised by some of the graphs. Pie charts were better in some instances, and side-by-side bar charts were better than dot charts in at least one case. That is why you need to create <a href="https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/">multiple designs</a> before settling on one. And of course, the choice also depends on the objective of your overall narrative. For example, Figure <a href="#fig:job-categories-ethnicity-race-distribution-gender-bar-chart">24</a> and <a href="#fig:gender-within-job-cat-dot-plot">26</a> have similar data, but they show two different things. Context is critical. Also important are <a href="https://nandeshwar.info/learning/thinkers-game-critical-thinking-important-practice/">design skills</a>: some graphs may not be ready for sharing out of the box, but with editing and annotating, charts can speak for themselves.</p>



<hr class="wp-block-separator is-style-wide"/>



<h2>Appendix: R Code</h2>


<pre class="wp-block-code" aria-describedby="shcb-language-29" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## ----setup</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>knitr::opts_chunk$set(
</span></span><span class='shcb-loc'><span>  echo = <span class="hljs-literal">FALSE</span>, message = <span class="hljs-literal">FALSE</span>, <span class="hljs-keyword">warning</span> = <span class="hljs-literal">FALSE</span>,
</span></span><span class='shcb-loc'><span>  fig.width = <span class="hljs-number">6</span>,
</span></span><span class='shcb-loc'><span>  fig.align = <span class="hljs-string">"center"</span>,
</span></span><span class='shcb-loc'><span>  dpi = <span class="hljs-number">96</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(scales)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(ggplot2)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(ggforce)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(patchwork)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(dplyr)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(ggthemes)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(waffle)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(readr)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(RColorBrewer)
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">library</span>(ggtext)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_colors &lt;- c(<span class="hljs-string">"White"</span> = <span class="hljs-string">"#9e9ac8"</span>, <span class="hljs-string">"Asian"</span> = <span class="hljs-string">"#6baed6"</span>, <span class="hljs-string">"Latinx"</span> = <span class="hljs-string">"#fd8d3c"</span>, <span class="hljs-string">"Black"</span> = <span class="hljs-string">"#74c476"</span>, <span class="hljs-string">"Other"</span> = <span class="hljs-string">"#fb6a4a"</span>)
</span></span><span class='shcb-loc'><span>job_cat_colors &lt;- c(<span class="hljs-string">"Other workers"</span> = <span class="hljs-string">"#8da0cb"</span>, <span class="hljs-string">"Professionals"</span> = <span class="hljs-string">"#e78ac3"</span>, <span class="hljs-string">"Managers"</span> = <span class="hljs-string">"#fc8d62"</span>, <span class="hljs-string">"Executives"</span> = <span class="hljs-string">"#66c2a5"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>ordered_lvls_race &lt;- c(<span class="hljs-string">"White"</span>, <span class="hljs-string">"Asian"</span>, <span class="hljs-string">"Latinx"</span>, <span class="hljs-string">"Black"</span>, <span class="hljs-string">"Other"</span>)
</span></span><span class='shcb-loc'><span>ordered_lvls_job_cat &lt;- c(<span class="hljs-string">"Executives"</span>, <span class="hljs-string">"Managers"</span>, <span class="hljs-string">"Professionals"</span>, <span class="hljs-string">"Other workers"</span>)
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Simple pie chart in R</span>
</span></span><span class='shcb-loc'><span>par(
</span></span><span class='shcb-loc'><span>  <span class="hljs-comment"># mar (in lines) and mai (in inches) set the same margins, so only one is needed</span>
</span></span><span class='shcb-loc'><span>  mai = rep(<span class="hljs-number">0.1</span>, <span class="hljs-number">4</span>)
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>pie(c(<span class="hljs-number">0.3</span>, <span class="hljs-number">0.4</span>, <span class="hljs-number">0.3</span>), labels = c(<span class="hljs-string">"A"</span>, <span class="hljs-string">"B"</span>, <span class="hljs-string">"C"</span>), col = <span class="hljs-string">"grey80"</span>, border = <span class="hljs-string">"white"</span>)
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Three different pie charts with colors and legends</span>
</span></span><span class='shcb-loc'><span>data &lt;- data.frame(
</span></span><span class='shcb-loc'><span>  val = c(<span class="hljs-number">0.3</span>, <span class="hljs-number">0.4</span>, <span class="hljs-number">0.1</span>, <span class="hljs-number">0.1</span>, <span class="hljs-number">0.05</span>, <span class="hljs-number">0.02</span>, <span class="hljs-number">0.01</span>, <span class="hljs-number">0.02</span>),
</span></span><span class='shcb-loc'><span>  cat = LETTERS[<span class="hljs-number">1</span>:<span class="hljs-number">8</span>],
</span></span><span class='shcb-loc'><span>  long_cat = c(<span class="hljs-string">"Cream of Wheat"</span>, <span class="hljs-string">"Malt-O-Meal"</span>, <span class="hljs-string">"Maypo"</span>, <span class="hljs-string">"Quaker Oats"</span>, <span class="hljs-string">"Cinnamon Crunch"</span>, <span class="hljs-string">"Scott's Porage Oats"</span>, <span class="hljs-string">"Cap'n Crunch"</span>, <span class="hljs-string">"Cheerios"</span>),
</span></span><span class='shcb-loc'><span>  stringsAsFactors = <span class="hljs-literal">FALSE</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Adapted from Claus Wilke's pie chart code:</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># https://github.com/clauswilke/dataviz/blob/master/nested_proportions.Rmd</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Sort the slices by size, then compute each slice's start, end, and mid angle</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># (radians, clockwise from 12 o'clock) and the label justification for its side of the pie</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>data &lt;- data %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(val) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(val) / sum(val), 
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), 
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), 
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span>
</span></span><span class='shcb-loc'><span>rlabel_out &lt;- <span class="hljs-number">1.05</span> * rpie
</span></span><span class='shcb-loc'><span>rlabel_in &lt;- <span class="hljs-number">0.6</span> * rpie
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p1 &lt;- ggplot(data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey90"</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed() +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = cat,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  ggtitle(<span class="hljs-string">"I"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme(plot.title = element_text(color = <span class="hljs-string">"#1A3BA5"</span>, hjust = <span class="hljs-number">0.5</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.5</span>)))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p2 &lt;- ggplot(data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle,
</span></span><span class='shcb-loc'><span>      fill = cat
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed() +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = cat,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  ggtitle(<span class="hljs-string">"II"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_fill_brewer(type = <span class="hljs-string">"qual"</span>, palette = <span class="hljs-string">"Set2"</span>) +
</span></span><span class='shcb-loc'><span>  theme(legend.position = <span class="hljs-string">"none"</span>) +
</span></span><span class='shcb-loc'><span>  theme(plot.title = element_text(color = <span class="hljs-string">"#1A3BA5"</span>, hjust = <span class="hljs-number">0.5</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.5</span>)))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p3 &lt;- ggplot(data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle,
</span></span><span class='shcb-loc'><span>      fill = cat
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"gray85"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed() +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = cat,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  ggtitle(<span class="hljs-string">"III"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_fill_brewer(type = <span class="hljs-string">"seq"</span>, palette = <span class="hljs-string">"Oranges"</span>) +
</span></span><span class='shcb-loc'><span>  theme(plot.title = element_text(color = <span class="hljs-string">"#1A3BA5"</span>, hjust = <span class="hljs-number">0.5</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.5</span>)))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p1 + p2 + p3
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Pie charts with data labels</span>
</span></span><span class='shcb-loc'><span>p4 &lt;- ggplot(data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey90"</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = paste(cat, percent(val), sep = <span class="hljs-string">": "</span>),
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  ggtitle(<span class="hljs-string">"Label outside"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme(plot.title = element_text(color = <span class="hljs-string">"#1A3BA5"</span>, hjust = <span class="hljs-number">0.5</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.5</span>)))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p5 &lt;- ggplot(data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey90"</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_in * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_in * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = ifelse(val &lt; <span class="hljs-number">.1</span>, <span class="hljs-string">""</span>, paste(cat, percent(val), sep = <span class="hljs-string">": "</span>))
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  ggtitle(<span class="hljs-string">"Label inside"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme(plot.title = element_text(color = <span class="hljs-string">"#1A3BA5"</span>, hjust = <span class="hljs-number">0.5</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.5</span>)))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>wrap_plots(p4, p5, heights = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Pie chart with collapsed categories</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>collapsed_data &lt;- data.frame(
</span></span><span class='shcb-loc'><span>  val = c(<span class="hljs-number">0.3</span>, <span class="hljs-number">0.4</span>, <span class="hljs-number">0.1</span>, <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>  cat = c(LETTERS[<span class="hljs-number">1</span>:<span class="hljs-number">3</span>], <span class="hljs-string">"Other"</span>),
</span></span><span class='shcb-loc'><span>  stringsAsFactors = <span class="hljs-literal">FALSE</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Adapted from Claus Wilke https://github.com/clauswilke/dataviz/blob/master/nested_proportions.Rmd</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Same slice angle and label justification calculation, applied to the collapsed data</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>collapsed_data &lt;- collapsed_data %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(val) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(val) / sum(val), 
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), 
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), 
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span>
</span></span><span class='shcb-loc'><span>rlabel_out &lt;- <span class="hljs-number">1.05</span> * rpie
</span></span><span class='shcb-loc'><span>rlabel_in &lt;- <span class="hljs-number">0.6</span> * rpie
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p6 &lt;- ggplot(collapsed_data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey90"</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = paste(cat, percent(val), sep = <span class="hljs-string">": "</span>),
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p6
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Pie chart with a slice highlighted</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>slice_colors &lt;- rep(<span class="hljs-string">"grey90"</span>, <span class="hljs-number">8</span>)
</span></span><span class='shcb-loc'><span>names(slice_colors) &lt;- LETTERS[<span class="hljs-number">1</span>:<span class="hljs-number">8</span>]
</span></span><span class='shcb-loc'><span>slice_colors[<span class="hljs-string">"F"</span>] &lt;- <span class="hljs-string">"#F3790C"</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p7 &lt;- ggplot(data) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle,
</span></span><span class='shcb-loc'><span>      fill = cat
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = slice_colors) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = cat,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme(legend.position = <span class="hljs-string">"none"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># annotate() draws each element once; constants inside aes() would be redrawn for every row of data</span>
</span></span><span class='shcb-loc'><span>p7 &lt;- p7 +
</span></span><span class='shcb-loc'><span>  annotate(<span class="hljs-string">"curve"</span>, x = <span class="hljs-number">0.15</span>, y = <span class="hljs-number">1.14</span>, xend = <span class="hljs-number">0.6</span>, yend = <span class="hljs-number">1</span>, curvature = -<span class="hljs-number">0.6</span>) +
</span></span><span class='shcb-loc'><span>  annotate(<span class="hljs-string">"text"</span>,
</span></span><span class='shcb-loc'><span>    x = <span class="hljs-number">0.61</span>, y = <span class="hljs-number">1</span>, label = <span class="hljs-string">"25% growth likely\nnext year"</span>,
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"#F3790C"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p7
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Simple bar chart</span>
</span></span><span class='shcb-loc'><span>b1 &lt;- ggplot(data, aes(x = cat, y = val)) +
</span></span><span class='shcb-loc'><span>  geom_col(fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b1
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Ordered bar chart</span>
</span></span><span class='shcb-loc'><span>b2 &lt;- ggplot(data, aes(x = reorder(cat, desc(val)), y = val)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b2
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Ordered bar chart with white gridlines</span>
</span></span><span class='shcb-loc'><span>b3 &lt;- ggplot(data, aes(x = reorder(cat, desc(val)), y = val)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  geom_hline(yintercept = <span class="hljs-number">1</span>:<span class="hljs-number">3</span> / <span class="hljs-number">10</span>, color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b3
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar charts with long labels</span>
</span></span><span class='shcb-loc'><span>b4 &lt;- ggplot(data, aes(x = reorder(long_cat, desc(val)), y = val)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  geom_hline(yintercept = <span class="hljs-number">1</span>:<span class="hljs-number">3</span> / <span class="hljs-number">10</span>, color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b4_i &lt;- b4 + theme(axis.text.x = element_text(angle = <span class="hljs-number">90</span>))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b4_ii &lt;- b4 + theme(axis.text.x = element_text(angle = <span class="hljs-number">30</span>, hjust = <span class="hljs-number">1</span>))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>wrap_plots(b4_i, b4_ii, heights = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar charts with long labels flipped on the y-axis</span>
</span></span><span class='shcb-loc'><span>b5 &lt;- ggplot(data, aes(y = reorder(long_cat, val), x = val)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = <span class="hljs-number">1</span>:<span class="hljs-number">3</span> / <span class="hljs-number">10</span>, color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b5
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar charts with long labels flipped on the y-axis with some data labels</span>
</span></span><span class='shcb-loc'><span>b6 &lt;- ggplot(data, aes(y = reorder(long_cat, val), x = val)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = <span class="hljs-number">1</span>:<span class="hljs-number">3</span> / <span class="hljs-number">10</span>, color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  geom_text(aes(label = ifelse(val &lt; <span class="hljs-number">0.1</span>, percent(val), <span class="hljs-string">""</span>)), hjust = -<span class="hljs-number">0.1</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b6
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## A waffle chart</span>
</span></span><span class='shcb-loc'><span>data2 &lt;- arrange(data, val)
</span></span><span class='shcb-loc'><span>parts_v &lt;- data2$val * <span class="hljs-number">100</span>
</span></span><span class='shcb-loc'><span>names(parts_v) &lt;- data2$cat
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>w1 &lt;- waffle(parts = parts_v, rows = <span class="hljs-number">5</span>, legend_pos = <span class="hljs-string">"top"</span>, xlab = <span class="hljs-string">"1 square equals 1%"</span>, reverse = <span class="hljs-literal">TRUE</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>w1 &lt;- w1 + guides(fill = guide_legend(
</span></span><span class='shcb-loc'><span>  nrow = <span class="hljs-number">1</span>,
</span></span><span class='shcb-loc'><span>  reverse = <span class="hljs-literal">TRUE</span>,
</span></span><span class='shcb-loc'><span>  label.position = <span class="hljs-string">"top"</span>
</span></span><span class='shcb-loc'><span>)) +
</span></span><span class='shcb-loc'><span>  theme(legend.spacing.x = unit(<span class="hljs-number">1.2</span>, <span class="hljs-string">"cm"</span>))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>w1
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## A simple dot chart</span>
</span></span><span class='shcb-loc'><span>d1 &lt;- ggplot(data, aes(y = reorder(cat, val), x = val)) +
</span></span><span class='shcb-loc'><span>  geom_point(shape = <span class="hljs-number">21</span>, fill = <span class="hljs-string">"#F3790C"</span>, size = <span class="hljs-number">3</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_line(size = <span class="hljs-number">0.4</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>d1
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># IT diversity data for Silicon Valley</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Is Silicon Valley Tech Diversity Possible Now?</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Center for Employment Equity, University of Massachusetts</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># https://github.com/cirlabs/Silicon-Valley-Diversity-Data</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># https://www.umass.edu/employmentequity/sites/default/files/CEE_Diversity%2Bin%2BSilicon%2BValley%2BTech.pdf</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>it_data_clean &lt;- read_csv(<span class="hljs-string">"tech_diversity_cleaned.csv"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar chart vs. pie chart comparing job categories of employees in Silicon Valley companies</span>
</span></span><span class='shcb-loc'><span>job_cat_total &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_total &lt;- job_cat_total %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(total) / sum(total), <span class="hljs-comment"># ending angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), <span class="hljs-comment"># starting angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), <span class="hljs-comment"># middle of each pie slice, for the text label</span>
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>), <span class="hljs-comment"># right-align labels on the left half of the pie</span>
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>) <span class="hljs-comment"># bottom-align labels on the top half of the pie</span>
</span></span><span class='shcb-loc'><span>  ) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
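</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span> <span class="hljs-comment"># pie radius</span>
</span></span><span class='shcb-loc'><span>rlabel_out &lt;- <span class="hljs-number">1.05</span> * rpie <span class="hljs-comment"># radius for labels placed just outside the pie</span>
</span></span><span class='shcb-loc'><span>rlabel_in &lt;- <span class="hljs-number">0.6</span> * rpie <span class="hljs-comment"># radius for labels placed inside the pie</span>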
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p1_it &lt;- ggplot(job_cat_total) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey90"</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = job_category,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b1_it &lt;- ggplot(job_cat_total, aes(y = reorder(job_category, pct), x = pct)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent, limits = c(<span class="hljs-number">0</span>, <span class="hljs-number">.6</span>)) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = <span class="hljs-number">1</span>:<span class="hljs-number">6</span> / <span class="hljs-number">10</span>, color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  <span class="hljs-comment"># geom_text(aes(label = ifelse(pct &lt; 0.1, percent(pct), "")), hjust = -0.1) +</span>
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.title = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">2</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>wrap_plots(p1_it, b1_it, heights = <span class="hljs-number">1</span>) +
</span></span><span class='shcb-loc'><span>  plot_annotation(
</span></span><span class='shcb-loc'><span>    caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>,
</span></span><span class='shcb-loc'><span>    theme = theme(
</span></span><span class='shcb-loc'><span>      plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">9</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>      plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar chart vs. pie chart comparing ethnicities/races of employees in Silicon Valley companies</span>
</span></span><span class='shcb-loc'><span>race_total &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  ungroup()
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_total &lt;- race_total %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(total) / sum(total), <span class="hljs-comment"># ending angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), <span class="hljs-comment"># starting angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), <span class="hljs-comment"># middle of each pie slice, for the text label</span>
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>), <span class="hljs-comment"># right-align labels on the left half of the pie</span>
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>) <span class="hljs-comment"># bottom-align labels on the top half of the pie</span>
</span></span><span class='shcb-loc'><span>  ) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
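</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span> <span class="hljs-comment"># pie radius</span>
</span></span><span class='shcb-loc'><span>rlabel_out &lt;- <span class="hljs-number">1.05</span> * rpie <span class="hljs-comment"># radius for labels placed just outside the pie</span>
</span></span><span class='shcb-loc'><span>rlabel_in &lt;- <span class="hljs-number">0.6</span> * rpie <span class="hljs-comment"># radius for labels placed inside the pie</span>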
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>p2_it &lt;- ggplot(race_total) +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = <span class="hljs-number">0</span>, r = rpie,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey90"</span>,
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel_out * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel_out * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = race_short,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme_void() +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.4</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    name = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.2</span>, <span class="hljs-number">1.3</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>b2_it &lt;- ggplot(race_total, aes(y = reorder(race_short, pct), x = pct)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, fill = <span class="hljs-string">"gray80"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent, limits = c(<span class="hljs-number">0</span>, <span class="hljs-number">.6</span>)) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = <span class="hljs-number">1</span>:<span class="hljs-number">6</span> / <span class="hljs-number">10</span>, color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  <span class="hljs-comment"># geom_text(aes(label = ifelse(pct &lt; 0.1, percent(pct), "")), hjust = -0.1) +</span>
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.title = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">2</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>wrap_plots(p2_it, b2_it, heights = <span class="hljs-number">1</span>) +
</span></span><span class='shcb-loc'><span>  plot_annotation(
</span></span><span class='shcb-loc'><span>    caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>,
</span></span><span class='shcb-loc'><span>    theme = theme(
</span></span><span class='shcb-loc'><span>      plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">9</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>      plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## A pie chart showing ethnicities/races and job categories of employees in Silicon Valley companies</span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short, job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  ungroup() %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(race_job_cat = paste(race_short, job_category, sep = <span class="hljs-string">"-"</span>)) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- race_job_cat_total %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(race_short, total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    count_total = sum(total),
</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(total) / count_total, <span class="hljs-comment"># ending angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), <span class="hljs-comment"># starting angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), <span class="hljs-comment"># middle of each pie slice, for the text label</span>
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    type = job_category,
</span></span><span class='shcb-loc'><span>    label = race_job_cat
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>slice_colors &lt;- c(
</span></span><span class='shcb-loc'><span>  brewer.pal(<span class="hljs-number">5</span>, <span class="hljs-string">"Blues"</span>)[-<span class="hljs-number">1</span>],
</span></span><span class='shcb-loc'><span>  brewer.pal(<span class="hljs-number">5</span>, <span class="hljs-string">"Greens"</span>)[-<span class="hljs-number">1</span>],
</span></span><span class='shcb-loc'><span>  brewer.pal(<span class="hljs-number">5</span>, <span class="hljs-string">"Oranges"</span>)[-<span class="hljs-number">1</span>],
</span></span><span class='shcb-loc'><span>  brewer.pal(<span class="hljs-number">5</span>, <span class="hljs-string">"Reds"</span>)[-<span class="hljs-number">1</span>],
</span></span><span class='shcb-loc'><span>  brewer.pal(<span class="hljs-number">5</span>, <span class="hljs-string">"Purples"</span>)[-<span class="hljs-number">1</span>]
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>names(slice_colors) &lt;- race_job_cat_total$race_job_cat
</span></span><span class='shcb-loc'><span>
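</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span> <span class="hljs-comment"># reference pie radius, used to place the outer labels</span>
</span></span><span class='shcb-loc'><span>rpie1 &lt;- <span class="hljs-number">0</span> <span class="hljs-comment"># inner radius of each slice (0 means a full pie, no hole)</span>
</span></span><span class='shcb-loc'><span>rpie2 &lt;- <span class="hljs-number">1</span> <span class="hljs-comment"># outer radius of each slice</span>
</span></span><span class='shcb-loc'><span>rlabel &lt;- <span class="hljs-number">1.02</span> * rpie <span class="hljs-comment"># radius at which the slice labels are drawn, just outside the pie</span>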
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_nested_pie &lt;- ggplot() +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = rpie1, r = rpie2,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle, fill = race_job_cat
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>, size = <span class="hljs-number">0.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = race_job_cat,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">12</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = <span class="hljs-number">0.6</span> * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = <span class="hljs-number">0.6</span> * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = percent(pct, accuracy = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt,
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0.5</span>, vjust = <span class="hljs-number">0.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.8</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), name = <span class="hljs-string">""</span>, breaks = <span class="hljs-literal">NULL</span>, labels = <span class="hljs-literal">NULL</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.15</span>, <span class="hljs-number">1.15</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), name = <span class="hljs-string">""</span>, breaks = <span class="hljs-literal">NULL</span>, labels = <span class="hljs-literal">NULL</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(
</span></span><span class='shcb-loc'><span>    values = slice_colors
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  labs(
</span></span><span class='shcb-loc'><span>    title = <span class="hljs-string">"Colorful, bad design"</span>,
</span></span><span class='shcb-loc'><span>    caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.title.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.title = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_nested_pie
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## A bar chart showing ethnicities/races and job categories of employees in Silicon Valley companies</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>ordered_lvls_job_cat_race_combined &lt;- c(
</span></span><span class='shcb-loc'><span>  <span class="hljs-string">"White-Executives"</span>, <span class="hljs-string">"White-Managers"</span>, <span class="hljs-string">"White-Professionals"</span>, <span class="hljs-string">"White-Other workers"</span>,
</span></span><span class='shcb-loc'><span>  <span class="hljs-string">"Asian-Executives"</span>, <span class="hljs-string">"Asian-Managers"</span>, <span class="hljs-string">"Asian-Professionals"</span>, <span class="hljs-string">"Asian-Other workers"</span>,
</span></span><span class='shcb-loc'><span>  <span class="hljs-string">"Latinx-Executives"</span>, <span class="hljs-string">"Latinx-Managers"</span>, <span class="hljs-string">"Latinx-Professionals"</span>, <span class="hljs-string">"Latinx-Other workers"</span>,
</span></span><span class='shcb-loc'><span>  <span class="hljs-string">"Black-Executives"</span>, <span class="hljs-string">"Black-Managers"</span>, <span class="hljs-string">"Black-Professionals"</span>, <span class="hljs-string">"Black-Other workers"</span>,
</span></span><span class='shcb-loc'><span>  <span class="hljs-string">"Other-Executives"</span>, <span class="hljs-string">"Other-Managers"</span>, <span class="hljs-string">"Other-Professionals"</span>, <span class="hljs-string">"Other-Other workers"</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_nested_bar &lt;- ggplot(race_job_cat_total, aes(
</span></span><span class='shcb-loc'><span>  y = factor(race_job_cat, levels = rev(ordered_lvls_job_cat_race_combined)),
</span></span><span class='shcb-loc'><span>  x = pct,
</span></span><span class='shcb-loc'><span>  group = race_short,
</span></span><span class='shcb-loc'><span>  fill = job_category
</span></span><span class='shcb-loc'><span>)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, position = position_dodge(<span class="hljs-number">0.5</span>)) +
</span></span><span class='shcb-loc'><span>  scale_fill_brewer(type = <span class="hljs-string">"qual"</span>, palette = <span class="hljs-string">"Set2"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  scale_y_discrete(expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)) +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = seq(from = <span class="hljs-number">0</span>, to = <span class="hljs-number">.3</span>, by = <span class="hljs-number">0.05</span>), color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  theme_minimal() +
</span></span><span class='shcb-loc'><span>  labs(
</span></span><span class='shcb-loc'><span>    title = <span class="hljs-string">"Colorful, bad design"</span>,
</span></span><span class='shcb-loc'><span>    caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"top"</span>,
</span></span><span class='shcb-loc'><span>    legend.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.text.y = element_text(hjust = <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    axis.text.x = element_text(hjust = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>    axis.line = element_line(),
</span></span><span class='shcb-loc'><span>    axis.line.y = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.ticks.y = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.ticks.x = element_line(colour = <span class="hljs-literal">NULL</span>),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.title.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.title = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  guides(fill = guide_legend(nrow = <span class="hljs-number">1</span>))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_nested_bar
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Pie chart panels showing job categories of employees in Silicon Valley companies by ethnicity/race</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short, job_category) %&gt;%
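</span></span><span class='shcb-loc'><span>  summarize(total = sum(count), .groups = <span class="hljs-string">"drop_last"</span>) %&gt;% <span class="hljs-comment"># result stays grouped by race_short</span>
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total)) <span class="hljs-comment"># share of each job category within a race</span>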
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- race_job_cat_total %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(race_short, total) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    count_total = sum(total),
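</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(total) / count_total, <span class="hljs-comment"># ending angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), <span class="hljs-comment"># starting angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), <span class="hljs-comment"># middle of each pie slice, for the text label</span>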
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    type = job_category,
</span></span><span class='shcb-loc'><span>    label = job_category
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- ungroup(race_job_cat_total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(race_short = factor(race_short, ordered_lvls_race))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span>
</span></span><span class='shcb-loc'><span>rpie1 &lt;- <span class="hljs-number">0</span>
</span></span><span class='shcb-loc'><span>rpie2 &lt;- <span class="hljs-number">1</span>
</span></span><span class='shcb-loc'><span>rlabel &lt;- <span class="hljs-number">1.02</span> * rpie
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_pie_v1 &lt;- ggplot() +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = rpie1, r = rpie2,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle, fill = job_category
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>, size = <span class="hljs-number">0.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  facet_wrap(~race_short) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = ifelse(job_category == <span class="hljs-string">"Other workers"</span>, <span class="hljs-string">"Other\nworkers"</span>, job_category),
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = <span class="hljs-number">0.6</span> * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = <span class="hljs-number">0.6</span> * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = percent(pct, accuracy = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt,
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0.5</span>, vjust = <span class="hljs-number">0.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.8</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), name = <span class="hljs-string">""</span>, breaks = <span class="hljs-literal">NULL</span>, labels = <span class="hljs-literal">NULL</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.15</span>, <span class="hljs-number">1.15</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), name = <span class="hljs-string">""</span>, breaks = <span class="hljs-literal">NULL</span>, labels = <span class="hljs-literal">NULL</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_fill_brewer(type = <span class="hljs-string">"qual"</span>, palette = <span class="hljs-string">"Set2"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank()
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_pie_v1 &lt;- job_cat_race_panel_pie_v1 +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    strip.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey95"</span>),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">1.5</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_pie_v1
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar chart panels showing job categories of employees in Silicon Valley companies by ethnicity/race</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short, job_category) %&gt;%
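</span></span><span class='shcb-loc'><span>  summarize(total = sum(count), .groups = <span class="hljs-string">"drop_last"</span>) %&gt;% <span class="hljs-comment"># result stays grouped by race_short</span>
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total)) <span class="hljs-comment"># share of each job category within a race</span>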
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- ungroup(race_job_cat_total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, ordered_lvls_race),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, rev(ordered_lvls_job_cat))
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_bar_v1 &lt;- ggplot(race_job_cat_total, aes(
</span></span><span class='shcb-loc'><span>  y = job_category,
</span></span><span class='shcb-loc'><span>  x = pct,
</span></span><span class='shcb-loc'><span>  fill = job_category
</span></span><span class='shcb-loc'><span>)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, position = position_dodge(<span class="hljs-number">0.5</span>), width = <span class="hljs-number">0.7</span>) +
</span></span><span class='shcb-loc'><span>  facet_wrap(~race_short) +
</span></span><span class='shcb-loc'><span>  scale_fill_brewer(type = <span class="hljs-string">"qual"</span>, palette = <span class="hljs-string">"Set2"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = seq(from = <span class="hljs-number">0</span>, to = <span class="hljs-number">.6</span>, by = <span class="hljs-number">0.2</span>), color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  theme_minimal() +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    legend.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.text.y = element_text(hjust = <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    axis.text.x = element_text(hjust = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>    axis.line = element_line(),
</span></span><span class='shcb-loc'><span>    axis.line.y = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.ticks.y = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.ticks.x = element_line(colour = <span class="hljs-literal">NULL</span>),
</span></span><span class='shcb-loc'><span>    strip.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey95"</span>, color = <span class="hljs-literal">NA</span>),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">1</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_bar_v1
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Pie chart panels showing ethnicity/race of employees in Silicon Valley companies by job category</span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category, race_short) %&gt;%
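</span></span><span class='shcb-loc'><span>  summarize(total = sum(count), .groups = <span class="hljs-string">"drop_last"</span>) %&gt;% <span class="hljs-comment"># result stays grouped by job_category</span>
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total)) <span class="hljs-comment"># share of each race within a job category</span>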
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- race_job_cat_total %&gt;%
</span></span><span class='shcb-loc'><span>  arrange(job_category, total) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    count_total = sum(total),
</span></span><span class='shcb-loc'><span>    end_angle = <span class="hljs-number">2</span> * pi * cumsum(total) / count_total, <span class="hljs-comment"># ending angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    start_angle = lag(end_angle, default = <span class="hljs-number">0</span>), <span class="hljs-comment"># starting angle for each pie slice</span>
</span></span><span class='shcb-loc'><span>    mid_angle = <span class="hljs-number">0.5</span> * (start_angle + end_angle), <span class="hljs-comment"># middle of each pie slice, for the text label</span>
</span></span><span class='shcb-loc'><span>    hjust = ifelse(mid_angle &gt; pi, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    vjust = ifelse(mid_angle &lt; pi / <span class="hljs-number">2</span> | mid_angle &gt; <span class="hljs-number">3</span> * pi / <span class="hljs-number">2</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    type = job_category,
</span></span><span class='shcb-loc'><span>    label = job_category
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- ungroup(race_job_cat_total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, rev(ordered_lvls_race)),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, ordered_lvls_job_cat)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>rpie &lt;- <span class="hljs-number">1</span>
</span></span><span class='shcb-loc'><span>rpie1 &lt;- <span class="hljs-number">0</span>
</span></span><span class='shcb-loc'><span>rpie2 &lt;- <span class="hljs-number">1</span>
</span></span><span class='shcb-loc'><span>rlabel &lt;- <span class="hljs-number">1.02</span> * rpie
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_pie_v2 &lt;- ggplot() +
</span></span><span class='shcb-loc'><span>  geom_arc_bar(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x0 = <span class="hljs-number">0</span>, y0 = <span class="hljs-number">0</span>, r0 = rpie1, r = rpie2,
</span></span><span class='shcb-loc'><span>      start = start_angle, end = end_angle, fill = race_short
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    color = <span class="hljs-string">"white"</span>, size = <span class="hljs-number">0.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  facet_wrap(~job_category) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = rlabel * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = rlabel * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = race_short,
</span></span><span class='shcb-loc'><span>      hjust = hjust, vjust = vjust
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = race_job_cat_total,
</span></span><span class='shcb-loc'><span>    aes(
</span></span><span class='shcb-loc'><span>      x = <span class="hljs-number">0.6</span> * sin(mid_angle),
</span></span><span class='shcb-loc'><span>      y = <span class="hljs-number">0.6</span> * cos(mid_angle),
</span></span><span class='shcb-loc'><span>      label = percent(pct, accuracy = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">10</span> / .pt,
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0.5</span>, vjust = <span class="hljs-number">0.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  coord_fixed(clip = <span class="hljs-string">"off"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.5</span>, <span class="hljs-number">1.8</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), name = <span class="hljs-string">""</span>, breaks = <span class="hljs-literal">NULL</span>, labels = <span class="hljs-literal">NULL</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(
</span></span><span class='shcb-loc'><span>    limits = c(-<span class="hljs-number">1.15</span>, <span class="hljs-number">1.15</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), name = <span class="hljs-string">""</span>, breaks = <span class="hljs-literal">NULL</span>, labels = <span class="hljs-literal">NULL</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = race_colors) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    panel.background = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.background = element_blank()
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_pie_v2 &lt;- job_cat_race_panel_pie_v2 +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    strip.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey95"</span>),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">1.5</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_pie_v2
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Bar chart panels showing ethnicity/race of employees in Silicon Valley companies by job category</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_total &lt;- ungroup(race_job_cat_total) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, rev(ordered_lvls_race)),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, ordered_lvls_job_cat)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_bar_v2 &lt;- ggplot(race_job_cat_total, aes(
</span></span><span class='shcb-loc'><span>  y = race_short,
</span></span><span class='shcb-loc'><span>  x = pct,
</span></span><span class='shcb-loc'><span>  fill = race_short
</span></span><span class='shcb-loc'><span>)) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, position = position_dodge(<span class="hljs-number">0.5</span>), width = <span class="hljs-number">0.7</span>) +
</span></span><span class='shcb-loc'><span>  facet_wrap(~job_category) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = race_colors) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  scale_y_discrete(expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)) +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = seq(from = <span class="hljs-number">0</span>, to = <span class="hljs-number">.6</span>, by = <span class="hljs-number">0.2</span>), color = <span class="hljs-string">"white"</span>) +
</span></span><span class='shcb-loc'><span>  theme_minimal() +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    legend.title = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.grid = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.text.y = element_text(hjust = <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    axis.text.x = element_text(hjust = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>    axis.line = element_line(),
</span></span><span class='shcb-loc'><span>    axis.line.y = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.ticks.y = element_blank(),
</span></span><span class='shcb-loc'><span>    axis.ticks.x = element_line(colour = <span class="hljs-literal">NULL</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  guides(fill = guide_legend(nrow = <span class="hljs-number">1</span>)) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    strip.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey95"</span>, color = <span class="hljs-literal">NA</span>),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">1</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_race_panel_bar_v2
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## A dot chart of race/ethnicity distribution within job category</span>
</span></span><span class='shcb-loc'><span>
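</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Despite its name, this table holds each job category's share within a race/ethnicity group (pct sums to 1 per race_short); it feeds the next chart</span>
</span></span><span class='shcb-loc'><span>race_job_cat_distribution_within_jobcat &lt;- it_data_clean %&gt;%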
</span></span><span class='shcb-loc'><span>  group_by(race_short, job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
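</span></span><span class='shcb-loc'><span><span class="hljs-comment"># Despite its name, this table holds each race/ethnicity group's share within a job category (pct sums to 1 per job_category); it feeds this chart</span>
</span></span><span class='shcb-loc'><span>race_job_cat_distribution_within_race &lt;- it_data_clean %&gt;%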
</span></span><span class='shcb-loc'><span>  group_by(race_short, job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_distribution_within_race &lt;- ungroup(race_job_cat_distribution_within_race) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, rev(ordered_lvls_race)),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, ordered_lvls_job_cat)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_within_job_cat_dot_plot &lt;- ggplot(
</span></span><span class='shcb-loc'><span>  race_job_cat_distribution_within_race,
</span></span><span class='shcb-loc'><span>  aes(
</span></span><span class='shcb-loc'><span>    x = pct,
</span></span><span class='shcb-loc'><span>    y = race_short,
</span></span><span class='shcb-loc'><span>    fill = race_short
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>) +
</span></span><span class='shcb-loc'><span>  geom_segment(aes(x = <span class="hljs-number">0</span>, y = race_short, xend = pct, yend = race_short), color = <span class="hljs-string">"grey70"</span>) +
</span></span><span class='shcb-loc'><span>  geom_point(shape = <span class="hljs-number">21</span>, color = <span class="hljs-string">"white"</span>, size = rel(<span class="hljs-number">3.5</span>)) +
</span></span><span class='shcb-loc'><span>  facet_grid(job_category ~ .,
</span></span><span class='shcb-loc'><span>    scales = <span class="hljs-string">"free_y"</span>,
</span></span><span class='shcb-loc'><span>    space = <span class="hljs-string">"free"</span>,
</span></span><span class='shcb-loc'><span>    <span class="hljs-keyword">switch</span> = <span class="hljs-string">"y"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = race_colors) +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_within_job_cat_dot_plot &lt;- race_within_job_cat_dot_plot +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_line(size = rel(<span class="hljs-number">0.075</span>), linetype = <span class="hljs-string">"dashed"</span>),
</span></span><span class='shcb-loc'><span>    strip.background.y = element_rect(fill = <span class="hljs-string">"white"</span>, color = <span class="hljs-string">"grey80"</span>),
</span></span><span class='shcb-loc'><span>    axis.ticks.x = element_line(size = rel(<span class="hljs-number">0.5</span>)),
</span></span><span class='shcb-loc'><span>    strip.text.y = element_text(angle = <span class="hljs-number">180</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.15</span>)),
</span></span><span class='shcb-loc'><span>    strip.placement = <span class="hljs-string">"outside"</span>,
</span></span><span class='shcb-loc'><span>    axis.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">10</span>),
</span></span><span class='shcb-loc'><span>    panel.background = element_rect(fill = <span class="hljs-literal">NA</span>, color = <span class="hljs-string">"gray80"</span>),
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_within_job_cat_dot_plot &lt;- race_within_job_cat_dot_plot + theme(
</span></span><span class='shcb-loc'><span>  panel.border = element_rect(color = <span class="hljs-string">"grey90"</span>, fill = <span class="hljs-literal">NA</span>),
</span></span><span class='shcb-loc'><span>  axis.text.y = element_text(size = rel(<span class="hljs-number">0.8</span>)),
</span></span><span class='shcb-loc'><span>  axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>  plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>  plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_within_job_cat_dot_plot
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## A dot chart of job category distribution within race/ethnicity</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>race_job_cat_distribution_within_jobcat &lt;- ungroup(race_job_cat_distribution_within_jobcat) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, ordered_lvls_race),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, rev(ordered_lvls_job_cat))
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_within_race_dot_plot &lt;- ggplot(
</span></span><span class='shcb-loc'><span>  race_job_cat_distribution_within_jobcat,
</span></span><span class='shcb-loc'><span>  aes(
</span></span><span class='shcb-loc'><span>    x = pct,
</span></span><span class='shcb-loc'><span>    y = job_category,
</span></span><span class='shcb-loc'><span>    fill = job_category
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>) +
</span></span><span class='shcb-loc'><span>  geom_segment(aes(x = <span class="hljs-number">0</span>, y = job_category, xend = pct, yend = job_category), color = <span class="hljs-string">"grey70"</span>) +
</span></span><span class='shcb-loc'><span>  geom_point(shape = <span class="hljs-number">21</span>, color = <span class="hljs-string">"white"</span>, size = rel(<span class="hljs-number">3.5</span>)) +
</span></span><span class='shcb-loc'><span>  facet_grid(race_short ~ .,
</span></span><span class='shcb-loc'><span>    scales = <span class="hljs-string">"free_y"</span>,
</span></span><span class='shcb-loc'><span>    space = <span class="hljs-string">"free"</span>,
</span></span><span class='shcb-loc'><span>    <span class="hljs-keyword">switch</span> = <span class="hljs-string">"y"</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = job_cat_colors) +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_within_race_dot_plot &lt;- job_cat_within_race_dot_plot +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = percent) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.grid.major.y = element_line(size = rel(<span class="hljs-number">0.075</span>), linetype = <span class="hljs-string">"dashed"</span>),
</span></span><span class='shcb-loc'><span>    strip.background.y = element_rect(fill = <span class="hljs-string">"white"</span>, color = <span class="hljs-string">"grey80"</span>),
</span></span><span class='shcb-loc'><span>    axis.ticks.x = element_line(size = rel(<span class="hljs-number">0.5</span>)),
</span></span><span class='shcb-loc'><span>    strip.text.y = element_text(angle = <span class="hljs-number">180</span>, face = <span class="hljs-string">"bold"</span>, size = rel(<span class="hljs-number">1.2</span>)),
</span></span><span class='shcb-loc'><span>    strip.placement = <span class="hljs-string">"outside"</span>,
</span></span><span class='shcb-loc'><span>    axis.text = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">10</span>),
</span></span><span class='shcb-loc'><span>    panel.background = element_rect(fill = <span class="hljs-literal">NA</span>, color = <span class="hljs-string">"gray80"</span>),
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_within_race_dot_plot &lt;- job_cat_within_race_dot_plot + theme(
</span></span><span class='shcb-loc'><span>  panel.border = element_rect(color = <span class="hljs-string">"grey90"</span>, fill = <span class="hljs-literal">NA</span>),
</span></span><span class='shcb-loc'><span>  axis.text.y = element_text(size = rel(<span class="hljs-number">0.9</span>)),
</span></span><span class='shcb-loc'><span>  axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>  plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>  plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>job_cat_within_race_dot_plot
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Side-by-side bar charts of job categories and ethnicity/race distribution by gender</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>it_data_clean &lt;- read_csv(<span class="hljs-string">"tech_diversity_cleaned.csv"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># https://blog.datawrapper.de/gendercolor/</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># https://stackoverflow.com/questions/17083362/colorize-parts-of-the-title-in-a-plot</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment"># https://stackoverflow.com/questions/52902946/using-unicode-characters-as-shape</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_ratio_by_job_cat_role &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category, race_short, gender) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category, gender) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_ratio_by_job_cat_role &lt;- ungroup(gender_ratio_by_job_cat_role) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, rev(ordered_lvls_race)),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, ordered_lvls_job_cat)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_bar &lt;- ggplot(
</span></span><span class='shcb-loc'><span>  gender_ratio_by_job_cat_role,
</span></span><span class='shcb-loc'><span>  aes(y = race_short, x = pct, group = gender, fill = gender)
</span></span><span class='shcb-loc'><span>) +
</span></span><span class='shcb-loc'><span>  geom_bar(stat = <span class="hljs-string">"identity"</span>, position = <span class="hljs-string">"dodge"</span>, width = <span class="hljs-number">0.7</span>) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = c(<span class="hljs-string">"Female"</span> = <span class="hljs-string">"#8700f9"</span>, <span class="hljs-string">"Male"</span> = <span class="hljs-string">"#00c4aa"</span>)) +
</span></span><span class='shcb-loc'><span>  scale_x_sqrt(
</span></span><span class='shcb-loc'><span>    labels = scales::percent_format(accuracy = <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    limits = c(<span class="hljs-number">0</span>, <span class="hljs-number">.8</span>),
</span></span><span class='shcb-loc'><span>    breaks = c(<span class="hljs-number">0</span>, <span class="hljs-number">0.05</span>, <span class="hljs-number">.1</span>, <span class="hljs-number">.3</span>, <span class="hljs-number">.5</span>, <span class="hljs-number">.7</span>),
</span></span><span class='shcb-loc'><span>    expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  facet_wrap(job_category ~ .)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_bar &lt;- gender_job_cat_race_bar +
</span></span><span class='shcb-loc'><span>  ggtitle(
</span></span><span class='shcb-loc'><span>    label = <span class="hljs-string">"Job categories and ethnicity/race distribution by gender"</span>,
</span></span><span class='shcb-loc'><span>    subtitle = paste(
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"&lt;b style='color:#8700f9'&gt;\u25A1 Female&lt;/b&gt;"</span>,
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"&lt;b style='color:#00c4aa'&gt;\u25A1 Male&lt;/b&gt;"</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Note: The x-axis uses a square-root scale to make smaller values easier to see. Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>) +
</span></span><span class='shcb-loc'><span>  geom_vline(xintercept = c(<span class="hljs-number">0</span>, <span class="hljs-number">0.05</span>, <span class="hljs-number">.1</span>, <span class="hljs-number">.3</span>, <span class="hljs-number">.5</span>, <span class="hljs-number">.7</span>), color = <span class="hljs-string">"grey98"</span>) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    strip.text.x = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    panel.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    plot.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    panel.grid.major = element_blank(),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">0.8</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    legend.title = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.title = element_text(size = <span class="hljs-number">14</span>, family = <span class="hljs-string">""</span>),
</span></span><span class='shcb-loc'><span>    plot.title.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.subtitle = ggtext::element_markdown(
</span></span><span class='shcb-loc'><span>      lineheight = <span class="hljs-number">1.1</span>,
</span></span><span class='shcb-loc'><span>      family = <span class="hljs-string">"Arial Unicode MS"</span>,
</span></span><span class='shcb-loc'><span>      size = <span class="hljs-number">12</span>
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>annotation_df &lt;- data.frame(
</span></span><span class='shcb-loc'><span>  label = <span class="hljs-string">"Of all female executives,\nBlack females are about\n2% of them, and of all\nmale executives, Black males\nare about 1% of them"</span>,
</span></span><span class='shcb-loc'><span>  x = <span class="hljs-number">.16</span>,
</span></span><span class='shcb-loc'><span>  y = <span class="hljs-number">1.8</span>,
</span></span><span class='shcb-loc'><span>  gender = <span class="hljs-string">"Male"</span>,
</span></span><span class='shcb-loc'><span>  job_category = <span class="hljs-string">"Executives"</span>,
</span></span><span class='shcb-loc'><span>  race_short = <span class="hljs-string">"Black"</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_bar &lt;- gender_job_cat_race_bar +
</span></span><span class='shcb-loc'><span>  geom_curve(
</span></span><span class='shcb-loc'><span>    data = data.frame(x = <span class="hljs-number">.02</span>, y = <span class="hljs-number">2</span>, xend = <span class="hljs-number">0.15</span>, yend = <span class="hljs-number">2</span>, gender = <span class="hljs-string">"Male"</span>, job_category = <span class="hljs-string">"Executives"</span>),
</span></span><span class='shcb-loc'><span>    aes(x = x, y = y, xend = xend, yend = yend),
</span></span><span class='shcb-loc'><span>    curvature = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>    arrow = arrow(angle = <span class="hljs-number">10</span>, ends = <span class="hljs-string">"first"</span>, type = <span class="hljs-string">"closed"</span>, length = unit(<span class="hljs-number">0.12</span>, <span class="hljs-string">"inches"</span>))
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = annotation_df,
</span></span><span class='shcb-loc'><span>    aes(x = x, y = y, label = label),
</span></span><span class='shcb-loc'><span>    size = rel(<span class="hljs-number">3.5</span>),
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>    family = <span class="hljs-string">"sans"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_bar
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Dot charts of job categories and ethnicity/race distribution by gender</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_ratio_by_job_cat_role &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category, race_short, gender) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category, gender) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_ratio_by_job_cat_role &lt;- ungroup(gender_ratio_by_job_cat_role) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, rev(ordered_lvls_race)),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, ordered_lvls_job_cat)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_dot &lt;- ggplot(
</span></span><span class='shcb-loc'><span>  gender_ratio_by_job_cat_role,
</span></span><span class='shcb-loc'><span>  aes(y = race_short, x = pct, group = gender, fill = gender)
</span></span><span class='shcb-loc'><span>) +
</span></span><span class='shcb-loc'><span>  geom_point(shape = <span class="hljs-number">21</span>, color = <span class="hljs-string">"grey80"</span>, size = <span class="hljs-number">4</span>, alpha = <span class="hljs-number">0.9</span>) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = c(<span class="hljs-string">"Female"</span> = <span class="hljs-string">"#8700f9"</span>, <span class="hljs-string">"Male"</span> = <span class="hljs-string">"#00c4aa"</span>)) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(
</span></span><span class='shcb-loc'><span>    labels = scales::percent_format(accuracy = <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>    limits = c(<span class="hljs-number">0</span>, <span class="hljs-number">.8</span>),
</span></span><span class='shcb-loc'><span>    breaks = seq(from = <span class="hljs-number">0</span>, to = <span class="hljs-number">8</span>, by = <span class="hljs-number">2</span>) / <span class="hljs-number">10</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  facet_wrap(job_category ~ ., ncol = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_dot &lt;- gender_job_cat_race_dot +
</span></span><span class='shcb-loc'><span>  ggtitle(
</span></span><span class='shcb-loc'><span>    label = <span class="hljs-string">"Job categories and ethnicity/race distribution by gender"</span>,
</span></span><span class='shcb-loc'><span>    subtitle = paste(
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"&lt;b style='color:#8700f9'&gt;\u25EF Female&lt;/b&gt;"</span>,
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"&lt;b style='color:#00c4aa'&gt;\u25EF Male&lt;/b&gt;"</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    strip.text.x = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    panel.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    plot.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">0.8</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    legend.title = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.title = element_text(size = <span class="hljs-number">14</span>, family = <span class="hljs-string">""</span>),
</span></span><span class='shcb-loc'><span>    plot.title.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.subtitle = ggtext::element_markdown(
</span></span><span class='shcb-loc'><span>      lineheight = <span class="hljs-number">1.1</span>,
</span></span><span class='shcb-loc'><span>      family = <span class="hljs-string">"Arial Unicode MS"</span>,
</span></span><span class='shcb-loc'><span>      size = <span class="hljs-number">12</span>
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>annotation_df &lt;- data.frame(
</span></span><span class='shcb-loc'><span>  label = <span class="hljs-string">"Of all female managers,&lt;br&gt;about 62% are white,&lt;br&gt; and of all male managers,&lt;br&gt; about 65% are white"</span>,
</span></span><span class='shcb-loc'><span>  x = <span class="hljs-number">.36</span>,
</span></span><span class='shcb-loc'><span>  y = <span class="hljs-number">3</span>,
</span></span><span class='shcb-loc'><span>  gender = <span class="hljs-string">"Male"</span>,
</span></span><span class='shcb-loc'><span>  job_category = <span class="hljs-string">"Managers"</span>,
</span></span><span class='shcb-loc'><span>  race_short = <span class="hljs-string">"Asian"</span>
</span></span><span class='shcb-loc'><span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>curve_ann_df &lt;- data.frame(x = <span class="hljs-number">.55</span>, y = <span class="hljs-number">3</span>, xend = <span class="hljs-number">0.64</span>, yend = <span class="hljs-number">4.5</span>, gender = <span class="hljs-string">"Male"</span>, job_category = <span class="hljs-string">"Managers"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>curve_ann_tiny_line_df &lt;- data.frame(xmin = <span class="hljs-number">.61</span>, y = <span class="hljs-number">4.5</span>, xmax = <span class="hljs-number">0.67</span>, gender = <span class="hljs-string">"Male"</span>, job_category = <span class="hljs-string">"Managers"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_dot &lt;- gender_job_cat_race_dot +
</span></span><span class='shcb-loc'><span>  geom_curve(
</span></span><span class='shcb-loc'><span>    data = curve_ann_df,
</span></span><span class='shcb-loc'><span>    aes(x = x, y = y, xend = xend, yend = yend),
</span></span><span class='shcb-loc'><span>    curvature = <span class="hljs-number">.5</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_errorbar(
</span></span><span class='shcb-loc'><span>    data = curve_ann_tiny_line_df,
</span></span><span class='shcb-loc'><span>    aes(xmin = xmin, y = y, xmax = xmax),
</span></span><span class='shcb-loc'><span>    inherit.aes = <span class="hljs-literal">FALSE</span>,
</span></span><span class='shcb-loc'><span>    width = <span class="hljs-number">0.45</span>
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  ggtext::geom_richtext(
</span></span><span class='shcb-loc'><span>    data = annotation_df,
</span></span><span class='shcb-loc'><span>    aes(x = x, y = y, label = label),
</span></span><span class='shcb-loc'><span>    size = rel(<span class="hljs-number">3.2</span>),
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>    family = <span class="hljs-string">"sans"</span>,
</span></span><span class='shcb-loc'><span>    label.color = <span class="hljs-string">"grey98"</span>,
</span></span><span class='shcb-loc'><span>    fill = <span class="hljs-string">"grey98"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_job_cat_race_dot
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span><span class='shcb-loc'><span><span class="hljs-comment">## Dot charts of ethnicity/race and gender shares within each job category</span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat &lt;- it_data_clean %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(race_short, job_category, gender) %&gt;%
</span></span><span class='shcb-loc'><span>  summarize(total = sum(count)) %&gt;%
</span></span><span class='shcb-loc'><span>  group_by(job_category) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(pct = total / sum(total))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat &lt;- ungroup(gender_within_job_cat) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(
</span></span><span class='shcb-loc'><span>    race_short = factor(race_short, rev(ordered_lvls_race)),
</span></span><span class='shcb-loc'><span>    job_category = factor(job_category, ordered_lvls_job_cat)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat_dot_plt &lt;- ggplot(gender_within_job_cat, aes(y = race_short, x = pct, group = gender, fill = gender)) +
</span></span><span class='shcb-loc'><span>  geom_line(aes(group = race_short), color = <span class="hljs-string">"grey80"</span>) +
</span></span><span class='shcb-loc'><span>  geom_point(shape = <span class="hljs-number">21</span>, color = <span class="hljs-string">"grey80"</span>, size = <span class="hljs-number">4</span>, alpha = <span class="hljs-number">0.9</span>) +
</span></span><span class='shcb-loc'><span>  scale_fill_manual(values = c(<span class="hljs-string">"Female"</span> = <span class="hljs-string">"#8700f9"</span>, <span class="hljs-string">"Male"</span> = <span class="hljs-string">"#00c4aa"</span>)) +
</span></span><span class='shcb-loc'><span>  scale_x_continuous(labels = scales::percent_format()) +
</span></span><span class='shcb-loc'><span>  facet_wrap(job_category ~ ., ncol = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat_dot_plt &lt;- gender_within_job_cat_dot_plt + ggtitle(
</span></span><span class='shcb-loc'><span>  label = <span class="hljs-string">"Job categories and ethnicity/race distribution by gender"</span>,
</span></span><span class='shcb-loc'><span>  subtitle = paste(
</span></span><span class='shcb-loc'><span>    <span class="hljs-string">"&lt;b style='color:#8700f9'&gt;\u25EF Female&lt;/b&gt;"</span>,
</span></span><span class='shcb-loc'><span>    <span class="hljs-string">"&lt;b style='color:#00c4aa'&gt;\u25EF Male&lt;/b&gt;"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>) +
</span></span><span class='shcb-loc'><span>  theme_wsj() +
</span></span><span class='shcb-loc'><span>  labs(caption = <span class="hljs-string">"Source: Reveal, https://www.revealnews.org/topic/silicon-valley-diversity/"</span>) +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    panel.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    plot.background = element_rect(fill = <span class="hljs-string">"grey98"</span>),
</span></span><span class='shcb-loc'><span>    panel.spacing = unit(<span class="hljs-number">0.8</span>, <span class="hljs-string">"cm"</span>),
</span></span><span class='shcb-loc'><span>    legend.position = <span class="hljs-string">"none"</span>,
</span></span><span class='shcb-loc'><span>    legend.title = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.title = element_text(size = <span class="hljs-number">14</span>, family = <span class="hljs-string">""</span>),
</span></span><span class='shcb-loc'><span>    plot.title.position = <span class="hljs-string">"plot"</span>,
</span></span><span class='shcb-loc'><span>    plot.subtitle = ggtext::element_markdown(
</span></span><span class='shcb-loc'><span>      lineheight = <span class="hljs-number">1.1</span>,
</span></span><span class='shcb-loc'><span>      family = <span class="hljs-string">"Arial Unicode MS"</span>,
</span></span><span class='shcb-loc'><span>      size = <span class="hljs-number">12</span>
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    panel.grid.major = element_blank(),
</span></span><span class='shcb-loc'><span>    plot.caption = element_text(family = <span class="hljs-string">""</span>, size = <span class="hljs-number">8</span>, hjust = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>    plot.caption.position = <span class="hljs-string">"plot"</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat_dot_plt &lt;- gender_within_job_cat_dot_plt +
</span></span><span class='shcb-loc'><span>  theme(
</span></span><span class='shcb-loc'><span>    strip.text.x = element_text(face = <span class="hljs-string">"bold"</span>, size = <span class="hljs-number">12</span>),
</span></span><span class='shcb-loc'><span>    strip.placement = <span class="hljs-string">"outside"</span>,
</span></span><span class='shcb-loc'><span>    strip.background = element_rect(fill = <span class="hljs-string">"grey98"</span>, color = <span class="hljs-string">"grey40"</span>)
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat_dot_plt &lt;- gender_within_job_cat_dot_plt +
</span></span><span class='shcb-loc'><span>  geom_curve(
</span></span><span class='shcb-loc'><span>    data = data.frame(x = <span class="hljs-number">.1</span>, y = <span class="hljs-number">4</span>, xend = <span class="hljs-number">.18</span>, yend = <span class="hljs-number">2.5</span>, job_category = <span class="hljs-string">"Executives"</span>, gender = <span class="hljs-string">"Male"</span>, race_short = <span class="hljs-string">"Asian"</span>),
</span></span><span class='shcb-loc'><span>    aes(x = x, y = y, xend = xend, yend = yend),
</span></span><span class='shcb-loc'><span>    arrow = arrow(angle = <span class="hljs-number">20</span>, ends = <span class="hljs-string">"first"</span>, type = <span class="hljs-string">"closed"</span>, length = unit(<span class="hljs-number">0.1</span>, <span class="hljs-string">"inches"</span>))
</span></span><span class='shcb-loc'><span>  ) +
</span></span><span class='shcb-loc'><span>  geom_text(
</span></span><span class='shcb-loc'><span>    data = data.frame(
</span></span><span class='shcb-loc'><span>      label = <span class="hljs-string">"Of all the executives,\n4.5% are Asian women,\nand 16.3% are Asian men."</span>,
</span></span><span class='shcb-loc'><span>      x = <span class="hljs-number">.18</span>,
</span></span><span class='shcb-loc'><span>      y = <span class="hljs-number">2.5</span>,
</span></span><span class='shcb-loc'><span>      gender = <span class="hljs-string">"Male"</span>,
</span></span><span class='shcb-loc'><span>      job_category = <span class="hljs-string">"Executives"</span>,
</span></span><span class='shcb-loc'><span>      race_short = <span class="hljs-string">"Asian"</span>
</span></span><span class='shcb-loc'><span>    ),
</span></span><span class='shcb-loc'><span>    aes(x = x, y = y, label = label),
</span></span><span class='shcb-loc'><span>    hjust = <span class="hljs-number">0</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>gender_within_job_cat_dot_plt
</span></span><span class='shcb-loc'><span><span class="hljs-comment">####################################################################################################</span>
</span></span></code></div><small class="shcb-language" id="shcb-language-29"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre><p>The post <a rel="nofollow" href="https://nandeshwar.info/data-visualization/pie-chart-vs-bar-chart/">Pie Chart vs. Bar Chart</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Create an Economist Data Visualization of US Map Using R</title>
		<link>https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/</link>
					<comments>https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/#comments</comments>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Thu, 16 Jul 2020 05:53:19 +0000</pubDate>
				<category><![CDATA[Data Visualization]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[economist]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3441</guid>

					<description><![CDATA[<p>Map in R In this article, you will learn how to create a really cool data visualization that appeared in the Economist. This chart looks like a map, but instead of your typical filled in maps a.k.a. choropleths, you see an area plot where a state should be. This chart gives you a lot of [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/">How to Create an Economist Data Visualization of US Map Using R</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2>Map in R</h2>



<p>In this article, you will learn how to create a really cool <a href="https://nandeshwar.info/data-visualization/nyt-wapo-data-visualization-r/">data visualization</a> that appeared in the Economist. This chart looks like a map, but instead of the typical filled-in <a href="https://nandeshwar.info/data-visualization/wall-street-journal-data-visualization-r/">maps</a>, a.k.a. choropleths, you see an area plot where each state should be. This chart gives you a lot of information in a small space. For example, you can see how the number of cases changed over time. You can see the result of the 2016 presidential election. And you also get a legend showing the dates when key decisions were made.</p>



<figure class="wp-block-image"><img src="https://www.economist.com/img/b/1280/1250/90/sites/default/files/images/print-edition/20200530_FBC717.png" alt=""/><figcaption>The Original Economist Map Plot</figcaption></figure>



<p>I wanted to see whether I could re-create this chart in R. In this video, you will <a href="https://nandeshwar.info/data-science-2/deep-learning-tensorflow-r-tutorial/">learn about those steps</a>. And the great thing about this script is that you can modify it to create any map-type plot. We could use fewer choropleths.</p>



<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="576" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Steps-create-us-map-r-ggplot-area-graphs-plot-1024x576.png" alt="steps to create a US map using ggplot and area graphs." class="wp-image-3557" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Steps-create-us-map-r-ggplot-area-graphs-plot-1024x576.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Steps-create-us-map-r-ggplot-area-graphs-plot-300x169.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Steps-create-us-map-r-ggplot-area-graphs-plot-768x432.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Steps-create-us-map-r-ggplot-area-graphs-plot-1536x864.png 1536w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Steps-create-us-map-r-ggplot-area-graphs-plot.png 1920w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Steps to create a US map with area graphs</figcaption></figure>



<p>Let’s get started.</p>



<p>First, let’s load the libraries.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-30" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"><span class="hljs-keyword">library</span>(dplyr)
<span class="hljs-keyword">library</span>(ggplot2)
<span class="hljs-keyword">library</span>(tidyverse)
<span class="hljs-keyword">library</span>(ggthemes)
<span class="hljs-keyword">library</span>(scales)
<span class="hljs-keyword">library</span>(lubridate)
<span class="hljs-keyword">library</span>(readxl)</code></div><small class="shcb-language" id="shcb-language-30"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Then, load an Excel file containing the location of each state in an 8 x 11 grid, with one square per state.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-31" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">state_loc_temp_file &lt;- tempfile()
download.file(<span class="hljs-string">"https://www.dropbox.com/s/flwu7ahky9lhlji/us-states-grid-number.xlsx?raw=1"</span>, state_loc_temp_file)

state_loc &lt;- read_excel(path = state_loc_temp_file, col_names = <span class="hljs-literal">TRUE</span>, range = <span class="hljs-string">"A1:B51"</span>)
unlink(state_loc_temp_file)
head(state_loc)</code></div><small class="shcb-language" id="shcb-language-31"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<pre><code>## # A tibble: 6 x 2
##   state boxnumber
##   &lt;chr&gt;     &lt;dbl&gt;
## 1 AL           73
## 2 AK           78
## 3 AZ           57
## 4 AR           60
## 5 CA           45
## 6 CO           47</code></pre>
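


<p>The file stores a single box number per state. As a rough sketch of how such a number could translate into a grid position, the snippet below assumes the boxes are numbered row by row across 11 columns. That numbering scheme is my assumption based on the values printed above, not something the file states, and the names <code>grid_cols</code>, <code>state_grid_demo</code>, <code>grid_row</code>, and <code>grid_col</code> are made up for this illustration.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(dplyr)

# Assumption: boxes are numbered left to right, top to bottom, 11 per row
grid_cols &lt;- 11

# The six rows printed by head(state_loc) above
state_grid_demo &lt;- tibble(
  state = c("AL", "AK", "AZ", "AR", "CA", "CO"),
  boxnumber = c(73, 78, 57, 60, 45, 47)
) %&gt;%
  mutate(
    # integer division gives the row, the remainder gives the column
    grid_row = (boxnumber - 1) %/% grid_cols + 1,
    grid_col = (boxnumber - 1) %% grid_cols + 1
  )

state_grid_demo</code></pre>


<p>Under that assumption, California (box 45) lands in the first column of row 5 and Alaska (box 78) in the first column of row 8, which at least looks like a plausible map layout.</p>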



<p>Then, let's generate some fake data for each of the states. The data frame contains one metric per state for each month from February to June.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-32" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">imp_data_df &lt;- data.frame(state = rep(state.abb, <span class="hljs-number">5</span>),
                          stat_date = rep(seq.Date(from = as.Date(<span class="hljs-string">"2020-02-01"</span>), 
                                                   to = as.Date(<span class="hljs-string">"2020-06-01"</span>), 
                                                   by = <span class="hljs-string">"month"</span>), 
                                          each = <span class="hljs-number">50</span>),
                          stat = rnorm(<span class="hljs-number">250</span>, mean = <span class="hljs-number">1000</span>, sd = <span class="hljs-number">150</span>)) %&gt;%
  mutate(stat = ifelse(stat &gt; <span class="hljs-number">1000</span> | stat &lt; <span class="hljs-number">0</span>, 
                       sample(runif(<span class="hljs-number">100</span>), n(), replace = <span class="hljs-literal">TRUE</span>) * <span class="hljs-number">100</span>, 
                       stat))

head(imp_data_df)</code></div><small class="shcb-language" id="shcb-language-32"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<pre><code>##   state  stat_date      stat
## 1    AL 2020-02-01 782.01548
## 2    AK 2020-02-01 925.37455
## 3    AZ 2020-02-01 937.37239
## 4    AR 2020-02-01  73.23171
## 5    CA 2020-02-01 968.69072
## 6    CO 2020-02-01  59.34878</code></pre>
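


<p>Because the data is random, your numbers will differ from mine on every run. If you want repeatable output, one option is to fix the random seed first. Here is a minimal sketch that also checks the shape of the data; the seed value and the name <code>demo_df</code> are arbitrary choices for this example.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(dplyr)

set.seed(2020) # hypothetical seed; any fixed value works

demo_df &lt;- data.frame(
  state = rep(state.abb, 5),
  stat_date = rep(
    seq.Date(from = as.Date("2020-02-01"), to = as.Date("2020-06-01"), by = "month"),
    each = 50
  ),
  stat = rnorm(250, mean = 1000, sd = 150)
)

# Each month should contain exactly one row per state (50 rows)
count(demo_df, stat_date)</code></pre>


<p>A quick count like this is a cheap way to confirm that <code>rep()</code> lined up the states and the dates the way you intended.</p>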



<p>One feature of this chart is the legend below the chart that shows the changes in the metric with some key dates. Therefore, I generated a data frame with start dates, end dates, and a key decision date. Once a start date was generated for each state, I added a random number of months to get an end date and a random number of days to get a key decision (easing) date.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-33" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>decision_dates_df &lt;- data.frame(
</span></span><span class='shcb-loc'><span>  state = state.abb,
</span></span><span class='shcb-loc'><span>  start_dt = sample(
</span></span><span class='shcb-loc'><span>    x = as.Date(<span class="hljs-string">"2020-02-01"</span>) + months(<span class="hljs-number">0</span>:<span class="hljs-number">2</span>),
</span></span><span class='shcb-loc'><span>    size = <span class="hljs-number">50</span>,
</span></span><span class='shcb-loc'><span>    replace = <span class="hljs-literal">TRUE</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>) 
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>decision_dates_df &lt;-  mutate(decision_dates_df,
</span></span><span class='shcb-loc'><span>    end_dt = start_dt + months(sample(<span class="hljs-number">2</span>:<span class="hljs-number">4</span>, size = <span class="hljs-number">1</span>)),
</span></span><span class='shcb-loc'><span>    end_dt = if_else(end_dt &gt; as.Date(<span class="hljs-string">"2020-06-01"</span>), as.Date(<span class="hljs-string">"2020-06-01"</span>), end_dt),
</span></span><span class='shcb-loc'><span>    easing_dt = start_dt + days(sample(
</span></span><span class='shcb-loc'><span>      <span class="hljs-number">20</span>:<span class="hljs-number">40</span>,
</span></span><span class='shcb-loc'><span>      size = <span class="hljs-number">1</span>, replace = <span class="hljs-literal">TRUE</span>
</span></span><span class='shcb-loc'><span>    ))
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>head(decision_dates_df)
</span></span></code></div><small class="shcb-language" id="shcb-language-33"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<pre><code>##   state   start_dt     end_dt  easing_dt
## 1    AL 2020-03-01 2020-06-01 2020-04-03
## 2    AK 2020-03-01 2020-06-01 2020-04-03
## 3    AZ 2020-03-01 2020-06-01 2020-04-03
## 4    AR 2020-03-01 2020-06-01 2020-04-03
## 5    CA 2020-03-01 2020-06-01 2020-04-03
## 6    CO 2020-02-01 2020-06-01 2020-03-05</code></pre>
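


<p>In the code above, <code>sample(..., size = 1)</code> draws a single offset, so every state gets the same gap between its start date and its easing date, which is what the output shows. If you would rather have a different offset for each state, a possible variation is to draw one value per row with <code>size = n()</code>. This is only a sketch; <code>varied_dates_df</code> and the seed are hypothetical and not used elsewhere in this article.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(dplyr)
library(lubridate)

set.seed(42) # hypothetical seed, only to make the sketch repeatable

varied_dates_df &lt;- data.frame(
  state = state.abb,
  start_dt = sample(
    x = as.Date("2020-02-01") + months(0:2),
    size = 50,
    replace = TRUE
  )
) %&gt;%
  mutate(
    # size = n() draws one offset per row instead of one shared offset
    end_dt = start_dt + months(sample(2:4, size = n(), replace = TRUE)),
    end_dt = if_else(end_dt &gt; as.Date("2020-06-01"), as.Date("2020-06-01"), end_dt),
    easing_dt = start_dt + days(sample(20:40, size = n(), replace = TRUE))
  )

head(varied_dates_df)</code></pre>


<p>Either version works for the chart. The per-state version simply makes the range bars look less uniform.</p>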



<p>Then, let's create a <a href="https://tibble.tidyverse.org/">tibble</a> that randomly assigns a highlight color to each state. I joined this table with the state locations we imported at the beginning.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-34" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">plots_tbl &lt;- tibble(
  state = state.abb,
  highlight_color = sample(c(<span class="hljs-string">"#cd6b61"</span>, <span class="hljs-string">"#578ca4"</span>),
    size = <span class="hljs-number">50</span>,
    replace = <span class="hljs-literal">TRUE</span>
  )
)

head(plots_tbl)

plots_tbl &lt;- left_join(plots_tbl, state_loc)</code></div><small class="shcb-language" id="shcb-language-34"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<pre><code>## # A tibble: 6 x 2
##   state highlight_color
##   &lt;chr&gt; &lt;chr&gt;          
## 1 AL    #cd6b61        
## 2 AK    #cd6b61        
## 3 AZ    #cd6b61        
## 4 AR    #cd6b61        
## 5 CA    #578ca4        
## 6 CO    #578ca4</code></pre>
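


<p>When you join two tables like this, it can help to name the key explicitly and to check for rows that found no match. The sketch below uses small stand-in tables rather than the real ones; the <code>_demo</code> names and the extra "DC" row are invented here purely to show what an unmatched state looks like.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(dplyr)

# Stand-in for state_loc, using the rows printed earlier
state_loc_demo &lt;- tibble(
  state = c("AL", "AK", "AZ", "AR", "CA", "CO"),
  boxnumber = c(73, 78, 57, 60, 45, 47)
)

# Stand-in for plots_tbl, plus a made-up "DC" row with no grid box
plots_demo &lt;- tibble(
  state = c("AL", "AK", "AZ", "AR", "CA", "CO", "DC"),
  highlight_color = c("#cd6b61", "#cd6b61", "#cd6b61", "#cd6b61",
                      "#578ca4", "#578ca4", "#578ca4")
)

# Naming the key avoids the "Joining, by" message and accidental extra keys
joined_demo &lt;- left_join(plots_demo, state_loc_demo, by = "state")

# Rows without a box number would have nowhere to go on the map
filter(joined_demo, is.na(boxnumber))</code></pre>


<p>With the real tables, an empty result from the last line means every state has a position on the grid.</p>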



<p>Let’s write a function to create the range bar based on these dates. This chart uses a point to show the easing date and a segment to draw a line between the start date and the end date. Then I plot the two vertical lines at the ends using segment annotations.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-35" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>rangebar_plot &lt;- <span class="hljs-keyword">function</span>(df, xaxis_st_dt = as.Date(<span class="hljs-string">"2020-02-01"</span>), xaxis_end_dt = as.Date(<span class="hljs-string">"2020-06-01"</span>)) {
</span></span><span class='shcb-loc'><span>    p &lt;- ggplot(df, aes(x = easing_dt, y = <span class="hljs-number">1</span>)) +
</span></span><span class='shcb-loc'><span>      geom_point(size = <span class="hljs-number">0.1</span>) +
</span></span><span class='shcb-loc'><span>      scale_x_date(
</span></span><span class='shcb-loc'><span>        limits = c(xaxis_st_dt, xaxis_end_dt),
</span></span><span class='shcb-loc'><span>        date_breaks = <span class="hljs-string">"1 month"</span>
</span></span><span class='shcb-loc'><span>      )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>    p &lt;- p + geom_segment(
</span></span><span class='shcb-loc'><span>        data = df,
</span></span><span class='shcb-loc'><span>        aes(x = start_dt,
</span></span><span class='shcb-loc'><span>            xend = end_dt,
</span></span><span class='shcb-loc'><span>            y = <span class="hljs-number">1</span>,
</span></span><span class='shcb-loc'><span>            yend = <span class="hljs-number">1</span>),
</span></span><span class='shcb-loc'><span>        size = <span class="hljs-number">0.1</span>) +
</span></span><span class='shcb-loc'><span>      scale_y_continuous(limits = c(<span class="hljs-number">0.98</span>, <span class="hljs-number">1.02</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>    segment_st_pos &lt;- <span class="hljs-number">0.99</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>    p &lt;- p + annotate(
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"segment"</span>,
</span></span><span class='shcb-loc'><span>        x = df$start_dt,
</span></span><span class='shcb-loc'><span>        xend = df$start_dt,
</span></span><span class='shcb-loc'><span>        y = segment_st_pos,
</span></span><span class='shcb-loc'><span>        yend = segment_st_pos + <span class="hljs-number">.02</span>,
</span></span><span class='shcb-loc'><span>        size = <span class="hljs-number">0.1</span>
</span></span><span class='shcb-loc'><span>      ) +
</span></span><span class='shcb-loc'><span>      annotate(
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"segment"</span>,
</span></span><span class='shcb-loc'><span>        x = df$end_dt,
</span></span><span class='shcb-loc'><span>        xend = df$end_dt,
</span></span><span class='shcb-loc'><span>        y = segment_st_pos,
</span></span><span class='shcb-loc'><span>        yend = segment_st_pos + <span class="hljs-number">.02</span>,
</span></span><span class='shcb-loc'><span>        size = <span class="hljs-number">0.1</span>
</span></span><span class='shcb-loc'><span>      )
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>    p &lt;- p + theme_void()
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>    p
</span></span><span class='shcb-loc'><span>  }
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>rangebar_plot(filter(decision_dates_df, state == <span class="hljs-string">'CA'</span>))
</span></span></code></div><small class="shcb-language" id="shcb-language-35"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="731" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-R-range-bar-plot-1-1-1024x731.png" alt="" class="wp-image-3553" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-R-range-bar-plot-1-1-1024x731.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-R-range-bar-plot-1-1-300x214.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-R-range-bar-plot-1-1-768x549.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-R-range-bar-plot-1-1.png 1344w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Range bar type of a plot</figcaption></figure>



<p>Next, let's write a function that draws an area graph for a state, along with the state name and a top line in a highlight color of our choosing. In the <a href="https://www.economist.com/briefing/2020/05/28/americas-covid-19-experience-is-tragic-but-not-that-exceptional">Economist article</a>, the highlight color shows the 2016 presidential election results.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-36" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>state_plot &lt;- <span class="hljs-keyword">function</span>(df, 
</span></span><span class='shcb-loc'><span>                       highlight_color = <span class="hljs-string">"blue"</span>, 
</span></span><span class='shcb-loc'><span>                       xaxis_st_dt = as.Date(<span class="hljs-string">"2020-02-01"</span>), 
</span></span><span class='shcb-loc'><span>                       xaxis_end_dt = as.Date(<span class="hljs-string">"2020-06-01"</span>)) {
</span></span><span class='shcb-loc'><span>    g &lt;- ggplot(df, aes(x = stat_date, y = stat)) +
</span></span><span class='shcb-loc'><span>      geom_area(fill = <span class="hljs-string">"#559ab7"</span>) +
</span></span><span class='shcb-loc'><span>      scale_y_continuous(limits = c(<span class="hljs-number">0</span>, <span class="hljs-number">1100</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)) +
</span></span><span class='shcb-loc'><span>      scale_x_date(limits = c(xaxis_st_dt, xaxis_end_dt),
</span></span><span class='shcb-loc'><span>                   date_breaks = <span class="hljs-string">"1 month"</span>) 
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    g &lt;- g +
</span></span><span class='shcb-loc'><span>      annotate(
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"text"</span>,
</span></span><span class='shcb-loc'><span>        x = xaxis_st_dt,
</span></span><span class='shcb-loc'><span>        y = <span class="hljs-number">1070</span>,
</span></span><span class='shcb-loc'><span>        label = unique(df$state),
</span></span><span class='shcb-loc'><span>        size = <span class="hljs-number">1.5</span>,
</span></span><span class='shcb-loc'><span>        color = highlight_color,
</span></span><span class='shcb-loc'><span>        hjust = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>        vjust = <span class="hljs-number">1</span>,
</span></span><span class='shcb-loc'><span>        fontface = <span class="hljs-string">"bold"</span>
</span></span><span class='shcb-loc'><span>      ) 
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    g &lt;- g +
</span></span><span class='shcb-loc'><span>      geom_hline(yintercept = <span class="hljs-literal">Inf</span>,
</span></span><span class='shcb-loc'><span>                 size = <span class="hljs-number">0.3</span>,
</span></span><span class='shcb-loc'><span>                 color = highlight_color) 
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    g &lt;- g +
</span></span><span class='shcb-loc'><span>      theme(
</span></span><span class='shcb-loc'><span>        axis.ticks.length.x = unit(<span class="hljs-number">1.3</span>, <span class="hljs-string">"points"</span>),
</span></span><span class='shcb-loc'><span>        axis.ticks.x = element_line(color = <span class="hljs-string">"#8aa6b6"</span>, size = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>        axis.line.x = element_line(color = <span class="hljs-string">"#8aa6b6"</span>, size = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>        axis.text = element_blank(),
</span></span><span class='shcb-loc'><span>        axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>        panel.background = element_rect(fill = <span class="hljs-string">"#d5e4eb"</span>, linetype = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>        panel.border = element_blank(),
</span></span><span class='shcb-loc'><span>        plot.background = element_rect(fill = <span class="hljs-string">"#d5e4eb"</span>,
</span></span><span class='shcb-loc'><span>                                       color = <span class="hljs-literal">NA</span>),
</span></span><span class='shcb-loc'><span>        panel.grid = element_blank(),
</span></span><span class='shcb-loc'><span>        axis.ticks.y = element_blank()
</span></span><span class='shcb-loc'><span>      )
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    g &lt;- g + geom_hline(yintercept = <span class="hljs-number">500</span>,
</span></span><span class='shcb-loc'><span>                        size = <span class="hljs-number">0.1</span>,
</span></span><span class='shcb-loc'><span>                        color = <span class="hljs-string">"white"</span>)
</span></span><span class='shcb-loc'><span>    <span class="hljs-keyword">return</span>(g)
</span></span><span class='shcb-loc'><span>  }
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>state_plot(filter(imp_data_df, state == <span class="hljs-string">"CA"</span>))
</span></span></code></div><small class="shcb-language" id="shcb-language-36"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="731" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-area-graph-R-1-1-1024x731.png" alt="area graph" class="wp-image-3555" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-area-graph-R-1-1-1024x731.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-area-graph-R-1-1-300x214.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-area-graph-R-1-1-768x549.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-area-graph-R-1-1.png 1344w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Area graph</figcaption></figure>



<p>Next, rather than plotting everything right away, we store the plots in the tibble we created earlier. This way, we retain each state, its location in the grid, and the associated plot in a single tibble that we can manipulate.</p>



<p>We use the <a href="https://purrr.tidyverse.org/reference/map2.html">map2 function</a> to iterate over the state and the highlight color, create the area plot and the range bar plot for each state, and stack the two into a single plot using the plot_grid function from the <a href="https://wilkelab.org/cowplot/articles/introduction.html">cowplot package</a>. We could have used a loop here, but mutate and map2 let us store each ggplot object in a new column of the tibble.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-37" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>plots_tbl &lt;- mutate(plots_tbl,
</span></span><span class='shcb-loc'><span>                    plots = map2(.x = state, .y = highlight_color, 
</span></span><span class='shcb-loc'><span>                                 .f = <span class="hljs-keyword">function</span>(x, y) { 
</span></span><span class='shcb-loc'><span>                                   st_plt &lt;- state_plot(filter(imp_data_df, state == x), 
</span></span><span class='shcb-loc'><span>                                                        highlight_color = y)
</span></span><span class='shcb-loc'><span>                                   rng_plt &lt;- rangebar_plot(filter(decision_dates_df, state == x))
</span></span><span class='shcb-loc'><span>                                   cowplot::plot_grid(st_plt, rng_plt, nrow = <span class="hljs-number">2</span>,
</span></span><span class='shcb-loc'><span>                                                      labels = <span class="hljs-literal">NULL</span>,
</span></span><span class='shcb-loc'><span>                                                      align = <span class="hljs-string">"v"</span>,
</span></span><span class='shcb-loc'><span>                                                      axis = <span class="hljs-string">"t"</span>,
</span></span><span class='shcb-loc'><span>                                                      rel_heights = c(<span class="hljs-number">5</span>, <span class="hljs-number">1</span>))
</span></span><span class='shcb-loc'><span>                                   }
</span></span><span class='shcb-loc'><span>                                 ))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>head(plots_tbl)
</span></span></code></div><small class="shcb-language" id="shcb-language-37"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<pre><code>## # A tibble: 6 x 4
##   state highlight_color boxnumber plots 
##   &lt;chr&gt; &lt;chr&gt;               &lt;dbl&gt; &lt;list&gt;
## 1 AL    #cd6b61                73 &lt;gg&gt;  
## 2 AK    #cd6b61                78 &lt;gg&gt;  
## 3 AZ    #cd6b61                57 &lt;gg&gt;  
## 4 AR    #cd6b61                60 &lt;gg&gt;  
## 5 CA    #578ca4                45 &lt;gg&gt;  
## 6 CO    #578ca4                47 &lt;gg&gt;</code></pre>
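
<p>If the list-column idea is new to you, here is a minimal sketch of the same pattern on toy data. It assumes the dplyr and purrr packages are installed; the table toy_tbl and its values are made up for illustration, and it stores short strings where the real code stores plots.</p>

```r
library(dplyr)
library(purrr)

# Toy stand-in for plots_tbl: the values below are made up for illustration.
toy_tbl <- tibble(
  state = c("CA", "TX"),
  highlight_color = c("#578ca4", "#cd6b61")
)

# map2() walks over both columns in parallel and returns a list,
# so mutate() stores one object per row in a list-column.
toy_tbl <- mutate(
  toy_tbl,
  label = map2(state, highlight_color, function(x, y) paste(x, "in", y))
)

# Pull out the object stored for the first row
toy_tbl$label[[1]]
```

<p>Because map2 always returns a list, the new column can hold any kind of object, including a ggplot, which is what makes the approach above work.</p>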



<p>Next, let’s create the legend plot that explains the area plot. Since it has additional information, such as axis labels, we can’t reuse the previous function and need to create a separate plot.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-38" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>lp &lt;- ggplot(filter(imp_data_df, state == <span class="hljs-string">"CA"</span>), aes(x = stat_date, y = stat)) 
</span></span><span class='shcb-loc'><span>lp &lt;- lp + geom_area(fill = <span class="hljs-string">"#559ab7"</span>) +
</span></span><span class='shcb-loc'><span>  scale_y_continuous(limits = c(<span class="hljs-number">0</span>, <span class="hljs-number">1100</span>), expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>), labels = comma, position = <span class="hljs-string">"right"</span>) +
</span></span><span class='shcb-loc'><span>  scale_x_date(breaks = as.Date(<span class="hljs-string">"2020-06-01"</span>) - months(<span class="hljs-number">0</span>:<span class="hljs-number">4</span>), 
</span></span><span class='shcb-loc'><span>               labels = rev(c(<span class="hljs-string">"F"</span>, <span class="hljs-string">"M"</span>, <span class="hljs-string">"A"</span>, <span class="hljs-string">"M"</span>, <span class="hljs-string">"Jun"</span>)),
</span></span><span class='shcb-loc'><span>               expand = c(<span class="hljs-number">0</span>, <span class="hljs-number">0</span>)) 
</span></span><span class='shcb-loc'><span>lp &lt;- lp + geom_hline(yintercept = <span class="hljs-literal">Inf</span>, size = <span class="hljs-number">0.3</span>, color = <span class="hljs-string">'#578ca4'</span>) 
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>lp &lt;- lp + theme(axis.ticks.length.x = unit(<span class="hljs-number">1.5</span>, <span class="hljs-string">"points"</span>),
</span></span><span class='shcb-loc'><span>                 axis.ticks.x = element_line(color = <span class="hljs-string">"#8aa6b6"</span>, size = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>                 axis.line.x = element_line(color = <span class="hljs-string">"#8aa6b6"</span>, size = <span class="hljs-number">0.2</span>),
</span></span><span class='shcb-loc'><span>                 axis.text = element_text(color = <span class="hljs-string">"black"</span>, size = <span class="hljs-number">2.2</span>),
</span></span><span class='shcb-loc'><span>                 axis.title = element_blank(),
</span></span><span class='shcb-loc'><span>                 panel.background = element_rect(fill = <span class="hljs-string">"#d5e4eb"</span>, linetype = <span class="hljs-number">0</span>),
</span></span><span class='shcb-loc'><span>                 panel.border = element_blank(),
</span></span><span class='shcb-loc'><span>                 plot.background = element_rect(
</span></span><span class='shcb-loc'><span>                   fill = <span class="hljs-string">"#d5e4eb"</span>,
</span></span><span class='shcb-loc'><span>                   colour = <span class="hljs-literal">NA</span>
</span></span><span class='shcb-loc'><span>                   ),
</span></span><span class='shcb-loc'><span>                 panel.grid = element_blank(),
</span></span><span class='shcb-loc'><span>                 axis.ticks.y = element_blank())
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>lp &lt;- lp + geom_hline(yintercept = <span class="hljs-number">500</span>, size = <span class="hljs-number">0.1</span>, color = <span class="hljs-string">"white"</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>lp
</span></span></code></div><small class="shcb-language" id="shcb-language-38"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-1024x731.png" alt="" class="wp-image-3444" width="498" height="355" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-1024x731.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-300x214.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-768x549.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1.png 1344w" sizes="(max-width: 498px) 100vw, 498px" /><figcaption>Range bar legend</figcaption></figure></div>



<p>Next, let’s create the legend plot for the highlight color. In this case, we are creating a legend plot for pizza preference.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-39" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>pizza_pref &lt;- data.frame(y = <span class="hljs-number">1</span>,
</span></span><span class='shcb-loc'><span>                         x = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>                         label = c(<span class="hljs-string">"Thin"</span>, <span class="hljs-string">"Deep"</span>),
</span></span><span class='shcb-loc'><span>                         color = c(<span class="hljs-string">'#cd6b61'</span>, <span class="hljs-string">'#578ca4'</span>),
</span></span><span class='shcb-loc'><span>                         stringsAsFactors = <span class="hljs-literal">FALSE</span>)
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>pizza_pref_plot_fn &lt;- <span class="hljs-keyword">function</span>(what_type) {
</span></span><span class='shcb-loc'><span>  df &lt;- filter(pizza_pref, label == what_type)
</span></span><span class='shcb-loc'><span>  label_color &lt;- df$color
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  g &lt;- ggplot(data = df,
</span></span><span class='shcb-loc'><span>              aes(x = x, y = y, label = label, color = label)) +
</span></span><span class='shcb-loc'><span>    geom_segment(aes(xend = <span class="hljs-number">1</span>, yend = y), size = <span class="hljs-number">0.3</span>) 
</span></span><span class='shcb-loc'><span>  g &lt;- g + geom_text(size = <span class="hljs-number">1.5</span>, color = label_color,
</span></span><span class='shcb-loc'><span>                     hjust = <span class="hljs-number">0</span>,
</span></span><span class='shcb-loc'><span>                     vjust = <span class="hljs-number">1.5</span>,
</span></span><span class='shcb-loc'><span>                     fontface = <span class="hljs-string">"bold"</span>) 
</span></span><span class='shcb-loc'><span>  g &lt;- g + scale_color_manual(values = label_color) +
</span></span><span class='shcb-loc'><span>    theme_void() +
</span></span><span class='shcb-loc'><span>    theme(legend.position = <span class="hljs-string">"none"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  g
</span></span><span class='shcb-loc'><span>}
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>pizza_pref_plot_fn(what_type = <span class="hljs-string">"Deep"</span>)
</span></span></code></div><small class="shcb-language" id="shcb-language-39"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-line-legend-1-1-1024x731.png" alt="" class="wp-image-3446" width="512" height="366" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-line-legend-1-1-1024x731.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-line-legend-1-1-300x214.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-line-legend-1-1-768x549.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-line-legend-1-1.png 1344w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption>Line Legend</figcaption></figure></div>



<p>We need to create another legend for the range bar plot, which shows the start, end, and easing dates.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-40" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>range_bar_legend_plt &lt;- filter(decision_dates_df, state == <span class="hljs-string">'WY'</span>) %&gt;% {
</span></span><span class='shcb-loc'><span>  rangebar_plot(df = .) + 
</span></span><span class='shcb-loc'><span>    scale_x_date(limits = c(as.Date(<span class="hljs-string">"2020-02-01"</span>), as.Date(<span class="hljs-string">"2020-07-01"</span>)), date_breaks = <span class="hljs-string">"1 month"</span>) + 
</span></span><span class='shcb-loc'><span>    annotate(<span class="hljs-string">"text"</span>, x = .$start_dt, label = <span class="hljs-string">"Start Date"</span>, y = <span class="hljs-number">1</span>, hjust = <span class="hljs-number">1</span>, size = <span class="hljs-number">1</span>) + 
</span></span><span class='shcb-loc'><span>    annotate(<span class="hljs-string">"text"</span>, x = .$end_dt, label = <span class="hljs-string">"End Date"</span>, y = <span class="hljs-number">1</span>, hjust = <span class="hljs-number">0</span>, size = <span class="hljs-number">1</span>)  + 
</span></span><span class='shcb-loc'><span>    annotate(<span class="hljs-string">"text"</span>, x = .$easing_dt, label = <span class="hljs-string">"Easing starts"</span>, y = <span class="hljs-number">1</span>, vjust = <span class="hljs-number">1.2</span>, size = <span class="hljs-number">1</span>)
</span></span><span class='shcb-loc'><span>}
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>range_bar_legend_plt
</span></span></code></div><small class="shcb-language" id="shcb-language-40"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<figure class="wp-block-image size-large is-resized"><img loading="lazy" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-2-1024x731.png" alt="" class="wp-image-3448" width="512" height="366" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-2-1024x731.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-2-300x214.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-2-768x549.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-range-bar-legend-1-2.png 1344w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption> Range bar legend</figcaption></figure>
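
<p>The braces after the %&gt;% pipe in the code above matter. Here is a small sketch of what they do, assuming the magrittr package is loaded (dplyr re-exports the pipe, too); the scores vector is made up for illustration.</p>

```r
library(magrittr)

# Without braces, %>% would insert the left-hand side as the first argument.
# With braces, the value is only available as `.`, and can be used many times.
scores <- c(2, 4, 6)

centered <- scores %>% {
  . - mean(.)
}

centered
```

<p>That is why the legend code can pass the filtered data frame to rangebar_plot and also reach into its date columns with the dot inside the same block.</p>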



<h2>Final Touches</h2>



<p>We’re down to the finishing touches now.</p>



<p>Let’s create an empty vector with one slot for each of the 88 cells in the 8 × 11 grid.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-41" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">my_list &lt;- rep(<span class="hljs-literal">NA</span>, <span class="hljs-number">88</span>)</code></div><small class="shcb-language" id="shcb-language-41"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Then, in this vector let’s store the plots in their respective locations.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-42" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">my_list[plots_tbl$boxnumber] &lt;- plots_tbl$plots</code></div><small class="shcb-language" id="shcb-language-42"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
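
<p>To see where a box number ends up, here is a rough sketch in base R. It assumes the grid is filled row by row, which is the plot_grid default (byrow = TRUE); the helper box_to_cell is hypothetical and not part of cowplot.</p>

```r
# Hypothetical helper: convert a box number into (row, col) for a grid
# that is filled row by row, which is plot_grid()'s default (byrow = TRUE).
box_to_cell <- function(boxnumber, ncol = 11) {
  data.frame(
    boxnumber = boxnumber,
    row = (boxnumber - 1) %/% ncol + 1,
    col = (boxnumber - 1) %% ncol + 1
  )
}

# Box numbers for AL, AK and CA from the tibble printed earlier
cells <- box_to_cell(c(73, 78, 45))
cells

# Assigning list elements into an NA vector turns it into a list,
# which is why my_list can hold ggplot objects afterwards.
toy <- rep(NA, 4)
toy[c(2, 4)] <- list("plot a", "plot b")
class(toy)
```

<p>With 11 columns, box 45 (CA) lands in row 5, column 1, and box 78 (AK) lands in row 8, column 1.</p>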


<p>Place the legend in the first box.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-43" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">my_list[[<span class="hljs-number">1</span>]] &lt;- lp</code></div><small class="shcb-language" id="shcb-language-43"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>And then the preference legends in the third and fourth box.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-44" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">my_list[[<span class="hljs-number">3</span>]] &lt;- pizza_pref_plot_fn(what_type = <span class="hljs-string">"Thin"</span>)
my_list[[<span class="hljs-number">4</span>]] &lt;- pizza_pref_plot_fn(what_type = <span class="hljs-string">"Deep"</span>)</code></div><small class="shcb-language" id="shcb-language-44"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Now, as a final step, place all the subplots in a grid using the plot_grid function from cowplot.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-45" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-wrap-lines">gridded_plots &lt;- cowplot::plot_grid(plotlist = my_list, nrow = <span class="hljs-number">8</span>, ncol = <span class="hljs-number">11</span>)  
gridded_plots</code></div><small class="shcb-language" id="shcb-language-45"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="731" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-grid-1-1024x731.png" alt="map data visualization plot using R" class="wp-image-3449" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-grid-1-1024x731.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-grid-1-300x214.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-grid-1-768x549.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/us-map-plot-economist-R-grid-1.png 1344w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Economist Map Plot Using R</figcaption></figure>



<p>Finally, let’s add a title, a subtitle, and captions.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-46" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>title_pos &lt;- <span class="hljs-number">0.01</span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>final_plot &lt;- cowplot::ggdraw(gridded_plots, ylim = c(-<span class="hljs-number">0.05</span>, <span class="hljs-number">1.1</span>)) +
</span></span><span class='shcb-loc'><span>  cowplot::draw_plot(range_bar_legend_plt, x = <span class="hljs-number">.35</span>, y = <span class="hljs-number">.915</span>, width = <span class="hljs-number">.4</span>, height = <span class="hljs-number">0.04</span>) +
</span></span><span class='shcb-loc'><span>  cowplot::draw_label(<span class="hljs-string">"Important dates to remember"</span>, x = <span class="hljs-number">.48</span>, y = <span class="hljs-number">0.95</span>,  hjust = <span class="hljs-number">0</span>, vjust = <span class="hljs-number">0</span>, size = <span class="hljs-number">4</span>) +
</span></span><span class='shcb-loc'><span>  cowplot::draw_label(<span class="hljs-string">"States of play"</span>, x = title_pos, y = <span class="hljs-number">1.1</span>,  hjust = <span class="hljs-number">0</span>, vjust = <span class="hljs-number">1</span>, size = <span class="hljs-number">8</span>, fontface = <span class="hljs-string">"bold"</span>) +
</span></span><span class='shcb-loc'><span>  cowplot::draw_label(<span class="hljs-string">"Random data. Change people's opinion using this chart"</span>, x = title_pos, y = <span class="hljs-number">1.065</span>,  hjust = <span class="hljs-number">0</span>, vjust = <span class="hljs-number">1</span>, size = <span class="hljs-number">5.5</span>) +
</span></span><span class='shcb-loc'><span>  cowplot::draw_label(<span class="hljs-string">"https://www.nandeshwar.info. Generated using R"</span>, x = title_pos, y = -<span class="hljs-number">0.05</span>,  hjust = <span class="hljs-number">0</span>, vjust = -<span class="hljs-number">1</span>, size = <span class="hljs-number">4</span>) +
</span></span><span class='shcb-loc'><span>  cowplot::draw_label(<span class="hljs-string">"Pizza preference surveyed in 2019"</span>, x = <span class="hljs-number">.186</span>, y = <span class="hljs-number">0.95</span>,  hjust = <span class="hljs-number">0</span>, vjust = <span class="hljs-number">0</span>, size = <span class="hljs-number">4</span>)
</span></span><span class='shcb-loc'><span>
</span></span></code></div><small class="shcb-language" id="shcb-language-46"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Then save it using the ggsave function, and we’re done!</p>


<pre class="wp-block-code" aria-describedby="shcb-language-47" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>ggsave(plot = final_plot,
</span></span><span class='shcb-loc'><span>       filename = <span class="hljs-string">"my_us_map_plot.png"</span>,
</span></span><span class='shcb-loc'><span>       width = <span class="hljs-number">6</span>,
</span></span><span class='shcb-loc'><span>       height = <span class="hljs-number">4</span>, 
</span></span><span class='shcb-loc'><span>       bg = <span class="hljs-string">"#E5EBF0"</span>) <span class="hljs-comment"># changing the whole background color</span>
</span></span></code></div><small class="shcb-language" id="shcb-language-47"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" width="1024" height="683" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/economist-data-visualization-how-map-R-1024x683.png" alt="Map plot created in R as seen in the Economist." class="wp-image-3452" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/economist-data-visualization-how-map-R-1024x683.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/economist-data-visualization-how-map-R-300x200.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/economist-data-visualization-how-map-R-768x512.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/economist-data-visualization-how-map-R-1536x1024.png 1536w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/economist-data-visualization-how-map-R.png 1800w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure></div>
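
<p>If you are wondering how large the saved file is in pixels, the arithmetic is simple. This sketch assumes the ggsave defaults of inches for units and a dpi of 300, since the call above does not set either.</p>

```r
# ggsave() defaults assumed here: units = "in" and dpi = 300.
width_in  <- 6
height_in <- 4
dpi       <- 300

# Pixel size is inches multiplied by dots per inch
pixels <- c(width = width_in, height = height_in) * dpi
pixels
# width 1800, height 1200
```

<p>Passing a different dpi to ggsave changes the pixel size without changing the layout, because text and line sizes are tied to the physical dimensions.</p>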



<p>Compare it with the original plot. Pretty close! I am happy with these results, because now this script can be used for other data sets and additional graphic design would be minimal.</p>



<p>There you have it! A map-like plot with an individual area plot for each state, along with some additional details for each of the plots, legends and other keys.</p>



<p>Let me know what you think. And also, if anything is unclear, please let me know.</p>



<h2>Video Walkthrough</h2>



<figure class="wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="How to create US map plot as seen in the Economist using R" width="500" height="281" src="https://www.youtube.com/embed/vpuHrt8Z3bM?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div></figure>
<span class="tve-leads-two-step-trigger tl-2step-trigger-2626"></span><span class="tve-leads-two-step-trigger tl-2step-trigger-0"></span><p>The post <a rel="nofollow" href="https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/">How to Create an Economist Data Visualization of US Map Using R</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Survey of Predictive Analytics in Fundraising</title>
		<link>https://nandeshwar.info/data-science-2/survey-of-predictive-analytics-in-fundraising/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Mon, 15 Jun 2020 01:08:25 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[fundraising]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[literature review]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3415</guid>

					<description><![CDATA[<p>Introduction When computer science is making rapid advances, one may ask “what new knowledge can be gained by reviewing previous work?” Cataloging previous work offers many benefits: a) we can notice the gaps to build upon, b) we can sense the future direction of research, and c) we can learn what has worked and what [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/survey-of-predictive-analytics-in-fundraising/">Survey of Predictive Analytics in Fundraising</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-1024x698.png" alt="" class="wp-image-3432" width="512" height="349" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-1024x698.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-300x204.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-768x523.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-1536x1047.png 1536w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades.png 1683w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption>Fundraising analytics publications by decades</figcaption></figure></div>



<div class="wp-block-columns">
<div class="wp-block-column" style="flex-basis:100%">
<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>When computer science is making rapid advances, one may ask “what new knowledge can be gained by reviewing previous work?” Cataloging previous work offers many benefits: a) we can notice the gaps to build upon, b) we can sense the future direction of research, and c) we can learn what has worked and what hasn’t. In this paper, I hope to offer an extensive survey of analytics applied to nonprofit fundraising. Using this survey, I note patterns and trends, and present research ideas for future work. The paper has the following structure: first, a brief history of analytics; then, an overview of different analytics methods; next, a review of the literature in <a href="https://nandeshwar.info/fundraising-analytics-managers/">applied analytics in fundraising</a>; and finally, a summary of this review and future directions.</p>
</div>
<div id="review-of-analytics" class="section level1">
<h1>Review of Analytics</h1>
<p>It is easy to get distracted by the current hype of Artificial Intelligence (AI), but when we look carefully, we can see meaningful methods and techniques for making sense of the available data and information. Statistical analyses involve collecting data, analyzing it, and drawing conclusions from it <span class="citation">(Diez, Barr, and Cetinkaya-Rundel 2012)</span>. The field of statistics isn’t new. As <span class="citation">Fienberg (1992)</span> wrote in his review article on statistics, classical probability theory was formed in the early 1700s, but the inference methods and statistical models were formed much later in the 1890s.</p>
<p>Advances in statistical research and computational power led to the first hype cycle of AI in the 1960s <span class="citation">(Liao, Chu, and Hsiao 2012)</span>. Later, data mining became popular as a means to uncover patterns of significance using modern algorithms. Researchers named it Knowledge Discovery in Databases (KDD): the complete process of finding useful insights <span class="citation">(Fayyad, Piatetsky-Shapiro, and Smyth 1996)</span>. Machine learning, a computer scientist’s way of saying pattern detection, surged in the early 2000s and now AI is back to the future. From a practitioner’s perspective, the differences among these terms and fields are now insignificant, but researchers in those fields care about these differences <span class="citation">(Mannila 1996)</span>. In the end, as <span class="citation">Fayyad, Piatetsky-Shapiro, and Smyth (1996)</span> commented, “The unifying goal [of these methods] is extracting high-level knowledge from low-level data in the context of large data sets.”</p>
<p>Although the latest developments in natural language processing (NLP), <a href="https://nandeshwar.info/data-science-2/natural-language-generation-with-r-python/">natural language generation (NLG)</a>, computer vision, and <a href="https://nandeshwar.info/data-science-2/deep-learning-tensorflow-r-tutorial/">deep-learning</a> help us with tasks other than solely discovering knowledge <span class="citation">(Young et al. 2018)</span>, we will find that the literature for nonprofit fundraising is focused on KDD. This makes sense because fundraising goes up when the right people are asked for the right amount. But in the future, we will see broader applications of data science, helping us automate tasks and increase productivity.</p>
<div id="methods-and-techniques-in-analytics" class="section level2">
<h2>Methods and Techniques in Analytics</h2>
<p>Since the <a href="https://nandeshwar.info/books/what-is-data-mining-analytics-data-science-and-how-to-learn-them/">field of analytics</a> is expansive, let’s review and categorize the common methods and techniques used in the field. I will use these categories while reviewing the research in nonprofit fundraising.</p>
<div id="descriptive-statistics" class="section level3">
<h3>Descriptive Statistics</h3>
<p>Descriptive statistics use standard formulas to calculate measures that reflect the data. Some of these measures include mean, median, standard deviation, frequency, proportions, and other exploratory analyses. These measures give us quick insights into the data. Often, these measures are supported by graphs, such as scatter plots, histograms, and box plots. Such graphs help us see the correlations and patterns in the data <span class="citation">(NIST/SEMATECH 2013)</span>.</p>
</div>
<div id="regression" class="section level3">
<h3>Regression</h3>
<p><a href="https://nandeshwar.info/data-mining-2/linear-regression-in-excel/">Linear regression</a>, or the least squares method, estimates predictions by minimizing the sum of the squared differences between the actual data points and the predicted values. As long as each parameter estimate is multiplied by a variable (or a function of it) and these product terms are added to form the model, we can use linear regression – even if the fitted function itself isn’t a straight line <span class="citation">(NIST/SEMATECH 2013)</span>. But when the parameters enter the model in a non-linear form, we can’t use linear regression and need non-linear regression instead. When we estimate parameters to build a model for some data, this approach is called <em>parametric</em>. In contrast, in a <em>non-parametric</em> approach we estimate a function that follows the data closely <span class="citation">(James et al. 2013)</span>.</p>
<p>Regression methods can be used both for quantitative prediction (e.g.,&nbsp;gift amount) and for predicting class probabilities (e.g.,&nbsp;yes or no). Many approaches extend or build upon regression methods. In this paper, I have categorized them under regression. Some of these methods include logistic regression, Linear Discriminant Analysis (LDA), Generalized Additive Models (GAMs), Generalized Linear Models (GLMs), Ridge regression, Tobit regression, and Probit regression.</p>
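<p>To make the least squares idea concrete, here is a minimal sketch in plain Python. It is not taken from any of the reviewed studies: the function names and the tiny data set (years since graduation and gift amount) are hypothetical, and it covers only the simplest case of one predictor.</p>

```python
def fit_least_squares(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for simple (one-predictor) linear regression.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """The quantity that least squares minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: years since graduation and gift amount in dollars.
years = [1, 2, 3, 4, 5]
gifts = [25, 55, 70, 85, 115]

intercept, slope = fit_least_squares(years, gifts)
best_error = sum_squared_residuals(years, gifts, intercept, slope)
```

<p>For this made-up data the fitted line has an intercept of 7 and a slope of 21, and any other line gives a larger sum of squared residuals.</p>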
</div>
<div id="classification" class="section level3">
<h3>Classification</h3>
<p>Classification methods assign each observation to one of the discrete values of the dependent variable, such as “Yes” or “No.” These values are called <em>classes</em>. Although regression methods work on classification problems, machine learning “divide-and-conquer” and “covering” techniques such as decision trees and rules are better equipped to handle missing values and noisy data <span class="citation">(Witten et al. 2016)</span>.</p>
</div>
<div id="clustering" class="section level3">
<h3>Clustering</h3>
<p>Clustering methods attempt to divide the data into <em>n</em> groups of similar data points. These methods are called unsupervised learning methods as they do not require a dependent variable. They work by finding a center point for each of these groups and then marking all the data points close to a center as part of that cluster <span class="citation">(James et al. 2013, 385)</span>.</p>
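<p>The center-finding procedure described above can be sketched as a very small k-means-style loop. This is only an illustration under my own assumptions: the text does not name a specific algorithm, and the function name, the starting centers, and the lifetime-giving figures are hypothetical.</p>

```python
def k_means(points, centers, iterations=10):
    """Tiny k-means for one-dimensional data with caller-supplied starting centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical lifetime giving totals, in dollars, for six donors.
lifetime_giving = [10, 20, 30, 1000, 1100, 1200]
centers, clusters = k_means(lifetime_giving, centers=[0, 500])
```

<p>With these made-up numbers the loop settles on two groups, small donors centered at 20 and large donors centered at 1100, without ever being told which donor belongs where.</p>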
</div>
<div id="ensemble-methods" class="section level3">
<h3>Ensemble Methods</h3>
<p>There are two types of ensemble methods: a) comparison of many algorithms, and b) using predictions from many algorithms. The comparison of multiple algorithms helps analysts see which methods work well for their data sets. Comparison guards against the loss of prediction performance that can come from relying only on the analyst’s preferred method. Predictions from multiple algorithms can outperform a single algorithm by using <em>stacking</em> methods or <em>super learners</em> <span class="citation">(Polley and van der Laan 2010; Polley, Rose, and van der Laan 2011)</span>.</p>
<p><span class="citation">Polley, Rose, and van der Laan (2011)</span> argue that <em>super learners</em> work well with real-life datasets because no single algorithm can accurately model the data, but combining different algorithms provides us with better estimates. As <span class="citation">James et al. (2013)</span> note, “there is no free lunch in statistics: no one method dominates all others over all possible data sets.”</p>
</div>
</div>
</div>
<div id="literature-review" class="section level1">
<h1>Literature Review</h1>
<div id="previous-work" class="section level2">
<h2>Previous Work</h2>
<p><span class="citation">Lindahl and Conley (2002)</span> reviewed research and put it into two categories: “Motivational Studies” and “Predicting Alumni Giving.” The first category consists of work that studies why people choose to give. The second category includes research that identifies and tests factors that could predict a person’s choice to give.</p>
<p>More recently, <span class="citation">Bekkers and Wiepking (2010)</span> reviewed more than 500 articles and categorized these works into eight topical areas. While these reviews summarized methods of philanthropy, this paper focuses on the uses of analytical methods.</p>
</div>
<div id="method" class="section level2">
<h2>Method</h2>
<p>I followed the methods and frameworks used in two popular review articles: “Educational data mining: A survey and a data mining-based analysis of recent works” <span class="citation">(Peña-Ayala 2014)</span> and “Data mining techniques and applications–A decade review from 2000 to 2011” <span class="citation">(Liao, Chu, and Hsiao 2012)</span>. Both papers used comprehensive methods to collect and review the published works in data mining. Like their approaches, I started with these broad search terms in Google Scholar:</p>

<pre class="latex"><code>("data mining" OR analytics OR "machine learning" OR "data science" OR
 clustering OR statistics OR predictive) AND 
 (nonprofit OR fundraising OR fund-raising OR non-profit OR charity OR 
 donation OR philanthropy)</code></pre><code>
<p>I filtered the results from these searches and used Google Scholar’s citations feature to search for other papers that cited these works. Additionally, I used <em>Publish or Perish</em> software <span class="citation">(Harzing 2007)</span> to run searches in the Scopus and Microsoft Academic search databases, as seen in the Publish or Perish search screen figure. In the next phase, I looked at other cited works within these results.</p>

<p class="caption">
Publish or Perish Search Screen
</p>

<p>After reading the results from this search, I decided whether to include the research as part of this review. The excluded work fell into these categories:</p>
<ul>
<li>Unpublished work</li>
<li>Undergraduate thesis</li>
<li>News articles</li>
<li>Company white papers</li>
<li>Research without analytics</li>
</ul>
<p>I ended up with 145 works. The following table (Categories of Published Works) shows how the works were published, and you can see that Ph.D.&nbsp;dissertations account for the second most publications.</p>
<table>
<caption>
Categories of Published Works
</caption>
<thead>
<tr>
<th style="text-align:left;">
Category
</th>
<th style="text-align:right;">
Published Works
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">
Article
</td>
<td style="text-align:right;">
78
</td>
</tr>
<tr>
<td style="text-align:left;">
PhD Thesis
</td>
<td style="text-align:right;">
51
</td>
</tr>
<tr>
<td style="text-align:left;">
In Proceedings
</td>
<td style="text-align:right;">
5
</td>
</tr>
<tr>
<td style="text-align:left;">
Book
</td>
<td style="text-align:right;">
4
</td>
</tr>
<tr>
<td style="text-align:left;">
In Collection
</td>
<td style="text-align:right;">
3
</td>
</tr>
<tr>
<td style="text-align:left;">
Masters Thesis
</td>
<td style="text-align:right;">
2
</td>
</tr>
<tr>
<td style="text-align:left;">
Tech Report
</td>
<td style="text-align:right;">
2
</td>
</tr>
</tbody>
</table>
</code></div><code>
<div id="limitations" class="section level2">
<h2>Limitations</h2>
<p>This review and its findings are limited because of my omissions and subjective bias. I omitted any work that I could not find digitally. Although USC library’s catalog is extensive and web searches can find many publications, I missed the digitally unavailable research (fewer than five). My subjective bias towards what qualifies as a study for this review likely excluded some publications. Also, I may have made errors with the search keywords. Finally, operator error: it is likely that I unintentionally missed some research.</p>
</div>
<div id="by-decades" class="section level2">
<h2>By Decades</h2>
<p>The first publication in this field is probably O’Connor’s dissertation on characteristics of alumni donors from 1961 <span class="citation">(O’Connor 1961)</span>. But you can see from the figure of total published works by decades that the majority of the works were published between 2010 and 2019. Another noticeable trend, as seen in the figure of methods by decades, is the use of a wider set of techniques during the 2010-2019 period – though regression still leads the way.</p>

<p class="caption">
Total Number of Published Works by Decades
</p>


<p class="caption">
Methods by Decades. Note: Although regression techniques are the most often used methods, ensemble methods are finding greater use.
</p>

<p>The following table (Trends of Methods Used by Decade) shows the raw numbers of the various analytics methods used over time. You can see that regression methods and descriptive statistics together account for more than 100 studies (77 and 34, respectively), followed by 10 ensemble studies. This suggests that researchers feel confident in the results from regression methods. Alternatively, it may mean that researchers from other fields, especially computer science, have not studied fundraising problems.</p>
<table class="table" style="font-size: 9px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">
Trends of Methods Used by Decade
</caption>
<thead>
<tr>
<th style="text-align:left;">
Analytics Method
</th>
<th style="text-align:right;">
1960-1969
</th>
<th style="text-align:right;">
1970-1979
</th>
<th style="text-align:right;">
1980-1989
</th>
<th style="text-align:right;">
1990-1999
</th>
<th style="text-align:right;">
2000-2009
</th>
<th style="text-align:right;">
2010-2019
</th>
<th style="text-align:right;">
Total
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">
CHAID
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;font-weight: bold;">
2
</td>
</tr>
<tr>
<td style="text-align:left;">
Clustering
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
2
</td>
<td style="text-align:right;">
4
</td>
<td style="text-align:right;font-weight: bold;">
7
</td>
</tr>
<tr>
<td style="text-align:left;">
Descriptive Statistics
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
9
</td>
<td style="text-align:right;">
11
</td>
<td style="text-align:right;">
3
</td>
<td style="text-align:right;">
5
</td>
<td style="text-align:right;">
5
</td>
<td style="text-align:right;font-weight: bold;">
34
</td>
</tr>
<tr>
<td style="text-align:left;">
Ensemble
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
9
</td>
<td style="text-align:right;font-weight: bold;">
10
</td>
</tr>
<tr>
<td style="text-align:left;">
Lifetime Value
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
2
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;font-weight: bold;">
3
</td>
</tr>
<tr>
<td style="text-align:left;">
Machine Learning
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;font-weight: bold;">
1
</td>
</tr>
<tr>
<td style="text-align:left;">
Markov Chains
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;font-weight: bold;">
2
</td>
</tr>
<tr>
<td style="text-align:left;">
Neural Networks
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;font-weight: bold;">
1
</td>
</tr>
<tr>
<td style="text-align:left;">
Other
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
2
</td>
<td style="text-align:right;">
2
</td>
<td style="text-align:right;font-weight: bold;">
5
</td>
</tr>
<tr>
<td style="text-align:left;">
Regression
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
2
</td>
<td style="text-align:right;">
7
</td>
<td style="text-align:right;">
17
</td>
<td style="text-align:right;">
21
</td>
<td style="text-align:right;">
30
</td>
<td style="text-align:right;font-weight: bold;">
77
</td>
</tr>
<tr>
<td style="text-align:left;">
Social Media
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;font-weight: bold;">
1
</td>
</tr>
<tr>
<td style="text-align:left;">
Support Vector Machines
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;font-weight: bold;">
1
</td>
</tr>
<tr>
<td style="text-align:left;">
Survival Analysis
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;">
1
</td>
<td style="text-align:right;">
0
</td>
<td style="text-align:right;font-weight: bold;">
1
</td>
</tr>
<tr>
<td style="text-align:left;font-weight: bold;">
Total
</td>
<td style="text-align:right;font-weight: bold;">
1
</td>
<td style="text-align:right;font-weight: bold;">
12
</td>
<td style="text-align:right;font-weight: bold;">
19
</td>
<td style="text-align:right;font-weight: bold;">
26
</td>
<td style="text-align:right;font-weight: bold;">
35
</td>
<td style="text-align:right;font-weight: bold;">
52
</td>
<td style="text-align:right;font-weight: bold;">
145
</td>
</tr>
</tbody>
</table>
</div></code></div>
</div>
</div>



<p></p>



<figure class="wp-block-image size-large"><img loading="lazy" width="1024" height="697" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-analytics-methods-1024x697.png" alt="" class="wp-image-3431" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-analytics-methods-1024x697.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-analytics-methods-300x204.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-analytics-methods-768x523.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-analytics-methods-1536x1045.png 1536w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-by-decades-analytics-methods.png 1906w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Fundraising analytics publications by methods and decades</figcaption></figure>



<div id="by-method" class="section level2">
<h2>By Method</h2>
<div id="chaid" class="section level3">
<h3>CHAID</h3>
<p>CHAID is a decision tree learner, which <span class="citation">Liihe (1998)</span> used to study database marketing at UNICEF. <span class="citation">Denizard-Ramsamy and Medina-Borja (2008)</span> predicted financial vulnerability in non-profit organizations using CHAID; this is a rare paper as most of the studies in this review focus on donor identification.</p>
</div>
<div id="clustering-1" class="section level3">
<h3>Clustering</h3>
<p>Segmentation via clustering has a good use case in fundraising for customized marketing as well as prospect identification. Various researchers have applied segmentation in university settings <span class="citation">(Cermak, File, and Prince 1994; Blanc and Rucks 2009; Luperchio 2009; E. J. Durango-Cohen, Torres, and Durango-Cohen 2013; P. L. Durango-Cohen, Durango-Cohen, and Torres 2013; Zhang 2014; Durango-Cohen and Balasubramanian 2015)</span>.</p>
</div>
<div id="descriptive-statistics-1" class="section level3">
<h3>Descriptive Statistics</h3>
<p>Descriptive statistics include mean, percentage distribution, correlations, Chi-squared tests, and Analysis of Variance (ANOVA). Most of the research in this category studied the effects of alumni characteristics to predict giving <span class="citation">(O’Connor 1961; Morris 1970; Caruthers 1973; Blumenfeld and Sartain 1974; McKee 1975; Gardner 1975; Sundel et al. 1978; Markoff 1978; McKinney 1978; Riecken and Yavas 1979; Anderson 1981; Smith and Beik 1982; Keller 1982; Korvas 1984; Nelson 1984; Chewning 1984; Dietz 1985; McNally 1985; Haddad 1986; Schlegelmilch and Tynan 1989; Oglesby 1991; Hunter, Jones, and Boger 1999; Bingham Jr, Quigley Jr, and Murray 2003; Wylie 2004; Gunsalus 2005; Newman 2011; Loveday 2012; Bruyn and Prokopec 2013; Johnson 2013; Miller 2013)</span>.</p>
<p>A few notable exceptions were:</p>
<ul>
<li><span class="citation">Frederick (1984)</span> studied the relationship between football success and institutional giving.</li>
<li><span class="citation">Berger and Smith (1997)</span> analyzed the effects of framing the direct mail appeals.</li>
<li><span class="citation">Quigley, Bingham, and Murray (2002)</span> measured the effects of gift acknowledgments on giving.</li>
<li><span class="citation">Magson and Routley (2009)</span> looked at planned giving fundraising.</li>
</ul>
</div>
<div id="ensemble" class="section level3">
<h3>Ensemble</h3>
<p>Ensemble methods often include machine learning techniques, which are either combined to improve performance or used for comparison. <span class="citation">Potharst, Kaymak, and Pijls (2002)</span> used neural networks and CHAID to improve direct marketing outcomes. <span class="citation">Chen (2010)</span> used regression, neural networks, and SVMs on the Direct Marketing Education Foundation (DMEF) data. <span class="citation">Ye (2017)</span> used Naive Bayes, Random Forest, and SVM to predict major donors and compared the results from these methods. Other works in this category included: <span class="citation">E. J. Durango-Cohen (2013)</span>, <span class="citation">Moon and Azizi (2013)</span>, <span class="citation">Udenze (2014)</span>, <span class="citation">Torres (2014)</span>, <span class="citation">Chung and Lee (2015)</span>, <span class="citation">Kakrala and Chakraborty (2015)</span>, and <span class="citation">Rattanamethawong, Sinthupinyo, and Chandrachai (2018)</span>.</p>
</div>
<div id="lifetime-value" class="section level3">
<h3>Lifetime Value</h3>
<p>Commonly used in the for-profit/marketing world, lifetime value calculates the future total profit from a customer. This value is used for segmentation and acquisition strategies. Some researchers have built models to calculate this value for donors <span class="citation">(Hunter and Hill 1998; Sargeant 1998; Aldrich 2000)</span>.</p>
</div>
<div id="machine-learning" class="section level3">
<h3>Machine Learning</h3>
<p>Many of the studies in the ensemble category also fall in the machine learning category. One study didn’t fit in the ensemble category: <span class="citation">Weerts and Ronca (2009)</span> used classification trees to predict alumni giving.</p>
</div>
<div id="markov-chains" class="section level3">
<h3>Markov Chains</h3>
<p>Markov chains use the probabilities of moving from one event (or state) to the next to predict the probability of future events; each prediction feeds the next, and so the chain continues. A donor’s lifetime giving can also be structured as a chain of events to predict future giving. <span class="citation">Soukup (1983)</span> and <span class="citation">Toohill et al. (1997)</span> used Markov chains to predict giving.</p>
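<p>As a rough illustration of how a giving history can be treated as a chain, the sketch below uses two states and a made-up transition table. None of it comes from the cited studies: the state names, the probabilities, and the function names are all hypothetical.</p>

```python
# Hypothetical yearly states for a donor.
STATES = ["gave", "did not give"]

# TRANSITIONS[i][j] is the assumed probability of moving from state i
# this year to state j next year. Each row sums to 1.
TRANSITIONS = [
    [0.6, 0.4],  # from "gave"
    [0.2, 0.8],  # from "did not give"
]


def step(distribution, transitions):
    """Advance the state probabilities by one year."""
    size = len(distribution)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(size))
        for j in range(size)
    ]


def forecast(distribution, transitions, years):
    """Apply the one-year step repeatedly, so each result feeds the next."""
    for _ in range(years):
        distribution = step(distribution, transitions)
    return distribution


# Start from a donor who gave this year and look two years ahead.
two_years_out = forecast([1.0, 0.0], TRANSITIONS, 2)
```

<p>Under these assumed probabilities, a donor who gave this year has roughly a 44% chance of giving two years from now. In practice, the transition table would be estimated from historical giving records.</p>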
</div>
<div id="neural-networks" class="section level3">
<h3>Neural Networks</h3>
<p>Like the machine learning models that fall under ensemble methods, a few neural network applications were also part of that category. But a standalone implementation of neural networks can be found in <span class="citation">Goodman and Plouff (1997)</span>.</p>
</div>
<div id="other" class="section level3">
<h3>Other</h3>
<p>I placed other publications in this category if I couldn’t classify them. These tend to be either overarching frameworks <span class="citation">(Birkholz 2008; Nandeshwar and Devine 2018)</span>, descriptive works <span class="citation">(Herzlinger 1977)</span>, or rarely applied techniques for fundraising <span class="citation">(Hashemi et al. 2009)</span>.</p>
</div>
<div id="regression-1" class="section level3">
<h3>Regression</h3>
<p>Researchers in higher education have applied different flavors of regression techniques, and as mentioned in the earlier section, I am using the term regression liberally. Most of these studies are Ph.D.&nbsp;dissertations from education schools and colleges <span class="citation">(Manzer 1974; Miracle 1977; Yavas, Riecken, and Parameswaran 1981; Beeler 1982; Rosenblatt, Cusson, and McGown 1986; House 1987; Grill 1988; Leslie and Ramey 1988; Shadoian 1989; Duronio and Loessin 1990; Boyle 1990; Lindahl and Winship 1992, 1994; Hueston 1992; Burgess-Getts 1992; Mosser 1993; Martin 1993; Okunade, Wunnava, and Walsh Jr 1994; Bruggink and Siddiqui 1995; Taylor and Martin 1995; Pearson 1996; Baade and Sundberg 1996; Okunade and Berl 1997; Schlegelmilch, Love, and Diamantopoulos 1997; Selig 1999; Duncan 1999; Greenlee and Trussel 2000; Hanson 2000; Belfield and Beney 2000; Key 2001; Cunningham and Cochi-Ficano 2002; Monks 2003; Bennett 2003, 2006; Hoyt 2004; Marr, Mullin, and Siegfried 2005; Gaier 2005; Tsao and Coll 2005; Sun, Hoffman, and Grady 2007; Diehl 2007; Bohannon 2007; Terry and Macy 2007; Meer and Rosen 2008, 2012; Lawley 2008; McDearmon and Shirley 2009; Holmes 2009; Shen and Tsai 2009; Thompson 2010; Dickert, Sagara, and Slovic 2010; Verhaert 2010; Oliveira, Croson, and Eckel 2011; Steinnes 2011; Baruch and Sang 2012; Ketter 2013; Lara and Johnson 2013; Truitt 2013; Tiger and Preston 2013; Rau 2014; Skari 2014; Morgan 2014; Lertputtarak and Supitchayangkool 2014; Ropp 2014; Walcott 2015; Rau and Erwin 2015; Pinion 2016; Park et al. 2016; Veludo-de-Oliveira et al. 2016; Lawrence, Kudyba, and Lawrence 2017; Brunette, Vo, and Watanabe 2017; Faisal 2017; Saraih et al. 2018; Day 2018; Christian 2018; Liu, Feng, and Ouyang 2018; Naccarato 2019; Lowe 2019)</span>.</p>
</div>
<div id="social-media" class="section level3">
<h3>Social Media</h3>
<p><span class="citation">Vequist IV (2017)</span> studied the use of various social media and giving to various nonprofit organizations. Campaign performance data and other meta-data were used to improve the decision making of the stakeholders and increase social media user donations.</p>
</div>
<div id="support-vector-machines" class="section level3">
<h3>Support Vector Machines</h3>
<p>One study using SVM is notable because it dealt with the imbalanced (or unbalanced) classes that we typically observe in donation data, i.e.,&nbsp;either the proportion of donor records in the data is low or few major donors exist in the data. <span class="citation">Kim, Chae, and Olson (2012)</span> used SVMs to build a response model on imbalanced datasets.</p>
</div>
<div id="survival-analysis" class="section level3">
<h3>Survival Analysis</h3>
<p>Although survival analysis is used in analyzing data for a failure event, such as death, <span class="citation">Drye, Wetherill, and Pinnock (2001)</span> used it to predict a donor’s status in her giving lifecycle.</p>
</div>
</div>

<div id="quality-assessment" class="section level1">
<h1>Quality Assessment</h1>
<p>While reviewing the breadth of the methods used for nonprofit fundraising is useful, assessing the rigor, credibility, and relevance of the predictions in these published works is more important. <span class="citation">Wen et al. (2012)</span> used a 10-question framework to assess the quality of each work, and I used a similar method. I answered the questions in the table below for each published work; the possible answers were <em>Yes</em>, <em>No</em>, or <em>Somewhat</em>, with weights of 1, 0, and 0.5, respectively. All questions, except for <em>Q4</em> and <em>Q6</em>, are from <span class="citation">Wen et al. (2012)</span>. Of course, these questions are suitable only for works in which the researchers made predictions or built predictive models. This assessment has two caveats: it is somewhat unfair to older research, conducted when obtaining enough computing power was a challenge, and my subjective bias can skew the findings.</p>
<table>
<caption>
Prediction quality assessment questions
</caption>
<thead>
<tr>
<th style="text-align:left;">
ID
</th>
<th style="text-align:left;">
Question
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">
Q1
</td>
<td style="text-align:left;">
Are the estimation methods well defined and deliberate?
</td>
</tr>
<tr>
<td style="text-align:left;">
Q2
</td>
<td style="text-align:left;">
Is the experiment applied on sufficient data sets?
</td>
</tr>
<tr>
<td style="text-align:left;">
Q3
</td>
<td style="text-align:left;">
Is the estimation accuracy measured and reported?
</td>
</tr>
<tr>
<td style="text-align:left;">
Q4
</td>
<td style="text-align:left;">
Are the estimates significantly better than the baseline?
</td>
</tr>
<tr>
<td style="text-align:left;">
Q5
</td>
<td style="text-align:left;">
Is the proposed estimation method compared with other methods?
</td>
</tr>
<tr>
<td style="text-align:left;">
Q6
</td>
<td style="text-align:left;">
Can the findings be applied widely?
</td>
</tr>
<tr>
<td style="text-align:left;">
Q7
</td>
<td style="text-align:left;">
Are the findings of study clearly stated and supported by reporting results?
</td>
</tr>
</tbody>
</table>


<table>
<thead>
<tr class="header">
<th align="left">Author</th>
<th align="left">Analytics Method</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><span class="citation">Shadoian (1989)</span></td>
<td align="left">Regression</td>
</tr>
<tr class="even">
<td align="left"><span class="citation">Liihe (1998)</span></td>
<td align="left">CHAID</td>
</tr>
<tr class="odd">
<td align="left"><span class="citation">Greenlee and Trussel (2000)</span></td>
<td align="left">Regression</td>
</tr>
<tr class="even">
<td align="left"><span class="citation">Potharst, Kaymak, and Pijls (2002)</span></td>
<td align="left">Ensemble</td>
</tr>
<tr class="odd">
<td align="left"><span class="citation">Chen (2010)</span></td>
<td align="left">Ensemble</td>
</tr>
<tr class="even">
<td align="left"><span class="citation">Kim, Chae, and Olson (2012)</span></td>
<td align="left">Support Vector Machines</td>
</tr>
<tr class="odd">
<td align="left"><span class="citation">Moon and Azizi (2013)</span></td>
<td align="left">Ensemble</td>
</tr>
<tr class="even">
<td align="left"><span class="citation">Chung and Lee (2015)</span></td>
<td align="left">Ensemble</td>
</tr>
<tr class="odd">
<td align="left"><span class="citation">Ye (2017)</span></td>
<td align="left">Ensemble</td>
</tr>
<tr class="even">
<td align="left"><span class="citation">Liu, Feng, and Ouyang (2018)</span></td>
<td align="left">Regression</td>
</tr>
</tbody>
</table>
</div>

<div id="summary-of-literature" class="section level1">
<h1>Summary of Literature</h1>
<p>Most of the studies in this review focused on either predicting the likelihood of a person donating or predicting the giving level or amount. An exception was the <span class="citation">Greenlee and Trussel (2000)</span> study of the financial stability of institutions. As <span class="citation">Brittingham and Pezzullo (1990, 39)</span> wrote about the predictive studies in fundraising, “Most of the studies are dissertations, and most are based on a single institution, most often a university. The results … do not support strong conclusions.” What was true in the 1990s remains true today. As we saw in the earlier sections, dissertations are the second-most common type of study in applied analytics for fundraising.</p>
<p>Many dissertations followed a similar pattern: select variables based on literature, study each variable for correlations and significance, include selected variables for an estimation model, reject or accept the null hypothesis, and then present final results.</p>
<p>There are some challenges with this approach.</p>
<ol style="list-style-type: decimal">
<li>These studies are often limited to one institution; hence the results cannot be generalized.</li>
<li>This type of research primarily becomes about the application of a statistical technique to the researcher’s dataset and doesn’t contribute to knowledge advancement, either through the application of newer and different predictive methods or towards a unified theory of giving.</li>
<li>This type of framework can be <a href="https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/">templatized using a programming language.</a></li>
</ol>
<p>While building local predictive models is useful for development offices, we need either groundbreaking research that significantly improves on the donor classification problem, or different fundraising problems to solve.</p>

<p>Many of these studies used the null hypothesis significance testing (NHST) to infer the answers to research questions. This is problematic for two reasons:</p>
<ol style="list-style-type: decimal">
<li><p>As <span class="citation">Trafimow (2014)</span> declared in his editorial in the journal Basic and Applied Social Psychology, “The null hypothesis significance testing procedure has been shown to be logically invalid and to provide little information about the actual likelihood of either the null or experimental hypothesis.” The next year, while banning the null hypothesis significance testing procedure from the journal, <span class="citation">Trafimow and Marks (2015)</span> said that the “<span class="math inline">\(p &lt; .05\)</span> bar is too easy to pass and sometimes serves as an excuse for lower quality research.”</p></li>
<li><p>As <span class="citation">Gliner, Leech, and Morgan (2002)</span> noted, “A common misuse of NHST is the implication that statistical significance means theoretical or practical significance.” In the surveyed studies, you can find examples of researchers mistaking statistically significant results for important findings.</p></li>
</ol>
<p>While most researchers report on the overall accuracy of their prediction models, very few report on other evaluation measures, such as precision, recall, or specificity. Another challenge is the lack of comparison to baseline proportions. Since such measures or comparisons aren’t reported, it is hard to assess whether the new predictive models performed better than guessing.</p>
<p>For example, say our data had 5% donors and 95% non-donors. We built a predictive model that classified donors and non-donors. Let’s say that this model had an overall accuracy of 95%. Now, if we were to evaluate the model based only on accuracy, we might be satisfied with its performance. But even if we guess that every row is a non-donor, we achieve 95% accuracy.</p>
<p>Similarly, if the data has 45% donors and 55% non-donors, and the model had an overall accuracy of 50%, it did worse than the baseline. Even if the predictive models aren’t compared to other models, they should at least be compared with the baseline. As my colleagues and I reported in another paper, if the overall accuracy rate is close to the baseline, then the complex analysis can be replicated by a simple majority vote model <span class="citation">(Nandeshwar, Menzies, and Nelson 2011)</span>.</p>
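<p>The baseline comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the surveyed studies: the function names and the two toy label lists are assumptions made up to match the 5%/95% and 45%/55% scenarios in the text.</p>

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def beats_baseline(model_accuracy, labels):
    """True only if the model is more accurate than the majority-class guess."""
    return model_accuracy > majority_baseline_accuracy(labels)


# Hypothetical label lists matching the two scenarios in the text.
skewed = ["donor"] * 5 + ["non-donor"] * 95
mixed = ["donor"] * 45 + ["non-donor"] * 55

print(majority_baseline_accuracy(skewed))  # 0.95
print(beats_baseline(0.95, skewed))        # False: no better than guessing
print(majority_baseline_accuracy(mixed))   # 0.55
print(beats_baseline(0.50, mixed))         # False: worse than guessing
```

<p>Reporting the baseline next to the model's accuracy lets a reader see at a glance whether the model added anything over the majority vote.</p>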
<p>One benefit of the research done over decades into the likelihood of a person’s donation is that we have a comprehensive list of attributes, attitudes, and values that could go into building new predictive models.</p>
</div>
<div id="opportunities-and-future-direction" class="section level1">
<h1>Opportunities and Future Direction</h1>
<p>Today’s technological advancement offers fascinating paths to study various problems in fundraising. Here are some suggestions and ideas to build on our knowledge of applications of data science in nonprofit fundraising.</p>
<ul>
<li><strong>Establish the baseline</strong>. In classification or numeric prediction models, use a majority vote or the mean value as the benchmark against which to compare the results. <span class="citation">Witten et al. (2016)</span> call this model <em>ZeroR</em>. Also, consider using a simple, single-rule classification model known as <em>1R</em> or <em>OneR</em>. <span class="citation">Holte (1993)</span> calculated the results from this simple model on many datasets, compared them with an advanced decision tree model, and found that <em>1R</em> was only “a few percentage points less accurate.”</li>
<li><strong>Use and report a wider set of evaluation metrics</strong>. As we saw earlier, reporting accuracy alone can be misleading. We can consider the different evaluation measures shown in the equations below <span class="citation">(Branco, Torgo, and Ribeiro 2016)</span>. For example, <span class="citation">Rau (2014, 30)</span> reported that “76.4% of cases are correctly classified,” but the confusion matrix from that study, shown below, reveals that 73% of the study data were non-donors, so simply predicting everyone to be a non-donor yields an accuracy of 73%. The model correctly identified only 88 of the 430 actual donors, a recall (or true positive rate) of 20%. Thus, the model failed at correctly identifying donors. Similarly, the F-measure and balanced accuracy were low, at 0.32 and 59%, respectively.</li>
</ul>
<table>
<caption>
Confusion Matrix for a Two-class Problem
</caption>
<thead>
<tr>
<th style="border-bottom:hidden" colspan="2">
</th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; font-weight: bold; " colspan="2">
<div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">
Predicted
</div>
</th>
</tr>
<tr>
<th style="text-align:left;">
</th>
<th style="text-align:left;">
</th>
<th style="text-align:left;">
Donor
</th>
<th style="text-align:left;">
Non-donor
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;font-weight: bold;vertical-align: middle !important;" rowspan="2">
Actual
</td>
<td style="text-align:left;font-weight: bold;">
Donor
</td>
<td style="text-align:left;">
True Positive (TP)
</td>
<td style="text-align:left;">
False Negative (FN)
</td>
</tr>
<tr>
<td style="text-align:left;font-weight: bold;">
Non-donor
</td>
<td style="text-align:left;">
False Positive (FP)
</td>
<td style="text-align:left;">
True Negative (TN)
</td>
</tr>
</tbody>
</table>
<table>
<caption>
Confusion Matrix from Rau (2014)
</caption>
<thead>
<tr>
<th style="border-bottom:hidden" colspan="2">
</th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; font-weight: bold; " colspan="2">
<div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">
Predicted
</div>
</th>
</tr>
<tr>
<th style="text-align:left;">
</th>
<th style="text-align:left;">
</th>
<th style="text-align:right;">
Donor
</th>
<th style="text-align:right;">
Non-donor
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;font-weight: bold;vertical-align: middle !important;" rowspan="2">
Actual
</td>
<td style="text-align:left;font-weight: bold;">
Donor
</td>
<td style="text-align:right;">
88
</td>
<td style="text-align:right;">
342
</td>
</tr>
<tr>
<td style="text-align:left;font-weight: bold;">
Non-donor
</td>
<td style="text-align:right;">
32
</td>
<td style="text-align:right;">
1126
</td>
</tr>
</tbody>
</table>
<p><span class="math display">\[\begin{equation} 
\mathrm{Precision} = \frac{TP}{TP+FP}
\end{equation}\]</span> <span class="math display">\[\begin{equation} 
\mathrm{Recall} = \frac{TP}{TP+FN}
\end{equation}\]</span> <span class="math display">\[\begin{equation} 
\mathrm{Specificity} = \frac{TN}{TN+FP}
\end{equation}\]</span> <span class="math display">\[\begin{equation} 
\text{F-measure} = 2\times\frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{equation}\]</span> <span class="math display">\[\begin{equation} 
\text{Balanced Accuracy} = \frac{\mathrm{Recall} + \mathrm{Specificity}}{2}
\end{equation}\]</span></p>
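<p>As a worked check, the measures above can be computed directly from the four cells of a confusion matrix. The sketch below uses the counts from the Rau (2014) confusion matrix; the function name and the extra <code>baseline</code> entry (the share of the larger class) are my own additions for illustration, and the code assumes no denominator is zero.</p>

```python
def classification_metrics(tp, fn, fp, tn):
    """Evaluation measures for a two-class problem (assumes non-zero denominators)."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        # Accuracy of always predicting the larger class.
        "baseline": max(tp + fn, fp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts taken from the Rau (2014) confusion matrix.
rau = classification_metrics(tp=88, fn=342, fp=32, tn=1126)
for name, value in rau.items():
    print(f"{name}: {value:.2f}")
```

<p>Rounded to two decimals, this gives an accuracy of 0.76 against a baseline of 0.73, a recall of 0.20, an F-measure of 0.32, and a balanced accuracy of 0.59, which matches the figures quoted for this study.</p>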
<ul>
<li><strong>Consider selecting variables using feature subset selection (FSS)</strong>. In his extensive study of feature subset selectors, <span class="citation">Hall (1999)</span> compared his feature (or variable) selector with other predictive techniques. He found that FSS removed redundant and irrelevant features and, in some cases, even improved the performance of the underlying predictive algorithms.</li>
<li><strong>Consider class balancing methods</strong>. When the number of rows for one class (such as non-donor) is higher than the rows for any other class (such as donor), class imbalance occurs. To overcome this problem, <span class="citation">Kim, Chae, and Olson (2012)</span> used undersampling to reduce the number of majority class rows. Some other approaches to achieve class balance: oversampling the minority class rows, synthetic generation of minority class rows, such as SMOTE and family <span class="citation">(Chawla et al. 2002; Han, Wang, and Mao 2005)</span>, and cost-sensitive learning <span class="citation">(Domingos 1999)</span>.</li>
<li><strong>Consider ensemble methods</strong>. Either combine various models (that is, bagging, boosting, or stacking methods <span class="citation">(see Witten et al. 2016, Section 8.1)</span>) or compare various models and pre-processors. This type of comparison should be standard. Here’s pseudocode to explain this comparison:</li>
</ul>
<pre class="latex"><code>For each dataset:
    Create P pre-processed datasets
    For each p in P:
        Divide p into ten cross-validation folds
        For each predictive learning technique t:
            For each fold f:
                Train t on the other nine folds
                Test the resulting model on f
                Store the results and the resulting model</code></pre><code>
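<p>The pseudocode above could be turned into a runnable sketch along the following lines. Everything here is an assumption for illustration rather than a method from the surveyed studies: the toy rows, the attribute meanings, the function names, and the choice of ZeroR and OneR as the learners and random undersampling as the only non-trivial pre-processor. It uses only the Python standard library.</p>

```python
import random
from collections import Counter, defaultdict


def zero_r(train):
    """ZeroR: always predict the majority class of the training rows."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority


def one_r(train):
    """OneR: keep the single attribute whose value-to-class rule errs least."""
    default = Counter(label for _, label in train).most_common(1)[0][0]
    best_errors, best_attr, best_rule = None, None, None
    for attr in range(len(train[0][0])):
        by_value = defaultdict(Counter)
        for features, label in train:
            by_value[features[attr]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(rule[f[attr]] != label for f, label in train)
        if best_errors is None or errors < best_errors:
            best_errors, best_attr, best_rule = errors, attr, rule
    # Unseen attribute values fall back to the majority class.
    return lambda features: best_rule.get(features[best_attr], default)


def undersample(rows, rng):
    """Randomly drop rows until every class is as small as the rarest one."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[1]].append(row)
    smallest = min(len(group) for group in by_class.values())
    return [r for group in by_class.values() for r in rng.sample(group, smallest)]


def compare(rows, preprocessors, learners, k=10, seed=0):
    """Mean k-fold accuracy for every (pre-processor, learner) pair."""
    rng = random.Random(seed)
    results = {}
    for p_name, preprocess in preprocessors.items():
        data = preprocess(rows, rng)          # one pre-processed dataset p
        rng.shuffle(data)
        folds = [data[i::k] for i in range(k)]
        for l_name, learn in learners.items():
            scores = []
            for i, test in enumerate(folds):
                train = [r for j, fold in enumerate(folds) if j != i for r in fold]
                model = learn(train)
                scores.append(sum(model(f) == y for f, y in test) / len(test))
            results[(p_name, l_name)] = sum(scores) / k
    return results


# Hypothetical rows: (attended_event, segment) -> label; 20% are donors.
rows = [(("yes", "a" if i % 2 else "b"), "donor") for i in range(20)]
rows += [(("no", "a" if i % 2 else "b"), "non-donor") for i in range(80)]

preprocessors = {"none": lambda rows, rng: list(rows), "undersample": undersample}
learners = {"ZeroR": zero_r, "OneR": one_r}

results = compare(rows, preprocessors, learners)
for (p_name, l_name), accuracy in results.items():
    print(f"{p_name:12s} {l_name:6s} {accuracy:.2f}")
```

<p>On the unprocessed toy data, ZeroR scores about 0.80 simply because 80% of the rows are non-donors, which is the baseline that any other learner in the comparison has to beat. Following the pseudocode, the pre-processor runs before the data is divided into folds; applying it to the training folds only would be a reasonable variation.</p>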
<ul>
<li><strong>Build a large database with data from diverse organizations</strong>. If researchers can collect data from many organizations, they can conduct a large-scale study to build predictive models. For example, <span class="citation">Thompson (2010)</span> used data from eight institutions. Such a large-scale study will show either that accurate donor classification is hard, or that a unified, single model can be built, freeing us to research other topics. A related idea is what <span class="citation">Johnson (1991)</span> attempted: get anonymized data from the Internal Revenue Service (IRS) and build models on it.</li>
<li>Research other topics and approaches:
<ul>
<li><strong>Consider modeling methods that work well with long-tail or skewed data</strong>, such as quantile regression <span class="citation">(Perlich et al. 2007)</span> or HyperSMURF, an ensemble method <span class="citation">(Schubach et al. 2017)</span>.</li>
<li><strong>Study creation of personalized appeals and communication</strong>. The latest Natural Language Processing and Generation (NLP and NLG) methods are far superior to previous methods <span class="citation">(Yang et al. 2019)</span>, and they can be used to generate personalized appeals and communication. <span class="citation">Ding and Pan (2016)</span>, for example, generated gain or risk framed text to increase the text’s appeal to the reader.</li>
<li><strong>Study applications of graph theory to learn interests</strong>. Social graphs have value if all the connections in the graph can be known. A better use case for fundraising could be interest graphs, which identify the interests of people and connect people based on these interests <span class="citation">(Yu et al. 2014)</span>.</li>
<li><strong>Use NLG and NLP to automate tasks</strong>. Like creating personalized appeals, we can use pre-trained language models to summarize text, among other things, as shown by <span class="citation">Liu and Lapata (2019)</span>. For example, using a simple Python text summarizer called <a href="https://github.com/miso-belica/sumy"><em>sumy</em></a>, I summarized an article on Bill Gates from <a href="https://www.biography.com/business-figure/bill-gates">biography.com</a>.</li>
</ul></li>
</ul>

<pre class="latex"><code>"In 1975, Gates and Allen formed Micro-Soft, a blend of "micro-computer" 
and "software" (they dropped the hyphen within a year). Bill Gates Fact 
Card Microsoft’s Software for IBM PCs As the computer industry grew, 
with companies like Apple, Intel and IBM developing hardware and 
components, Gates was continuously on the road touting the merits 
of Microsoft software applications. Since stepping down from Microsoft, 
Gates devotes much of his time and energy to the work of the Bill &amp; 
Melinda Gates Foundation."</code></pre><code>
</code></code></div><code><code>
<div id="conclusion" class="section level1">
<h1>Conclusion</h1>
<p>In this paper, I reviewed the literature of analytics for nonprofit fundraising. Although researchers have applied more sophisticated methods over time, regression methods remain the most-used technique for predicting a donor’s likelihood of giving and her giving amount. Also, dissertations account for the second-largest share of published works. Machine learning and ensemble techniques are increasingly in use, and we will see more research using these methods in the future. Researchers will also use natural language processing and generation, along with deep learning.</p>

</div>



</code></code>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-quality-of-publications-1024x672.png" alt="" class="wp-image-3430" width="512" height="336" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-quality-of-publications-1024x672.png 1024w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-quality-of-publications-300x197.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-quality-of-publications-768x504.png 768w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-quality-of-publications-1536x1008.png 1536w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/fundraising-analytics-publications-quality-of-publications.png 1906w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption>Quality assessment of publications in fundraising analytics </figcaption></figure></div>



<h2>Searchable Bibliography of Analytics in Fundraising</h2>



<link href="https://cdn.datatables.net/1.10.21/css/jquery.dataTables.min.css" rel="stylesheet">
<script src="https://code.jquery.com/jquery-3.5.1.js"></script>
<script src="https://cdn.datatables.net/1.10.21/js/jquery.dataTables.min.js"></script>

<script type="text/javascript">
$(document).ready(function() {
    $('#example').DataTable( {
        "order": [[ 2, "asc" ],[ 0, "asc" ]],
responsive: true
    } );
} );
</script>

<table id="example" class="display compact" width="100%">


<thead>
<tr class="header">
<th align="left">AUTHORS</th>
<th align="left">TITLE</th>
<th align="right">YEAR</th>
<th align="left">JOURNAL</th>
<th align="left">PUBLICATION TYPE</th>
<th align="left">ANALYTICS CATEGORY</th>
<th align="left">DECADE</th>
<th align="left">DOI</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">O’Connor, William J.</td>
<td align="left">A Study Of Certain Factors Characteristic Of Alumni Who Provide Financial Support And Alumni Who Provide No Financial Support For Their College</td>
<td align="right">1961</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1960-1969</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Morris, Donald A. A.</td>
<td align="left">An Analysis Of Donors Of $10,000 Or More To The $55 Million Program At The University Of Michigan</td>
<td align="right">1970</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Caruthers, Flora Ann Spencer</td>
<td align="left">Study of Certain Characteristics of Alumni Who Provide Financial Support and Alumni Who Provide No Financial Support for Their Alma Mater</td>
<td align="right">1973</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Manzer, Leslie Lee</td>
<td align="left">Charitable health organization donor behavior: an empirical study of value and attitude structure</td>
<td align="right">1974</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Warren S. Blumenfeld, Patricia L. Sartain</td>
<td align="left">Predicting alumni financial donation.</td>
<td align="right">1974</td>
<td align="left">Journal of Applied Psychology</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"><a href="https://doi.org/10.1037/h0037298">10.1037/h0037298</a></td>
</tr>
<tr class="even">
<td align="left">Gardner, Paul M.</td>
<td align="left">A STUDY OF THE ATTITUDES OF HARDING COLLEGE ALUMNI WITH AN EMPHASIS ON DONOR AND NON-DONOR CHARACTERISTICS</td>
<td align="right">1975</td>
<td align="left">ProQuest Dissertations and Theses</td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">McKee, Dale F.</td>
<td align="left">An Analysis Of Factors Which Affect Alumni Participation And Support</td>
<td align="right">1975</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Herzlinger, Regina</td>
<td align="left">Why Data Systems in Nonprofit Organizations Fail.</td>
<td align="right">1977</td>
<td align="left">Harvard Business Review</td>
<td align="left">ARTICLE</td>
<td align="left">Other</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Miracle, William D.</td>
<td align="left">Differences Between Givers And Nongivers To The University Of Georgia Annual Fund</td>
<td align="right">1977</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Markoff, Richard M.</td>
<td align="left">An Analysis Of The Relationship Of Alumni Giving And Level Of Participation In Voluntary Organizations: A Case Study</td>
<td align="right">1978</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">McKinney, Ricardo J.</td>
<td align="left">Factors Among Select Donors And Nondonors Related To Major Gifts To A Private University</td>
<td align="right">1978</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Sundel, Harvey H, Zelman, William N, Weaver, Charles N, Pasternak, Richard E</td>
<td align="left">Fund raising: understanding donor motivation</td>
<td align="right">1978</td>
<td align="left">Social Work</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Glen Riecken, Ugur Yavas</td>
<td align="left">Meeting the Solicitation Challenge Through Marketing</td>
<td align="right">1979</td>
<td align="left">Administration in Social Work</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1970-1979</td>
<td align="left"><a href="https://doi.org/10.1300/j147v03n03_06">10.1300/j147v03n03_06</a></td>
</tr>
<tr class="even">
<td align="left">Anderson, Gerald L.</td>
<td align="left">Self-Esteem And Altruism Perceived As Motivational Factors For Alumni Giving, And Their Relationships To Various Donor Characteristics</td>
<td align="right">1981</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Ugur Yavas, Glen Riecken, Ravi Parameswaran</td>
<td align="left">Personality, organization-specific attitude, and socioeconomic correlates of charity giving behavior</td>
<td align="right">1981</td>
<td align="left">Journal of the Academy of Marketing Science</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"><a href="https://doi.org/10.1007/bf02723565">10.1007/bf02723565</a></td>
</tr>
<tr class="even">
<td align="left">Beeler, Karl J.</td>
<td align="left">A STUDY OF PREDICTORS OF ALUMNI PHILANTHROPY IN PRIVATE UNIVERSITIES</td>
<td align="right">1982</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Keller, Mary J. C.</td>
<td align="left">An Analysis Of Alumni Donor And Non-Donor Characteristics At The University Of Montevallo (Alabama)</td>
<td align="right">1982</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Scott M. Smith, Leland L. Beik</td>
<td align="left">Market segmentation for fund raisers</td>
<td align="right">1982</td>
<td align="left">Journal of the Academy of Marketing Science</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"><a href="https://doi.org/10.1007/bf02729963">10.1007/bf02729963</a></td>
</tr>
<tr class="odd">
<td align="left">David J. Soukup</td>
<td align="left">A Markov Analysis of Fund-Raising Alternatives</td>
<td align="right">1983</td>
<td align="left">Journal of Marketing Research</td>
<td align="left">ARTICLE</td>
<td align="left">Markov Chains</td>
<td align="left">1980-1989</td>
<td align="left"><a href="https://doi.org/10.1177/002224378302000310">10.1177/002224378302000310</a></td>
</tr>
<tr class="even">
<td align="left">Chewning, Paul B.</td>
<td align="left">The Attitudes Of Alumni Non-Donors, Donors, And Consecutive Donors Toward Drake University</td>
<td align="right">1984</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Frederick, Robert E.</td>
<td align="left">Intercollegiate Football Success And Institutional Private Support: A National Study Of 81 Public Universities, 1965-1979</td>
<td align="right">1984</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Korvas, Ronald J.</td>
<td align="left">The Relationship Of Selected Alumni Characteristics And Attitudes To Alumni Financial Support At A Private College</td>
<td align="right">1984</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Nelson, William T., Jr.</td>
<td align="left">A COMPARISON OF SELECTED UNDERGRADUATE EXPERIENCES OF ALUMNI WHO FINANCIALLY SUPPORT THEIR ALMA MATER</td>
<td align="right">1984</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Dietz, Larry H</td>
<td align="left">Iowa State University alumni contributions: an analysis of alumni giving patterns by selected class years, 1974 and 1979</td>
<td align="right">1985</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">McNally, Frederick E.</td>
<td align="left">AN ANALYSIS OF ALUMNI PHILANTHROPY RELATED TO PERSONAL, ACADEMIC, AND SOCIAL CHARACTERISTICS (FUNDRAISING, PUBLIC INSTITUTION, UNIVERSITY, INSTITUTIONAL ADVANCEMENT)</td>
<td align="right">1985</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Haddad, Freddie D., Jr.</td>
<td align="left">An Analysis Of The Characteristics Of Alumni Donors And Non-Donors At Butler University</td>
<td align="right">1986</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Rosenblatt, Jerry A, Cusson, Alain J, McGown, Lee</td>
<td align="left">A model to explain charitable donation-health care consumer behavior</td>
<td align="right">1986</td>
<td align="left">Advances in Consumer Research</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">House, Michael L.</td>
<td align="left">Annual fund raising in public higher education: The development and validation of a prediction equation</td>
<td align="right">1987</td>
<td align="left">ProQuest Dissertations and Theses</td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Grill, Alan J.</td>
<td align="left">An analysis of the relationships of selected variables to financial support provided by alumni of a public university</td>
<td align="right">1988</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Leslie, Larry L, Ramey, Garey</td>
<td align="left">Donor behavior and voluntary support for higher education institutions</td>
<td align="right">1988</td>
<td align="left">The Journal of Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Schlegelmilch, Bodo B, Tynan, AC</td>
<td align="left">The scope for market segmentation within the charity market: An empirical analysis</td>
<td align="right">1989</td>
<td align="left">Managerial and Decision Economics</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Shadoian, Holly L.</td>
<td align="left">A study of predictors of alumni philanthropy in public colleges</td>
<td align="right">1989</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1980-1989</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Boyle, James J.</td>
<td align="left">College quality and alumni giving</td>
<td align="right">1990</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Margaret A. Duronio, Bruce A. Loessin</td>
<td align="left">Fund-Raising Outcomes and Institutional Characteristics in Ten Types of Higher Education Institutions</td>
<td align="right">1990</td>
<td align="left">The Review of Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1353/rhe.1990.0013">10.1353/rhe.1990.0013</a></td>
</tr>
<tr class="odd">
<td align="left">Oglesby, Rodney A.</td>
<td align="left">Age, student involvement, and other characteristics of alumni donors and alumni non-donors of Southwest Baptist University</td>
<td align="right">1991</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Burgess-Getts, Linda</td>
<td align="left">Alumni as givers: An analysis of donor-nondonor behavior at a Comprehensive I institution</td>
<td align="right">1992</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Hueston, Frederick R.</td>
<td align="left">Predicting Alumni Giving: A Donor Analysis Test</td>
<td align="right">1992</td>
<td align="left">Fund Raising Management</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Wesley E Lindahl, Christopher Winship</td>
<td align="left">Predictive models for annual fundraising and major gift fundraising</td>
<td align="right">1992</td>
<td align="left">Nonprofit Management and Leadership</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Martin, Joseph C., Jr.</td>
<td align="left">Characteristics of alumni donors and non-donors at a Research I, public university</td>
<td align="right">1993</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Mosser, John Wayne</td>
<td align="left">Predicting Alumni/ae Gift Giving Behavior: A Structural Equation Model Approach</td>
<td align="right">1993</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Dianne S.P. Cermak, Karen Maru File, Russ Alan Prince</td>
<td align="left">A benefit segmentation of the major donor market</td>
<td align="right">1994</td>
<td align="left">Journal of Business Research</td>
<td align="left">ARTICLE</td>
<td align="left">Clustering</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1016/0148-2963(94)90016-7">10.1016/0148-2963(94)90016-7</a></td>
</tr>
<tr class="even">
<td align="left">Okunade, Albert Ade, Wunnava, Phanindra V, Walsh Jr, Raymond</td>
<td align="left">Charitable giving of alumni: micro-data evidence from a large public university</td>
<td align="right">1994</td>
<td align="left">American Journal of Economics and Sociology</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Wesley E Lindahl, Christopher Winship</td>
<td align="left">A logit model with interactions for predicting major gift donors</td>
<td align="right">1994</td>
<td align="left">Research in Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1007/bf02497084">10.1007/bf02497084</a></td>
</tr>
<tr class="even">
<td align="left">Alton L. Taylor, Joseph C. Martin</td>
<td align="left">Characteristics of alumni donors and nondonors at a Research I, public university</td>
<td align="right">1995</td>
<td align="left">Research in Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1007/bf02208312">10.1007/bf02208312</a></td>
</tr>
<tr class="odd">
<td align="left">Bruggink, Thomas H, Siddiqui, Kamran</td>
<td align="left">An econometric model of alumni giving: A case study for a liberal arts college</td>
<td align="right">1995</td>
<td align="left">The American Economist</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Pearson, William E.</td>
<td align="left">A study of donor predictability among graduates of a school of education within a Research I, public university</td>
<td align="right">1996</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Robert A. Baade, Jeffrey O. Sundberg</td>
<td align="left">What determines alumni generosity?</td>
<td align="right">1996</td>
<td align="left">Economics of Education Review</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1016/0272-7757(95)00026-7">10.1016/0272-7757(95)00026-7</a></td>
</tr>
<tr class="even">
<td align="left">Berger, Paul D, Smith, Gerald E</td>
<td align="left">The effect of direct mail framing strategies and segmentation variables on university fundraising performance</td>
<td align="right">1997</td>
<td align="left">Journal of Direct Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Goodman, Steve, Plouff, Gary</td>
<td align="left">Neural Network Modeling: Artificial Intelligence Marketing Hits the Non-Profit World</td>
<td align="right">1997</td>
<td align="left">Fund Raising Management</td>
<td align="left">ARTICLE</td>
<td align="left">Neural Networks</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Jim Toohill, Lisa Mullins, Jean Barclay, Mike Sadnicki</td>
<td align="left">Turning data into donations: A predictive model for individual giving</td>
<td align="right">1997</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Markov Chains</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.6090020205">10.1002/nvsm.6090020205</a></td>
</tr>
<tr class="odd">
<td align="left">Okunade, Albert A, Berl, Robert L</td>
<td align="left">Determinants of charitable giving of business school alumni</td>
<td align="right">1997</td>
<td align="left">Research in Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Schlegelmilch, Bodo B, Love, Alix, Diamantopoulos, Adamantios</td>
<td align="left">Responses to different charity appeals: the impact of donor characteristics on the amount of donations</td>
<td align="right">1997</td>
<td align="left">European Journal of Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Adrian Sargeant</td>
<td align="left">Donor lifetime value: An empirical analysis</td>
<td align="right">1998</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Lifetime Value</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.6090030403">10.1002/nvsm.6090030403</a></td>
</tr>
<tr class="even">
<td align="left">Tim Hunter, Richard Hill</td>
<td align="left">Prediction of donor lifetime value and the development of true segmented donor strategy</td>
<td align="right">1998</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Lifetime Value</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.6090030405">10.1002/nvsm.6090030405</a></td>
</tr>
<tr class="odd">
<td align="left">von der Lühe, Markus</td>
<td align="left">How to get more donors: UNICEF database marketing and data mining for non-commercial organizations</td>
<td align="right">1998</td>
<td align="left">WIT Transactions on Information and Communication Technologies</td>
<td align="left">ARTICLE</td>
<td align="left">CHAID</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Brian Duncan</td>
<td align="left">Modeling charitable contributions of time and money</td>
<td align="right">1999</td>
<td align="left">Journal of Public Economics</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1016/s0047-2727(98)00097-8">10.1016/s0047-2727(98)00097-8</a></td>
</tr>
<tr class="odd">
<td align="left">Catrelia S. Hunter, Enid B. Jones, Charlotte Boger</td>
<td align="left">A Study of the Relationship between Alumni Giving and Selected Characteristics of Alumni Donors of Livingstone College, NC</td>
<td align="right">1999</td>
<td align="left">Journal of Black Studies</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">1990-1999</td>
<td align="left"><a href="https://doi.org/10.1177/002193479902900404">10.1177/002193479902900404</a></td>
</tr>
<tr class="even">
<td align="left">Selig, Camden W.</td>
<td align="left">A study of donor predictability among alumni athletes at the University of Virginia</td>
<td align="right">1999</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">1990-1999</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">C.R. Belfield, A.P. Beney</td>
<td align="left">What Determines Alumni Generosity? Evidence for the UK</td>
<td align="right">2000</td>
<td align="left">Education Economics</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1080/096452900110300">10.1080/096452900110300</a></td>
</tr>
<tr class="even">
<td align="left">Hanson, Sheila Kay</td>
<td align="left">Alumni Characteristics that Predict Promoting and Donating to Alma Mater: Implications for Alumni Relations</td>
<td align="right">2000</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Janet S. Greenlee, John M. Trussel</td>
<td align="left">Predicting the Financial Vulnerability of Charitable Organizations</td>
<td align="right">2000</td>
<td align="left">Nonprofit Management and Leadership</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nml.11205">10.1002/nml.11205</a></td>
</tr>
<tr class="even">
<td align="left">Tobin M. Aldrich</td>
<td align="left">How much are new donors worth? Making donor recruitment investment decisions based on lifetime value analysis</td>
<td align="right">2000</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Lifetime Value</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.99">10.1002/nvsm.99</a></td>
</tr>
<tr class="odd">
<td align="left">Jennifer Key</td>
<td align="left">Enhancing fundraising success with custom data modelling</td>
<td align="right">2001</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.159">10.1002/nvsm.159</a></td>
</tr>
<tr class="even">
<td align="left">Tim Drye, Graham Wetherill, Alison Pinnock</td>
<td align="left">Donor survival analysis: an alternative perspective on lifecycle modelling</td>
<td align="right">2001</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Survival Analysis</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.158">10.1002/nvsm.158</a></td>
</tr>
<tr class="odd">
<td align="left">Charles J. Quigley, Frank G. Bingham, Keith B. Murray</td>
<td align="left">An Analysis of the Impact of Acknowledgement Programs on Alumni Giving</td>
<td align="right">2002</td>
<td align="left">Journal of Marketing Theory and Practice</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1080/10696679.2002.11501921">10.1080/10696679.2002.11501921</a></td>
</tr>
<tr class="even">
<td align="left">Cunningham, Brendan M, Cochi-Ficano, Carlena K</td>
<td align="left">The determinants of donative revenue flows from alumni of higher education: An empirical inquiry</td>
<td align="right">2002</td>
<td align="left">Journal of Human Resources</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Rob Potharst, Uzay Kaymak, Wim Pijls</td>
<td align="left">Neural Networks for Target Selection in Direct Marketing</td>
<td align="right">2002</td>
<td align="left"></td>
<td align="left">INCOLLECTION</td>
<td align="left">Ensemble</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.4018/978-1-930708-31-0.ch006">10.4018/978-1-930708-31-0.ch006</a></td>
</tr>
<tr class="even">
<td align="left">Bingham Jr, Frank G, Quigley Jr, Charles J, Murray, Keith B</td>
<td align="left">An investigation of the influence acknowledgement programs have on alumni giving behavior: Implications for marketing strategy</td>
<td align="right">2003</td>
<td align="left">Journal of Marketing for Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Monks, James</td>
<td align="left">Patterns of giving to one’s alma mater among young graduates from selective institutions</td>
<td align="right">2003</td>
<td align="left">Economics of Education Review</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Roger Bennett</td>
<td align="left">Factors underlying the inclination to donate to particular types of charity</td>
<td align="right">2003</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.198">10.1002/nvsm.198</a></td>
</tr>
<tr class="odd">
<td align="left">Hoyt, Jeff E</td>
<td align="left">Understanding alumni giving: Theory and predictors of donor status</td>
<td align="right">2004</td>
<td align="left"></td>
<td align="left">INPROCEEDINGS</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Wylie, Peter B</td>
<td align="left">Data mining for fund raisers: How to use simple statistics to find the gold in your donor database–even if you hate statistics: A starter guide</td>
<td align="right">2004</td>
<td align="left"></td>
<td align="left">BOOK</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Marr, Kelly A, Mullin, Charles H, Siegfried, John J</td>
<td align="left">Undergraduate financial aid and subsequent alumni giving behavior</td>
<td align="right">2005</td>
<td align="left">The Quarterly Review of Economics and Finance</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Robert Gunsalus</td>
<td align="left">The Relationship of Institutional Characteristics and Giving Participation Rates of Alumni</td>
<td align="right">2005</td>
<td align="left">International Journal of Educational Advancement</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1057/palgrave.ijea.2140214">10.1057/palgrave.ijea.2140214</a></td>
</tr>
<tr class="odd">
<td align="left">Scott Gaier</td>
<td align="left">Alumni Satisfaction with Their Undergraduate Academic Experience and the Impact on Alumni Giving and Participation</td>
<td align="right">2005</td>
<td align="left">International Journal of Educational Advancement</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1057/palgrave.ijea.2140220">10.1057/palgrave.ijea.2140220</a></td>
</tr>
<tr class="even">
<td align="left">Tsao, James C., Coll, Gary</td>
<td align="left">To Give or Not to Give: Factors Determining Alumni Intent to Make Donations as a PR Outcome</td>
<td align="right">2005</td>
<td align="left">Journalism &amp; Mass Communication Educator</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Roger Bennett</td>
<td align="left">Predicting the Lifetime Durations of Donors to Charities</td>
<td align="right">2006</td>
<td align="left">Journal of Nonprofit &amp; Public Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1300/j054v15n01_03">10.1300/j054v15n01_03</a></td>
</tr>
<tr class="even">
<td align="left">Bohannon, Tom</td>
<td align="left">Predictive modelling in higher education</td>
<td align="right">2007</td>
<td align="left"></td>
<td align="left">INPROCEEDINGS</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Diehl, Abigail G</td>
<td align="left">The relationship between alumni giving and receipt of institutional scholarships among undergraduate students at a public, land-grant institution</td>
<td align="right">2007</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Terry, Neil, Macy, Anne</td>
<td align="left">Determinants of alumni giving rates</td>
<td align="right">2007</td>
<td align="left">Journal of Economics and Economic Education Research</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Xiaogeng Sun, Sharon C Hoffman, Marilyn L Grady</td>
<td align="left">A multivariate causal model of alumni giving: Implications for alumni fundraisers</td>
<td align="right">2007</td>
<td align="left">International Journal of Educational Advancement</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Denizard-Ramsamy, Wilhelm, Medina-Borja, Alexandra</td>
<td align="left">Using CHAID as a method to predict financial vulnerability in non-profit organizations</td>
<td align="right">2008</td>
<td align="left"></td>
<td align="left">INPROCEEDINGS</td>
<td align="left">CHAID</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Jonathan Meer, Harvey Rosen</td>
<td align="left">The Impact of Athletic Performance on Alumni Giving: An Analysis of Micro Data</td>
<td align="right">2008</td>
<td align="left"></td>
<td align="left">TECHREPORT</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.3386/w13937">10.3386/w13937</a></td>
</tr>
<tr class="even">
<td align="left">Joshua Birkholz</td>
<td align="left">Fundraising analytics: Using data to guide strategy</td>
<td align="right">2008</td>
<td align="left"></td>
<td align="left">BOOK</td>
<td align="left">Other</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Lawley, Cecelia D.</td>
<td align="left">Factors that affect alumni loyalty at a public university</td>
<td align="right">2008</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Anyuan Shen, Chih-Yang Tsai</td>
<td align="left">Are single-gift committed donors different from their multiple-gift counterparts?</td>
<td align="right">2009</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.387">10.1002/nvsm.387</a></td>
</tr>
<tr class="odd">
<td align="left">David J. Weerts, Justin M. Ronca</td>
<td align="left">Using classification trees to predict alumni giving for higher education</td>
<td align="right">2009</td>
<td align="left">Education Economics</td>
<td align="left">ARTICLE</td>
<td align="left">Machine Learning</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1080/09645290801976985">10.1080/09645290801976985</a></td>
</tr>
<tr class="even">
<td align="left">Hashemi, Ray R., Le Blanc, Louis A., Bahrami, Azita A., Bahar, Mahmood, Traywick, Bryan</td>
<td align="left">Association Analysis of Alumni Giving: A Formal Concept Analysis</td>
<td align="right">2009</td>
<td align="left">International Journal of Intelligent Information Technologies</td>
<td align="left">ARTICLE</td>
<td align="left">Other</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">J Travis McDearmon, Kathryn Shirley</td>
<td align="left">Characteristics and institutional factors related to young alumni donors and non-donors</td>
<td align="right">2009</td>
<td align="left">International Journal of Educational Advancement</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1057/ijea.2009.29">10.1057/ijea.2009.29</a></td>
</tr>
<tr class="even">
<td align="left">Jessica Holmes</td>
<td align="left">Prestige, charitable deductions and other determinants of alumni giving: Evidence from a highly selective liberal arts college</td>
<td align="right">2009</td>
<td align="left">Economics of Education Review</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1016/j.econedurev.2007.10.008">10.1016/j.econedurev.2007.10.008</a></td>
</tr>
<tr class="odd">
<td align="left">Louis A Le Blanc, Conway T Rucks</td>
<td align="left">Data mining of university philanthropic giving: Cluster-discriminant analysis and Pareto effects</td>
<td align="right">2009</td>
<td align="left">International Journal of Educational Advancement</td>
<td align="left">ARTICLE</td>
<td align="left">Clustering</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1057/ijea.2009.28">10.1057/ijea.2009.28</a></td>
</tr>
<tr class="even">
<td align="left">Luperchio, Dan</td>
<td align="left">Data Mining and Predictive Modeling in Institutional Advancement: How Ten Schools Found Success. Technical Report.</td>
<td align="right">2009</td>
<td align="left"></td>
<td align="left">TECHREPORT</td>
<td align="left">Clustering</td>
<td align="left">2000-2009</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Nigel Magson, Claire Routley</td>
<td align="left">Using data in legacy fundraising: a practical approach</td>
<td align="right">2009</td>
<td align="left">International Journal of Nonprofit and Voluntary Sector Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2000-2009</td>
<td align="left"><a href="https://doi.org/10.1002/nvsm.374">10.1002/nvsm.374</a></td>
</tr>
<tr class="even">
<td align="left">Qin Chen</td>
<td align="left">Predictive modeling for non-profit fundraising</td>
<td align="right">2010</td>
<td align="left"></td>
<td align="left">MASTERSTHESIS</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Stephan Dickert, Namika Sagara, Paul Slovic</td>
<td align="left">Affective motivations to help others: A two-stage model of donation decisions</td>
<td align="right">2010</td>
<td align="left">Journal of Behavioral Decision Making</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1002/bdm.697">10.1002/bdm.697</a></td>
</tr>
<tr class="even">
<td align="left">Thompson, Lori A.</td>
<td align="left">Data mining for higher education advancement: A study of eight North American colleges and universities</td>
<td align="right">2010</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Verhaert, Griet</td>
<td align="left">The Role of Database Marketing in Improving Direct Mail Fundraising</td>
<td align="right">2010</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Angela C.M. de Oliveira, Rachel T.A. Croson, Catherine Eckel</td>
<td align="left">The giving type: Identifying donors</td>
<td align="right">2011</td>
<td align="left">Journal of Public Economics</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.jpubeco.2010.11.012">10.1016/j.jpubeco.2010.11.012</a></td>
</tr>
<tr class="odd">
<td align="left">Donald N. Steinnes</td>
<td align="left">An Econometric Analysis Of Aging And Alumni/ae Altruism</td>
<td align="right">2011</td>
<td align="left">International Business &amp; Economics Research Journal (IBER)</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.19030/iber.v1i5.3924">10.19030/iber.v1i5.3924</a></td>
</tr>
<tr class="even">
<td align="left">Melissa D Newman</td>
<td align="left">Does membership matter? Examining the relationship between alumni association membership and alumni giving</td>
<td align="right">2011</td>
<td align="left">International Journal of Educational Advancement</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1057/ijea.2011.5">10.1057/ijea.2011.5</a></td>
</tr>
<tr class="odd">
<td align="left">Baruch, Yehuda, Sang, Katherine JC</td>
<td align="left">Predicting MBA graduates’ donation behaviour to their alma mater</td>
<td align="right">2012</td>
<td align="left">Journal of Management Development</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1108/02621711211253268">10.1108/02621711211253268</a></td>
</tr>
<tr class="even">
<td align="left">Gitae Kim, Bongsug Kevin Chae, David L. Olson</td>
<td align="left">A support vector machine (SVM) approach to imbalanced datasets of customer responses: comparison with other customer response models</td>
<td align="right">2012</td>
<td align="left">Service Business</td>
<td align="left">ARTICLE</td>
<td align="left">Support Vector Machines</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1007/s11628-012-0147-9">10.1007/s11628-012-0147-9</a></td>
</tr>
<tr class="odd">
<td align="left">Jonathan Meer, Harvey S. Rosen</td>
<td align="left">Does generosity beget generosity? Alumni giving and undergraduate financial aid</td>
<td align="right">2012</td>
<td align="left">Economics of Education Review</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.econedurev.2012.06.009">10.1016/j.econedurev.2012.06.009</a></td>
</tr>
<tr class="even">
<td align="left">Loveday, Christine Hawk</td>
<td align="left">An analysis of the variables associated with alumni giving and employee giving to a mid-sized southeastern university</td>
<td align="right">2012</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Andrew Tiger, Landon Preston</td>
<td align="left">Logged In And Connected? A Quantitative Analysis Of Online Course Use And Alumni Giving</td>
<td align="right">2013</td>
<td align="left">American Journal of Business Education (AJBE)</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.19030/ajbe.v6i3.7816">10.19030/ajbe.v6i3.7816</a></td>
</tr>
<tr class="even">
<td align="left">Arnaud De Bruyn, Sonja Prokopec</td>
<td align="left">Opening a donor’s wallet: The influence of appeal scales on likelihood and magnitude of donation</td>
<td align="right">2013</td>
<td align="left">Journal of Consumer Psychology</td>
<td align="left">ARTICLE</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.jcps.2013.03.004">10.1016/j.jcps.2013.03.004</a></td>
</tr>
<tr class="odd">
<td align="left">Christen Lara, Daniel Johnson</td>
<td align="left">The anatomy of a likely donor: econometric evidence on philanthropy to higher education</td>
<td align="right">2013</td>
<td align="left">Education Economics</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1080/09645292.2013.766672">10.1080/09645292.2013.766672</a></td>
</tr>
<tr class="even">
<td align="left">Durango-Cohen, Elizabeth J, Torres, Ramón L, Durango-Cohen, Pablo L</td>
<td align="left">Donor segmentation: When summary statistics don’t tell the whole story</td>
<td align="right">2013</td>
<td align="left">Journal of Interactive Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Clustering</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Elizabeth J Durango-Cohen</td>
<td align="left">Modeling contribution behavior in fundraising: Segmentation analysis for a public broadcasting station</td>
<td align="right">2013</td>
<td align="left">European Journal of Operational Research</td>
<td align="left">ARTICLE</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.ejor.2013.01.008">10.1016/j.ejor.2013.01.008</a></td>
</tr>
<tr class="even">
<td align="left">Johnson, Elizabeth A. M.</td>
<td align="left">Factors associated with non-traditional and traditional undergraduate alumni giving to alma maters</td>
<td align="right">2013</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Ketter, Jason W</td>
<td align="left">Predictors of Alumni Donor Behavior in Graduates of the Traditional MBA and iMBA Programs at The Pennsylvania State University</td>
<td align="right">2013</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Miller, Myra E.</td>
<td align="left">Why alumni give: How campus environment and sense of belonging shape nontraditional students’ intent to give financially to their university</td>
<td align="right">2013</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Descriptive Statistics</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Pablo L. Durango-Cohen, Elizabeth J. Durango-Cohen, Ramón L. Torres</td>
<td align="left">A Bernoulli-Gaussian mixture model of donation likelihood and monetary value: An application to alumni segmentation in a university setting</td>
<td align="right">2013</td>
<td align="left">Computers &amp; Industrial Engineering</td>
<td align="left">ARTICLE</td>
<td align="left">Clustering</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.cie.2013.08.007">10.1016/j.cie.2013.08.007</a></td>
</tr>
<tr class="even">
<td align="left">Sangkil Moon, Kathryn Azizi</td>
<td align="left">Finding Donors by Relationship Fundraising</td>
<td align="right">2013</td>
<td align="left">Journal of Interactive Marketing</td>
<td align="left">ARTICLE</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.intmar.2012.10.002">10.1016/j.intmar.2012.10.002</a></td>
</tr>
<tr class="odd">
<td align="left">Truitt, Joshua</td>
<td align="left">The relationship between student engagement and recent alumni donors at Carnegie baccalaureate colleges located in the southeastern United States</td>
<td align="right">2013</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Kevin MacDonell, Peter Wylie</td>
<td align="left">Score! Data-Driven Success for Your Advancement Team</td>
<td align="right">2014</td>
<td align="left"></td>
<td align="left">BOOK</td>
<td align="left">Other</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Lisa Ann Skari</td>
<td align="left">Community college alumni: Predicting who gives</td>
<td align="right">2014</td>
<td align="left">Community College Review</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Morgan, Robert Andrew</td>
<td align="left">Factors that lead Millennial alumni to donate to their alma mater</td>
<td align="right">2014</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Nicholas Rau</td>
<td align="left">Predictive Modeling of Alumni Donors: An Engagement Model for Fundraising in Postsecondary Education</td>
<td align="right">2014</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Ropp, Christopher Tylerr</td>
<td align="left">The relationship between student academic engagement and alumni giving at a public, state flagship university</td>
<td align="right">2014</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Sarunya Lertputtarak, Surat Supitchayangkool</td>
<td align="left">Factors Influencing Alumni Donations</td>
<td align="right">2014</td>
<td align="left">International Journal of Business and Management</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.5539/ijbm.v9n3p170">10.5539/ijbm.v9n3p170</a></td>
</tr>
<tr class="even">
<td align="left">Torres, Ramon L.</td>
<td align="left">Dynamic Segmentation Modeling: Application of Finite Mixture Models to Explain the Giving Behavior of Donors in a University Setting</td>
<td align="right">2014</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Udenze, Adrian</td>
<td align="left">APPLICATION OF DATA MINING TECHNIQUES TO PROBLEMS IN FUND RAISING</td>
<td align="right">2014</td>
<td align="left">International Journal of Current Research and Review</td>
<td align="left">ARTICLE</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Weizeng Zhang</td>
<td align="left">Segmentation modeling: Applications of Finite Mixture Regression Models in University Fundraising and Management of Transportation Infrastructure</td>
<td align="right">2014</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Clustering</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Durango-Cohen, Elizabeth J, Balasubramanian, Siva K</td>
<td align="left">Effective segmentation of university alumni: Mining contribution data with finite-mixture models</td>
<td align="right">2015</td>
<td align="left">Research in Higher Education</td>
<td align="left">ARTICLE</td>
<td align="left">Clustering</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Jinwook Chung, Kyumin Lee</td>
<td align="left">A Long-Term Study of a Crowdfunding Platform</td>
<td align="right">2015</td>
<td align="left"></td>
<td align="left">INPROCEEDINGS</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1145/2700171.2791045">10.1145/2700171.2791045</a></td>
</tr>
<tr class="odd">
<td align="left">Kakrala, Ramcharan, Chakraborty, Goutam</td>
<td align="left">Donor Sentiment and Characteristic Analysis Using SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio</td>
<td align="right">2015</td>
<td align="left"></td>
<td align="left">INPROCEEDINGS</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.13140/RG.2.1.2716.7842">10.13140/RG.2.1.2716.7842</a></td>
</tr>
<tr class="even">
<td align="left">Mark E Walcott</td>
<td align="left">Predictive modeling and alumni fundraising in higher education</td>
<td align="right">2015</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Nicholas E Rau, T Dary Erwin</td>
<td align="left">Using student engagement to predict alumni donors: An analytical model</td>
<td align="right">2015</td>
<td align="left">The Journal of Nonprofit Education and Leadership</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Chanmin Park, Yong Jae Ko, Hee Youn Kim, Michael Sagas, Melfy Eddosary</td>
<td align="left">Donor motivation in college sport: Does contribution level matter?</td>
<td align="right">2016</td>
<td align="left">Social Behavior and Personality: an international journal</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.2224/sbp.2016.44.6.1015">10.2224/sbp.2016.44.6.1015</a></td>
</tr>
<tr class="odd">
<td align="left">Pinion, Tyson L</td>
<td align="left">Factors That Influence Alumni Giving at Three Private Universities</td>
<td align="right">2016</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Tania M. Veludo-de-Oliveira, Ibrahim S. Alhaidari, Mirella Yani-de-Soriano, Shumaila Y. Yousafzai</td>
<td align="left">Comparing the Explanatory and Predictive Power of Intention-Based Theories of Personal Monetary Donation to Charitable Organizations</td>
<td align="right">2016</td>
<td align="left">VOLUNTAS: International Journal of Voluntary and Nonprofit Organizations</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1007/s11266-016-9690-7">10.1007/s11266-016-9690-7</a></td>
</tr>
<tr class="odd">
<td align="left">Brunette, Charlie, Vo, Ngoc, Watanabe, Nicholas M</td>
<td align="left">Donation intention in current students: An analysis of university engagement and sense of place in future athletic, academic, and split donors</td>
<td align="right">2017</td>
<td align="left">Journal of Issues in Intercollegiate Athletics</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">David George Vequist IV</td>
<td align="left">Nonprofit Fundraising Transformation through Analytics</td>
<td align="right">2017</td>
<td align="left"></td>
<td align="left">INCOLLECTION</td>
<td align="left">Social Media</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Faisal, Ali</td>
<td align="left">An Investigation of the Relationship of Student Engagement to Alumni Giving at an Independent Technological University</td>
<td align="right">2017</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Kenneth D Lawrence, Stephan Kudyba, Sheila M Lawrence</td>
<td align="left">Funding Analytics: Predictive Analysis in a Major State Research University</td>
<td align="right">2017</td>
<td align="left"></td>
<td align="left">INCOLLECTION</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1108/S1477-407020170000012005">10.1108/S1477-407020170000012005</a></td>
</tr>
<tr class="odd">
<td align="left">Liang Ye</td>
<td align="left">A machine learning approach to fundraising success in higher education</td>
<td align="right">2017</td>
<td align="left"></td>
<td align="left">MASTERSTHESIS</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">AR Nandeshwar, R Devine</td>
<td align="left">Data Science for Fundraising: Build Data-driven Solutions Using R</td>
<td align="right">2018</td>
<td align="left"></td>
<td align="left">BOOK</td>
<td align="left">Other</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Christian, Kelsey M</td>
<td align="left">Identifying Demographic Variables that can Predict Alumni Giving at a Regional Comprehensive Four-Year University in the South</td>
<td align="right">2018</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Day, Deborah A</td>
<td align="left">Factors in the Undergraduate Experience that Influence Young Alumni Giving</td>
<td align="right">2018</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Liu, Fangyao, Feng, Xixi, Ouyang, Qinge</td>
<td align="left">Factors Exploration on Alumni Donation: A Case Study of Creighton University</td>
<td align="right">2018</td>
<td align="left">Journal of Contemporary Management</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="even">
<td align="left">Natthawat Rattanamethawong, Sukree Sinthupinyo, Achara Chandrachai</td>
<td align="left">An innovation model of alumni relationship management: Alumni segmentation analysis</td>
<td align="right">2018</td>
<td align="left">Kasetsart Journal of Social Sciences</td>
<td align="left">ARTICLE</td>
<td align="left">Ensemble</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1016/j.kjss.2017.02.002">10.1016/j.kjss.2017.02.002</a></td>
</tr>
<tr class="odd">
<td align="left">U. N. Saraih, Nor Irwani Abdul Rahman, Norshahrizan Noordin, Sayang Nurshahrizleen Ramlan, Razli Ahmad, Mohd Fo’ad Sakdan, M. Harith Amlus</td>
<td align="left">Modelling Students’ Experience Towards the Development of Alumni Involvement and Alumni Loyalty</td>
<td align="right">2018</td>
<td align="left">MATEC Web of Conferences</td>
<td align="left">ARTICLE</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"><a href="https://doi.org/10.1051/matecconf/201815005050">10.1051/matecconf/201815005050</a></td>
</tr>
<tr class="even">
<td align="left">Lowe, LaKeisha D.</td>
<td align="left">Repeated College Alumni Giving: Application of the Commitment-Trust Theory of Relationship Marketing</td>
<td align="right">2019</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">Naccarato, Shawn L.</td>
<td align="left">Predicting Alumni Giving at a Public Comprehensive Regional University: Predictive Multivariate Causal Models for Annual Giving, Significant Cumulative Giving, Major Giving, and Planned Giving</td>
<td align="right">2019</td>
<td align="left"></td>
<td align="left">PHDTHESIS</td>
<td align="left">Regression</td>
<td align="left">2010-2019</td>
<td align="left"></td>
</tr>
</tbody>

</table>



<h2>Download</h2>



<p><a aria-label="undefined (opens in a new tab)" href="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/Predictive-Analytics-Survey-Literature-Review-Data-Science-Fundraising.pdf" target="_blank" rel="noreferrer noopener">Download a pre-print (and draft) version of this literature review.</a></p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/survey-of-predictive-analytics-in-fundraising/">Survey of Predictive Analytics in Fundraising</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Automate Statistical Analysis using RMarkdown</title>
		<link>https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/</link>
					<comments>https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/#comments</comments>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Fri, 29 May 2020 03:13:36 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[automated reports]]></category>
		<category><![CDATA[knitr]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rmarkdown]]></category>
		<category><![CDATA[Tufte]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3377</guid>

					<description><![CDATA[<p>In this post, you will learn how to repeat certain analysis using R and RMarkdown. Recently, I saw many dissertations on higher education fundraising in which the researchers had used a data set, selected some variables, ran correlation or association as well as significance tests, and built a logistic regression model on the selected variables. [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/">How to Automate Statistical Analysis using RMarkdown</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p>In this post, you will learn how to automate a repetitive statistical analysis using R and RMarkdown.</p>



<p>Recently, I read many dissertations on higher education fundraising in which the researchers had taken a data set, selected some variables, run correlation or association tests as well as significance tests, and built a logistic regression model on the selected variables. Unfortunately, the researchers drew their conclusions from weak results.</p>



<p>Almost every dissertation had the same approach. I was confident that this part could be <a href="https://nandeshwar.info/data-science-2/automated-reports-and-dashboards-in-r/">programmed with a template using RMarkdown</a> and RStudio. Since repeating this type of analysis is hardly special, I hope that future researchers will focus on bigger problems.</p>



<p>Let&#8217;s go through the steps to create this document.</p>



<h2>R Markdown Document</h2>



<p>We will open a new R Markdown document and use the <a href="https://github.com/Tufte-LaTeX/tufte-latex">Tufte Handout template</a>. Make sure that this works on your computer before you make any changes. If you get errors, you will have to update some settings or install other packages.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img loading="lazy" width="490" height="428" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/rmarkdown-new-document-tufte-book-format-automated-analysis.png" alt="new Rmarkdown document window" class="wp-image-3380" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/rmarkdown-new-document-tufte-book-format-automated-analysis.png 490w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/rmarkdown-new-document-tufte-book-format-automated-analysis-300x262.png 300w" sizes="(max-width: 490px) 100vw, 490px" /></figure></div>



<p>Once you know that the document <a href="https://bookdown.org/yihui/rmarkdown/">knits</a>, delete everything below the Introduction header.</p>



<p>Then make changes in the header such as the title and author.</p>



<p>Also replace the output format with <strong>tufte_book2</strong>.</p>



<p>Replace the bibliography file name with your own. I named mine repeat-analysis.bib.</p>



<h2>R Code</h2>



<p>Insert an R chunk to load our favorite libraries and custom functions. Here, I have a mode function (<a href="https://stackoverflow.com/a/8189441">copied from Stack Overflow</a>, of course), which we will use to replace missing values.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-48" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">```{r libsfunctions, include=FALSE}
library(readr)

library(dplyr)
library(purrr)
library(tables)
library(janitor)
library(vcd)
library(scales)
library(kableExtra)
library(stringr)

# Return the most frequent non-missing value of x (on a tie, the value seen first)
Mode &lt;- function(x) {
  x &lt;- x[!is.na(x)]
  ux &lt;- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
```</code></div><small class="shcb-language" id="shcb-language-48"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
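

<p>As a quick sanity check, here is a minimal sketch of how the Mode function behaves. The sample vectors are made up for illustration and are not part of the donor data.</p>


<pre class="wp-block-code"><code class="hljs language-r"># Same Mode function as in the chunk above, repeated so this runs on its own
Mode &lt;- function(x) {
  x &lt;- x[!is.na(x)]
  ux &lt;- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical values, for illustration only
marital &lt;- c("Married", "Single", "Married", NA)
most_common &lt;- Mode(marital)   # NA is dropped before counting

# On a tie, the value that appears first in the vector is returned
tie_result &lt;- Mode(c(3, 1, 3, 1))

most_common
tie_result</code></pre>


<p>Because missing values are removed first, the function is safe to use inside replace() when filling in gaps in a character column.</p>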


<p>Next, we will load and manipulate a dataset. This synthetic data set is from the book I wrote with Rodger Devine, <a href="https://ds4fr.nandeshwar.info/">Data Science for Fundraising</a>.</p>



<p>Let’s see the <a href="https://nandeshwar.info/ultimate-guide-r-data-manipulation/">data manipulation</a> step by step.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-49" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>```{r loaddata, include=FALSE}
</span></span><span class='shcb-loc'><span>sample_data &lt;- read_csv("https://www.dropbox.com/s/ntd5tbhr7fxmrr4/DonorSampleDataCleaned.csv?dl=1")
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>sample_data_refi &lt;- sample_data %&gt;%
</span></span><span class='shcb-loc'><span>  filter(ALUMNUS_IND == 'Y') %&gt;%
</span></span><span class='shcb-loc'><span>  select(
</span></span><span class='shcb-loc'><span>    ID,
</span></span><span class='shcb-loc'><span>    AGE,
</span></span><span class='shcb-loc'><span>    MARITAL_STATUS,
</span></span><span class='shcb-loc'><span>    GENDER,
</span></span><span class='shcb-loc'><span>    PARENT_IND,
</span></span><span class='shcb-loc'><span>    HAS_INVOLVEMENT_IND,
</span></span><span class='shcb-loc'><span>    WEALTH_RATING,
</span></span><span class='shcb-loc'><span>    DEGREE_LEVEL,
</span></span><span class='shcb-loc'><span>    PREF_ADDRESS_TYPE,
</span></span><span class='shcb-loc'><span>    EMAIL_PRESENT_IND,
</span></span><span class='shcb-loc'><span>    TotalGiving
</span></span><span class='shcb-loc'><span>  ) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(TotalGivingGroup = cut(
</span></span><span class='shcb-loc'><span>    TotalGiving,
</span></span><span class='shcb-loc'><span>    breaks = c(0, 100, 1000, 10000, max(TotalGiving)),
</span></span><span class='shcb-loc'><span>    include.lowest = TRUE,
</span></span><span class='shcb-loc'><span>    labels = c("0-100", "101-1000", "1001-10000", "10000+")
</span></span><span class='shcb-loc'><span>  )) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(AgeGroup = cut(
</span></span><span class='shcb-loc'><span>    AGE,
</span></span><span class='shcb-loc'><span>    breaks = c(0, 35, 60, 10000),
</span></span><span class='shcb-loc'><span>    include.lowest = TRUE,
</span></span><span class='shcb-loc'><span>    labels = c("0-35", "36-60", "60+")
</span></span><span class='shcb-loc'><span>  )) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(ON_CAMPUS_RESIDENCE = sample(c("Y", "N"), size = nrow(.), replace = TRUE)) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(EVENT_ATTEND_IND = sample(c("Y", "N"), size = nrow(.), replace = TRUE)) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate(DISTANCE_FROM_CAMPUS = sample(
</span></span><span class='shcb-loc'><span>    c("0-50 miles", "51-100 miles", "100+ miles"),
</span></span><span class='shcb-loc'><span>    size = nrow(.),
</span></span><span class='shcb-loc'><span>    replace = TRUE
</span></span><span class='shcb-loc'><span>  )) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate_if(is.numeric, list( ~ replace(., is.na(.), median(., na.rm = TRUE)))) %&gt;%
</span></span><span class='shcb-loc'><span>  mutate_if(is.character, list( ~ replace(., is.na(.), Mode(.))))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span>names(sample_data_refi) &lt;- gsub("_", " ", names(sample_data_refi))
</span></span><span class='shcb-loc'><span>```
</span></span></code></div><small class="shcb-language" id="shcb-language-49"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>First, I removed any non-alumni from the dataset. Line # 5.</p>



<p>Then, I selected a few variables of importance. Lines # 6-18.</p>



<p>Then I created buckets using the Total Giving variable. Lines # 19-24.</p>



<p>Same with the Age variable. Lines # 25-30.</p>



<p>I added a few fake variables and randomly generated their values. Lines # 31-37.</p>



<p>Finally, I replaced the missing values in each numeric variable with the median of that variable and, in each character variable, with the mode of that variable. Lines # 38-39.</p>



<p>I also replaced the underscores in the variable names with spaces. Line # 41.</p>
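

<p>To see the bucketing and imputation steps in isolation, here is a small sketch on a made-up four-row data frame. The toy values are assumptions for illustration. The sketch also uses across(), the newer dplyr idiom (dplyr 1.0 or later), in place of the mutate_if calls in the chunk above.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(dplyr)

Mode &lt;- function(x) {
  x &lt;- x[!is.na(x)]
  ux &lt;- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical toy data, not the donor file
toy &lt;- data.frame(
  AGE = c(25, 40, NA, 70),
  GENDER = c("F", NA, "F", "M"),
  TotalGiving = c(0, 250, 5000, 20000),
  stringsAsFactors = FALSE
)

toy_clean &lt;- toy %&gt;%
  # Bucket giving totals; include.lowest keeps the 0 values in the first bucket
  mutate(TotalGivingGroup = cut(
    TotalGiving,
    breaks = c(0, 100, 1000, 10000, max(TotalGiving)),
    include.lowest = TRUE,
    labels = c("0-100", "101-1000", "1001-10000", "10000+")
  )) %&gt;%
  # Numeric columns: fill missing values with the column median
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), median(.x, na.rm = TRUE)))) %&gt;%
  # Character columns: fill missing values with the column mode
  mutate(across(where(is.character), ~ replace(.x, is.na(.x), Mode(.x))))

toy_clean</code></pre>


<p>Note that the last break is max(TotalGiving), so this only works when the largest gift in the data exceeds 10,000. Otherwise cut() will complain about breaks that are not unique or leave some rows unbucketed.</p>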



<h2>Map Functions</h2>



<p>Before we generate statistical results, let’s take a detour to understand the map functions from the package <a href="https://purrr.tidyverse.org/reference/map.html">purrr</a>.</p>



<p>The map function is similar to the apply family of functions in that you can apply any function to a list and do something with the results. In our case, we want to apply statistical tests to each variable in our dataset.</p>



<p>Let’s take a simple example. We have two variables, x and y, in our data frame. We want to find the mean of each of those variables. Of course, this can be done by running summary(), but we want to see how we can use the map function for more complex use cases.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-50" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"><span class="hljs-keyword">library</span>(dplyr)
<span class="hljs-keyword">library</span>(purrr)

data.frame(x = <span class="hljs-number">1</span>:<span class="hljs-number">10</span>, y = <span class="hljs-number">11</span>:<span class="hljs-number">20</span>) %&gt;% 
  summary()</code></div><small class="shcb-language" id="shcb-language-50"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Here&#8217;s the same calculation using the map function. The tilde sign here is shorthand for an anonymous function that map applies to each element of the data piped into it.</p>



<p>And .x represents the current element, which here is each column of that data frame.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-51" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">data.frame(x = <span class="hljs-number">1</span>:<span class="hljs-number">10</span>, y = <span class="hljs-number">11</span>:<span class="hljs-number">20</span>) %&gt;% 
  map(~ mean(.x)) %&gt;% 
  str()</code></div><small class="shcb-language" id="shcb-language-51"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>When we execute this, we get a list of two items, one for each variable, as shown by the str function.</p>
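

<p>If the tilde shorthand looks unfamiliar, the sketch below shows three ways of writing the same call. The variable names are mine, chosen for illustration. All three return the same list.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(purrr)

df &lt;- data.frame(x = 1:10, y = 11:20)

# 1. Tilde shorthand: .x stands for the current column
via_formula &lt;- map(df, ~ mean(.x))

# 2. The same thing written as an ordinary anonymous function
via_function &lt;- map(df, function(col) mean(col))

# 3. Passing the function itself, when no extra arguments are needed
via_name &lt;- map(df, mean)

identical(via_formula, via_function)
identical(via_formula, via_name)</code></pre>


<p>The shorthand pays off once the function body grows beyond a single call, as it will when we fit models below.</p>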



<p>Instead of getting a list, we can return a data frame using the map_dfr function.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-52" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">data.frame(x = <span class="hljs-number">1</span>:<span class="hljs-number">10</span>, y = <span class="hljs-number">11</span>:<span class="hljs-number">20</span>) %&gt;% 
  map(~ mean(.x)) %&gt;% 
  map_dfr(~ broom::tidy(.), .id = <span class="hljs-string">'source'</span>)</code></div><small class="shcb-language" id="shcb-language-52"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
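

<p>If your version of broom complains about tidying a bare number, one alternative is to build the one-row data frame yourself. This is a sketch that swaps broom::tidy for a plain data.frame call. The column name mean is my choice.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(dplyr)
library(purrr)

means_df &lt;- data.frame(x = 1:10, y = 11:20) %&gt;%
  # Each column becomes a one-row data frame; .id records the column name
  map_dfr(~ data.frame(mean = mean(.x)), .id = "source")

means_df</code></pre>


<p>The result has one row per variable, with the variable name in the source column.</p>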


<p>Or, we can use a shortcut function map_dbl to return a vector with the mean values.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-53" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">data.frame(x = <span class="hljs-number">1</span>:<span class="hljs-number">10</span>, y = <span class="hljs-number">11</span>:<span class="hljs-number">20</span>) %&gt;% 
  map_dbl(~ mean(.x))</code></div><small class="shcb-language" id="shcb-language-53"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Let’s run a <a href="https://nandeshwar.info/data-mining-2/linear-regression-in-excel/">linear regression</a> model on variables from a dataset.</p>



<p>First, let’s create a test data frame in which our dependent variable, z, is a function of x and y.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-54" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">test_df &lt;- data.frame(x = <span class="hljs-number">1</span>:<span class="hljs-number">10</span>, y = <span class="hljs-number">11</span>:<span class="hljs-number">20</span>) %&gt;% 
  mutate(z = (x+y)^<span class="hljs-number">2</span>)  </code></div><small class="shcb-language" id="shcb-language-54"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>To fit a separate model for each predictor, we exclude z from the data set that’s fed to the map function, but provide the original dataset to the data argument of the lm function.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-55" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">select(test_df, -z) %&gt;% 
  map( ~ lm(z ~ .x, data = test_df))</code></div><small class="shcb-language" id="shcb-language-55"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>We get a list of two models, one each for x and y.</p>
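

<p>One side effect of writing the formula as z ~ .x is that the coefficient in every model is labeled .x rather than with the variable&#8217;s name. A possible alternative, sketched here as my own variation and not as the approach the rest of this post uses, is to map over the variable names and build each formula with reformulate from base R.</p>


<pre class="wp-block-code"><code class="hljs language-r">library(purrr)

test_df &lt;- data.frame(x = 1:10, y = 11:20)
test_df$z &lt;- (test_df$x + test_df$y)^2

models &lt;- c("x", "y") %&gt;%
  set_names() %&gt;%   # name each list element after its predictor
  map(~ lm(reformulate(.x, response = "z"), data = test_df))

# Coefficients now carry the real variable names
names(coef(models$x))
names(coef(models$y))</code></pre>


<p>Either form works for the steps that follow. The name-based version just makes the output easier to read when you have many predictors.</p>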



<p>Let’s extract the summary for each of the models.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-56" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">select(test_df, -z)  %&gt;% 
  map( ~ lm(z ~ .x, data = test_df)) %&gt;% 
  map(summary)</code></div><small class="shcb-language" id="shcb-language-56"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Now, let’s extract the R-squared statistic from each model.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-57" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">select(test_df, -z)  %&gt;% 
  map( ~ lm(z ~ .x, data = test_df)) %&gt;% 
  map(summary) %&gt;% 
  map_dbl(<span class="hljs-string">"r.squared"</span>)</code></div><small class="shcb-language" id="shcb-language-57"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
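<p>The string passed to map_dbl is shorthand for pulling out the element with that name from each list item. The sketch below, which uses the built-in mtcars data purely for illustration and assumes purrr is installed, compares the shorthand with the same extraction written as an explicit formula.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">library(purrr)

# Two small models on the built-in mtcars data, used only for illustration
fits &lt;- list(
  wt = lm(mpg ~ wt, data = mtcars),
  hp = lm(mpg ~ hp, data = mtcars)
)
summaries &lt;- map(fits, summary)

# A string is shorthand for "extract the element with this name"
by_name &lt;- map_dbl(summaries, "r.squared")

# The same extraction written as an explicit formula
by_formula &lt;- map_dbl(summaries, ~ .x$r.squared)

all.equal(by_name, by_formula)</code></div></pre>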


<p>If we want the coefficient-level statistics (estimate, standard error, test statistic, and p-value), we can use the tidy function from the broom package and return them as a single data frame.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-58" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">select(test_df, -z)  %&gt;% 
  map( ~ lm(z ~ .x, data = test_df)) %&gt;% 
  map_dfr(~ broom::tidy(.), .id = <span class="hljs-string">'source'</span>) </code></div><small class="shcb-language" id="shcb-language-58"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>This forms the backbone of our analysis.</p>



<p>We create data frames or lists containing the statistics we are interested in and then extract the values for the variables we care about. For example, here’s how we can extract the p-value of the y variable.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-59" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">select(test_df, -z)  %&gt;% 
  map( ~ lm(z ~ .x, data = test_df)) %&gt;% 
  map_dfr(~ broom::tidy(.), .id = <span class="hljs-string">'source'</span>) %&gt;% 
  filter(<span class="hljs-keyword">source</span> == <span class="hljs-string">'y'</span>, term == <span class="hljs-string">'.x'</span>) %&gt;% 
  select(p.value) %&gt;% 
  as.double</code></div><small class="shcb-language" id="shcb-language-59"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
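<p>As an alternative sketch, the pull function from dplyr returns a column as a plain vector, which avoids converting a one-row data frame with as.double. The data frame below is a hand-made stand-in for the tidy output, and its numbers are invented.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">library(dplyr)

# A hand-made stand-in for the tidy() output; the numbers are invented
tidy_like &lt;- data.frame(
  source  = c("x", "x", "y", "y"),
  term    = c("(Intercept)", ".x", "(Intercept)", ".x"),
  p.value = c(0.01, 0.02, 0.03, 0.04)
)

p_y &lt;- tidy_like %&gt;%
  filter(source == "y", term == ".x") %&gt;%
  pull(p.value)   # returns a bare numeric vector

p_y</code></div></pre>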


<h2>Statistical Analysis</h2>



<p>Following the map function approach, let&#8217;s run the statistical analyses and store their results. Insert a new R chunk and enter the following code.</p>



<p>First, let’s run an <a href="https://www.itl.nist.gov/div898/handbook/prc/section4/prc433.htm">analysis of variance</a> of the Total Giving variable on each of the nominal variables. Then we extract the p-value for each of those variables.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-60" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">aov_values &lt;- sample_data_refi %&gt;% 
  select(-TotalGivingGroup, -ID, -TotalGiving, -AgeGroup) %&gt;% 
  select_if(is.character) %&gt;% 
  map(~ aov(TotalGiving ~ .x, data = sample_data_refi)) %&gt;%
  map_dfr(~ broom::tidy(.), .id = <span class="hljs-string">'source'</span>) %&gt;%
  mutate(p.value = round(p.value, <span class="hljs-number">8</span>))</code></div><small class="shcb-language" id="shcb-language-60"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
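<p>Since sample_data_refi is not reproduced here, the sketch below runs the same pipeline on a small made-up data frame (toy, with invented columns Gender and Region) to show the shape of the result. It assumes dplyr, purrr, and broom are installed.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">library(dplyr)
library(purrr)

# Hypothetical stand-in for sample_data_refi; all values are made up
set.seed(1)
toy &lt;- data.frame(
  TotalGiving = c(rnorm(20, mean = 100, sd = 10),
                  rnorm(20, mean = 150, sd = 10)),
  Gender      = rep(c("F", "M"), times = 20),
  Region      = rep(c("East", "West"), each = 20),
  stringsAsFactors = FALSE
)

toy_aov &lt;- toy %&gt;%
  select(-TotalGiving) %&gt;%
  select_if(is.character) %&gt;%
  map(~ aov(TotalGiving ~ .x, data = toy)) %&gt;%
  map_dfr(~ broom::tidy(.), .id = "source")

# Each variable contributes two rows: the ".x" term and "Residuals"
toy_aov %&gt;% select(source, term, df, p.value)</code></div></pre>

<p>The row with the term .x carries the F-statistic and p-value for the variable, which is why the later code filters on that term.</p>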


<p>Next, we compute Pearson&#8217;s product-moment correlation coefficient between Total Giving and each numeric variable.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-61" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">pearson_values &lt;- sample_data_refi %&gt;% 
  select(-TotalGivingGroup, -ID, -TotalGiving, -AgeGroup) %&gt;%
  select_if(is.numeric) %&gt;% 
  map(~ cor.test(sample_data_refi$TotalGiving, .x,  method = <span class="hljs-string">"pearson"</span>)) %&gt;%
  map_dfr(~ broom::tidy(.), .id = <span class="hljs-string">'source'</span>) %&gt;%
  mutate(p.value = round(p.value, <span class="hljs-number">8</span>))</code></div><small class="shcb-language" id="shcb-language-61"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Then we compute Cramer&#8217;s V to measure the association between the Total Giving Group variable and each nominal variable.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-62" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">cramer_values &lt;- sample_data_refi %&gt;%
  select(-ID, -TotalGiving, -AGE, -TotalGivingGroup) %&gt;%
  map(~ xtabs(~TotalGivingGroup + .x, data = sample_data_refi)) %&gt;% 
  map(assocstats) %&gt;% 
  map_dbl(<span class="hljs-string">"cramer"</span>)</code></div><small class="shcb-language" id="shcb-language-62"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
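<p>For intuition, Cramer&#8217;s V can be computed by hand as the square root of the chi-squared statistic divided by n times (k - 1), where n is the total count and k is the smaller of the number of rows and columns. The base R sketch below uses a made-up contingency table; as far as I know, assocstats from the vcd package reports the same quantity based on the uncorrected Pearson statistic, but treat that as an assumption.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"># Hypothetical 2 x 3 contingency table; the counts are made up
tab &lt;- matrix(
  c(30, 10, 20, 20, 10, 30),
  nrow = 2,
  dimnames = list(Group = c("Low", "High"), Level = c("A", "B", "C"))
)

# Pearson chi-squared statistic without continuity correction
chi2 &lt;- unname(chisq.test(tab, correct = FALSE)$statistic)

n &lt;- sum(tab)        # total number of observations
k &lt;- min(dim(tab))   # smaller of the number of rows and columns

cramers_v &lt;- sqrt(chi2 / (n * (k - 1)))
cramers_v</code></div></pre>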


<p>Now, let’s build a logistic regression model of the Total Giving Group variable on each of the variables.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-63" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">logistic_reg_stats &lt;- sample_data_refi %&gt;%
  select(-ID, -TotalGiving, -AGE, -TotalGivingGroup) %&gt;%
  map(~ glm(TotalGivingGroup ~  .x, data = sample_data_refi, family = binomial)) %&gt;% 
  map_dfr(~ broom::tidy(.), .id = <span class="hljs-string">'source'</span>)  %&gt;% 
  mutate(term = gsub(pattern = <span class="hljs-string">".x"</span>, replacement = <span class="hljs-string">""</span>, x = term, fixed = <span class="hljs-literal">TRUE</span>))</code></div><small class="shcb-language" id="shcb-language-63"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
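<p>One detail worth noting when stripping the .x prefix from the term names: in a regular expression, a dot matches any character, so matching the pattern literally is safer. The base R sketch below uses made-up term names to show the difference.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"># Made-up term names of the kind tidy() returns for a categorical predictor
terms &lt;- c("(Intercept)", ".xMale", ".xLuxembourg")

# As a regular expression, "." matches any character, so "ux" is removed too
regex_out &lt;- gsub(pattern = ".x", replacement = "", x = terms)
regex_out

# With fixed = TRUE the pattern is matched literally
literal_out &lt;- gsub(pattern = ".x", replacement = "", x = terms, fixed = TRUE)
literal_out</code></div></pre>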


<p>And extract the AIC value of each of these models.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-64" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">logistic_reg_aic &lt;- sample_data_refi %&gt;%
  select(-ID, -TotalGiving, -AGE, -TotalGivingGroup) %&gt;%
  map(~ glm(TotalGivingGroup ~  .x, data = sample_data_refi, family = binomial)) %&gt;% 
  map_dbl(<span class="hljs-string">"aic"</span>) </code></div><small class="shcb-language" id="shcb-language-64"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>
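<p>The string shortcut works here because a fitted glm object stores its AIC in an element named aic. A minimal base R sketch, using the built-in mtcars data only for illustration, confirms that this element agrees with the generic AIC function.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"># A small logistic regression on the built-in mtcars data, for illustration only
fit &lt;- glm(am ~ wt, data = mtcars, family = binomial)

# The fitted object stores its AIC in an element called "aic",
# which is what the string shortcut in map_dbl() picks up
fit$aic

# It agrees with the generic AIC() function
all.equal(fit$aic, AIC(fit))</code></div></pre>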


<p>Now we have all the results we need to put together a nice-looking document.</p>



<h2>Generate Markdown Syntax</h2>



<p>Let’s add some text to the introduction section and create a new section called <strong>Variable analysis</strong>.</p>



<p>We will now insert an R chunk to create a section for each variable. In each section, we will report the proportions and counts for each value of the variable. We will then generate a conditional sentence based on the p-value from the ANOVA results.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-65" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span>vars_to_study &lt;-
</span></span><span class='shcb-loc'><span>  sort(names(
</span></span><span class='shcb-loc'><span>    select(sample_data_refi, -ID, -TotalGiving, -TotalGivingGroup, -AGE)
</span></span><span class='shcb-loc'><span>  ))
</span></span><span class='shcb-loc'><span>
</span></span><span class='shcb-loc'><span><span class="hljs-keyword">for</span> (var <span class="hljs-keyword">in</span> vars_to_study) {
</span></span><span class='shcb-loc'><span>  cat(paste(<span class="hljs-string">"##"</span>, var, <span class="hljs-string">"\n"</span>))
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(
</span></span><span class='shcb-loc'><span>    paste(
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"The data contained in Table"</span>,
</span></span><span class='shcb-loc'><span>      str_replace_all(paste0(<span class="hljs-string">"\\@ref("</span>, <span class="hljs-string">"tab:proptable"</span>, var, <span class="hljs-string">")"</span>), <span class="hljs-string">" "</span>, <span class="hljs-string">""</span>),
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"provide evidence of a relationship between"</span>,
</span></span><span class='shcb-loc'><span>      var,
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"and giving group."</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(<span class="hljs-string">"\n\n\n"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  fi_df &lt;- select(sample_data_refi, TotalGivingGroup)
</span></span><span class='shcb-loc'><span>  prnt_tbl &lt;- sample_data_refi %&gt;%
</span></span><span class='shcb-loc'><span>    tabyl(!!sym(var), TotalGivingGroup)  %&gt;%
</span></span><span class='shcb-loc'><span>    adorn_percentages(<span class="hljs-string">"row"</span>) %&gt;%
</span></span><span class='shcb-loc'><span>    adorn_pct_formatting(rounding = <span class="hljs-string">"half up"</span>, digits = <span class="hljs-number">2</span>) %&gt;%
</span></span><span class='shcb-loc'><span>    adorn_ns() %&gt;%
</span></span><span class='shcb-loc'><span>    adorn_title(<span class="hljs-string">"combined"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  kable_cmd &lt;- knitr::kable(
</span></span><span class='shcb-loc'><span>    prnt_tbl,
</span></span><span class='shcb-loc'><span>    format = <span class="hljs-string">"latex"</span>,
</span></span><span class='shcb-loc'><span>    label = paste0(<span class="hljs-string">"proptable"</span>, gsub(<span class="hljs-string">" "</span>, <span class="hljs-string">""</span>, var)),
</span></span><span class='shcb-loc'><span>    caption = paste0(<span class="hljs-string">"Giving Group Segmented by "</span>, var),
</span></span><span class='shcb-loc'><span>    booktabs = <span class="hljs-literal">TRUE</span>
</span></span><span class='shcb-loc'><span>  ) %&gt;%
</span></span><span class='shcb-loc'><span>    row_spec(<span class="hljs-number">0</span>, bold = <span class="hljs-literal">TRUE</span>) %&gt;%
</span></span><span class='shcb-loc'><span>    column_spec(<span class="hljs-number">1</span>, width = <span class="hljs-string">"15em"</span>) %&gt;%
</span></span><span class='shcb-loc'><span>    kable_styling(latex_options = <span class="hljs-string">"scale_down"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  print(kable_cmd)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(<span class="hljs-string">"\n\n\n"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  aov_df &lt;- filter(aov_values, <span class="hljs-keyword">source</span>  == var, term == <span class="hljs-string">".x"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  pearson_df &lt;- filter(pearson_values, <span class="hljs-keyword">source</span> == var)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  <span class="hljs-keyword">if</span> (nrow(aov_df) == <span class="hljs-number">1</span>) {
</span></span><span class='shcb-loc'><span>    prnt_txt &lt;-
</span></span><span class='shcb-loc'><span>      paste(
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"A one-way analysis of variance was conducted to further investigate the strength of the relationship between giving group status and"</span>,
</span></span><span class='shcb-loc'><span>        var
</span></span><span class='shcb-loc'><span>      )
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    prnt_txt &lt;-
</span></span><span class='shcb-loc'><span>      paste0(
</span></span><span class='shcb-loc'><span>        prnt_txt,
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">", and the results **"</span>,
</span></span><span class='shcb-loc'><span>        ifelse(aov_df$p.value &lt;= <span class="hljs-number">0.05</span>, <span class="hljs-string">"confirmed"</span>, <span class="hljs-string">"did not confirm"</span>),
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"** the presence of a relationship with a $p$-value of "</span>,
</span></span><span class='shcb-loc'><span>        round(aov_df$p.value, <span class="hljs-number">3</span>)
</span></span><span class='shcb-loc'><span>      )
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    prnt_txt &lt;-
</span></span><span class='shcb-loc'><span>      paste(
</span></span><span class='shcb-loc'><span>        prnt_txt,
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">" with $F$-statistic of"</span>,
</span></span><span class='shcb-loc'><span>        round(aov_df$statistic, <span class="hljs-number">3</span>),
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"and"</span>,
</span></span><span class='shcb-loc'><span>        aov_df$df,
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"degrees of freedom."</span>
</span></span><span class='shcb-loc'><span>      )
</span></span><span class='shcb-loc'><span>    
</span></span><span class='shcb-loc'><span>    cat(prnt_txt)
</span></span><span class='shcb-loc'><span>  }
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(<span class="hljs-string">"\n\n"</span>)
</span></span><span class='shcb-loc'><span>}
</span></span></code></div><small class="shcb-language" id="shcb-language-65"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>First, we create a loop to go through the variables we want to report on. Line # 6.</p>



<p>We then create a markdown subsection for that variable. Line # 7.</p>
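<p>For headings written with cat to be treated as markdown rather than as printed output, the chunk normally needs the knitr option results = 'asis'; the sketch below assumes the chunk is set up that way. It uses hypothetical variable names and simply captures the text the loop would write.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"># Hypothetical variable names standing in for vars_to_study
vars &lt;- c("Gender", "Region")

# Capture what the loop writes so we can look at the raw markdown text
md_lines &lt;- capture.output(
  for (var in vars) {
    cat(paste("##", var, "\n"))
  }
)

md_lines</code></div></pre>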



<p>We write a generic sentence to provide a reference to the proportions table. Lines # 9-17.</p>



<p>We create the proportions table using the <a href="http://sfirke.github.io/janitor/articles/tabyls.html">tabyl</a> function from the janitor package. You will see that I am using two exclamation marks (!!) and the sym function to turn the looping variable, which is a string, into the column it names. Lines # 22-27.</p>
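<p>Here is a minimal sketch of the same idiom outside the loop, using the count function from dplyr instead of tabyl so that it depends only on dplyr and rlang. The data frame and its Gender column are made up.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">library(dplyr)
library(rlang)

# Hypothetical data; the column name is held in a string, as in the loop
toy &lt;- data.frame(Gender = c("F", "M", "F", "F"))
var &lt;- "Gender"

# sym() turns the string into a symbol and !! splices it into the call,
# so this behaves like count(toy, Gender)
counts &lt;- toy %&gt;% count(!!sym(var))
counts</code></div></pre>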



<p>We use the adorn functions to add row percentages and counts.</p>



<p>Then, using the kable function and some functions from the <a href="http://haozhu233.github.io/kableExtra/">kableExtra</a> package, we format the proportions table to our liking. Lines # 30-39.</p>



<p>We have to add a few newline characters to make sure our final results look good.</p>



<p>Next, using the p-values from the ANOVA results, we write a conditional sentence: if the p-value is less than or equal to 0.05, we report that the results confirmed a relationship between that variable and the total giving variable; otherwise, we report that they did not. Lines # 57-64.</p>
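<p>Stripped of the surrounding loop, the conditional wording comes down to a single ifelse call inside paste0. The base R sketch below uses a made-up variable name and p-value, and a shortened sentence, to show the pattern.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"># Made-up values standing in for one row of the ANOVA results
var     &lt;- "Gender"
p_value &lt;- 0.012

sentence &lt;- paste0(
  "The results **",
  ifelse(p_value &lt;= 0.05, "confirmed", "did not confirm"),
  "** the presence of a relationship between giving and ", var,
  " with a $p$-value of ", round(p_value, 3), "."
)

sentence</code></div></pre>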



<p>Again, we add a few newline characters at the end so that the next variable’s section starts cleanly.</p>



<h2>Logistic Regression Results</h2>



<p>Now we report the results from the logistic regression models.</p>



<p>We create a new section.</p>



<p>We pull the important variables using the p-value again. For example’s sake, I am using a p-value of 0.5 as the threshold.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-66" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r">imp_vars &lt;- filter(aov_values,  term == <span class="hljs-string">".x"</span>, p.value &lt;= <span class="hljs-number">0.5</span>) %&gt;% 
  .$<span class="hljs-keyword">source</span></code></div><small class="shcb-language" id="shcb-language-66"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>Using an inline R command, we print all the important variables.</p>
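<p>The inline command itself is not shown here. One plausible form, which is an assumption rather than the exact code used in the original document, collapses the vector into a comma-separated string. The variable names below are hypothetical.</p>

<pre class="wp-block-code" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r"># Hypothetical result of the filter above
imp_vars &lt;- c("Gender", "Region", "MaritalStatus")

# Collapse the names into one string suitable for a sentence
imp_vars_text &lt;- paste(imp_vars, collapse = ", ")
imp_vars_text

# In the R Markdown text this could be written inline as:
#   The important variables are `r paste(imp_vars, collapse = ", ")`.</code></div></pre>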



<p>In a new R chunk, we follow the same approach as in the previous loop.</p>


<pre class="wp-block-code" aria-describedby="shcb-language-67" data-shcb-language-name="R" data-shcb-language-slug="r"><div><code class="hljs language-r shcb-code-table shcb-line-numbers shcb-wrap-lines"><span class='shcb-loc'><span><span class="hljs-keyword">for</span> (var <span class="hljs-keyword">in</span> imp_vars) {
</span></span><span class='shcb-loc'><span>  cat(paste(<span class="hljs-string">"##"</span>, var, <span class="hljs-string">"\n"</span>))
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(
</span></span><span class='shcb-loc'><span>    paste(
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"The data in Table"</span>,
</span></span><span class='shcb-loc'><span>      str_replace_all(paste0(
</span></span><span class='shcb-loc'><span>        <span class="hljs-string">"\\@ref("</span>, <span class="hljs-string">"tab:logregresultstable"</span>, var, <span class="hljs-string">")"</span>
</span></span><span class='shcb-loc'><span>      ), <span class="hljs-string">" "</span>, <span class="hljs-string">""</span>),
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"show the logistic model using"</span>,
</span></span><span class='shcb-loc'><span>      var,
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">" as an independent variable and the giving group as a dependent variable."</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(<span class="hljs-string">"\n\n\n"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  prnt_df &lt;-
</span></span><span class='shcb-loc'><span>    filter(logistic_reg_stats, <span class="hljs-keyword">source</span> == !!var) %&gt;% select(-<span class="hljs-keyword">source</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  kable_cmd &lt;- knitr::kable(
</span></span><span class='shcb-loc'><span>    prnt_df,
</span></span><span class='shcb-loc'><span>    format = <span class="hljs-string">"latex"</span>,
</span></span><span class='shcb-loc'><span>    label = paste0(<span class="hljs-string">"logregresultstable"</span>, gsub(<span class="hljs-string">" "</span>, <span class="hljs-string">""</span>, var)),
</span></span><span class='shcb-loc'><span>    caption = paste0(<span class="hljs-string">"Logistic Regression Results for "</span>, var),
</span></span><span class='shcb-loc'><span>    booktabs = <span class="hljs-literal">TRUE</span>
</span></span><span class='shcb-loc'><span>  ) %&gt;%
</span></span><span class='shcb-loc'><span>    row_spec(<span class="hljs-number">0</span>, bold = <span class="hljs-literal">TRUE</span>) %&gt;%
</span></span><span class='shcb-loc'><span>    kable_styling(latex_options = <span class="hljs-string">"scale_down"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  print(kable_cmd)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(<span class="hljs-string">"\n\n\n"</span>)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  prnt_txt &lt;-
</span></span><span class='shcb-loc'><span>    paste0(
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"A correlation analysis using Cramer's V was conducted to further investigate the association between giving level groups and "</span>,
</span></span><span class='shcb-loc'><span>      var,
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">" and the value was found to be "</span>,
</span></span><span class='shcb-loc'><span>      round(cramer_values[var], <span class="hljs-number">4</span>),
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">", which indicates a **"</span>,
</span></span><span class='shcb-loc'><span>      ifelse(cramer_values[var] &gt;= <span class="hljs-number">0.5</span>, <span class="hljs-string">"strong"</span>, <span class="hljs-string">"weak"</span>),
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"** association"</span>,
</span></span><span class='shcb-loc'><span>      <span class="hljs-string">"."</span>
</span></span><span class='shcb-loc'><span>    )
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  prnt_txt &lt;-   paste0(
</span></span><span class='shcb-loc'><span>    prnt_txt,
</span></span><span class='shcb-loc'><span>    <span class="hljs-string">" The logistic regression generated AIC value for "</span>,
</span></span><span class='shcb-loc'><span>    var,
</span></span><span class='shcb-loc'><span>    <span class="hljs-string">" was "</span>,
</span></span><span class='shcb-loc'><span>    comma(logistic_reg_aic[var]),
</span></span><span class='shcb-loc'><span>    <span class="hljs-string">"."</span>
</span></span><span class='shcb-loc'><span>  )
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(prnt_txt)
</span></span><span class='shcb-loc'><span>  
</span></span><span class='shcb-loc'><span>  cat(<span class="hljs-string">"\n\n"</span>)
</span></span><span class='shcb-loc'><span>}
</span></span></code></div><small class="shcb-language" id="shcb-language-67"><span class="shcb-language__label">Code language:</span> <span class="shcb-language__name">R</span> <span class="shcb-language__paren">(</span><span class="shcb-language__slug">r</span><span class="shcb-language__paren">)</span></small></pre>


<p>We create a new subsection.</p>



<p>We print a generic sentence referencing the results from the logistic regression model. Lines #4-14.</p>



<p>Then select the appropriate rows from the logistic regression results data frame and print the table. Lines #18-29.</p>



<p>We add conditional text that describes the association as strong or weak, based on the Cramer’s V value. Lines #39-48.</p>



<p>And finally, we print the AIC values from the logistic regression model. Lines #50-57.</p>



<p>We have everything to create our final document. Let’s hit the “knit” button to see the document.</p>



<figure class="wp-block-image size-large"><img loading="lazy" width="850" height="739" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/automated-analysis-rmarkdown-final-tufte-pdf-scroll.gif" alt="" class="wp-image-3394"/></figure>



<p>There it is. A beautiful document with descriptive statistics, vague inferences, and dubious <a href="https://nandeshwar.info/enrollment-prediction-models-using-data-mining/">predictive models</a>.</p>



<h2>Conclusion</h2>



<p>This could be a good starting point for anyone who wants to analyze his or her data. But this type of analysis should not form the core of a dissertation, especially now that scientific journals are discouraging or banning null hypothesis significance testing as a way to draw inferences. As the editor of the Basic and Applied Social Psychology journal said, “p &lt; .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research.”</p>



<blockquote class="wp-block-quote"><p>p &lt; .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research</p><cite>Editor, Basic and Applied Social Psychology Journal</cite></blockquote>



<p>I hope that you learned how to automate and repeat statistical analysis using R and R Markdown.</p>



<h2>Video</h2>



<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"><div class="wp-block-embed__wrapper">
<iframe loading="lazy" title="How to automate statistical analysis and create a nice-looking report using RStudio and RMarkdown" width="500" height="281" src="https://www.youtube.com/embed/Is5yyHr70ao?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div></figure>



<h2>Some Books You May Like</h2>



<div class="aawp">

    <div id="aawp-tb-3067">

        <!-- Desktop -->
        <div class="aawp-tb aawp-tb--desktop aawp-tb--cols-6 aawp-tb--ribbon aawp-tb--hide-labels">

            
                
                <div class="aawp-tb__row">

                    <div class="aawp-tb__head">
                                            </div>

                    
                        
                        
                            <div class="aawp-tb-product-0 aawp-tb__data aawp-tb__data--type-thumb">
                                                                <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1491977310?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" target="_blank" rel="nofollow" data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/51cViJWC9XL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" /></span></a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-1 aawp-tb__data aawp-tb__data--type-thumb">
                                                                <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1119560209?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Visual Analytics with Tableau" target="_blank" rel="nofollow" data-aawp-product-id="1119560209" data-aawp-product-title="Visual Analytics with Tableau" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/41+I6HSsoUL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="Visual Analytics with Tableau" /></span></a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-2 aawp-tb__data aawp-tb__data--type-thumb aawp-tb__data--highlight">
                                <span class="aawp-tb-ribbon">Recommended</span>                                <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1491910399?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" target="_blank" rel="nofollow" data-aawp-product-id="1491910399" data-aawp-product-title="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/51eTKL+SkeL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" /></span></a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-3 aawp-tb__data aawp-tb__data--type-thumb">
                                                                <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1593273843?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="The Art of R Programming: A Tour of Statistical Software Design" target="_blank" rel="nofollow" data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming: A Tour of Statistical Software Design" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/41vY-ssLxIL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="The Art of R Programming: A Tour of Statistical Software Design" /></span></a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-4 aawp-tb__data aawp-tb__data--type-thumb">
                                                                <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/0596809158?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" target="_blank" rel="nofollow" data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/51sKo-dTddL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" /></span></a></div>                            </div>

                        
                    
                </div>

            
                
                <div class="aawp-tb__row">

                    <div class="aawp-tb__head">
                                            </div>

                    
                        
                        
                            <div class="aawp-tb-product-0 aawp-tb__data aawp-tb__data--type-title">
                                                                <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau  100 Tips Tutorials and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1491977310?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" target="_blank" rel="nofollow">Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-1 aawp-tb__data aawp-tb__data--type-title">
                                                                <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1119560209" data-aawp-product-title="Visual Analytics with Tableau" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1119560209?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Visual Analytics with Tableau" target="_blank" rel="nofollow">Visual Analytics with Tableau</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-2 aawp-tb__data aawp-tb__data--type-title aawp-tb__data--highlight">
                                                                <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1491910399" data-aawp-product-title="R for Data Science  Import Tidy Transform Visualize and Model Data" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1491910399?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" target="_blank" rel="nofollow">R for Data Science: Import, Tidy, Transform, Visualize, and Model Data</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-3 aawp-tb__data aawp-tb__data--type-title">
                                                                <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming  A Tour of Statistical Software Design" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1593273843?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="The Art of R Programming: A Tour of Statistical Software Design" target="_blank" rel="nofollow">The Art of R Programming: A Tour of Statistical Software Design</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-4 aawp-tb__data aawp-tb__data--type-title">
                                                                <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook  Proven Recipes for Data Analysis Statistics and Graphics" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/0596809158?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" target="_blank" rel="nofollow">R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics</a></div>                            </div>

                        
                    
                </div>

            
                
                <div class="aawp-tb__row">

                    <div class="aawp-tb__head">
                                            </div>

                    
                        
                        
                            <div class="aawp-tb-product-0 aawp-tb__data aawp-tb__data--type-prime">
                                                                <div class="aawp-tb-product-data-prime"><a  data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau  100 Tips Tutorials and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true" class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-1 aawp-tb__data aawp-tb__data--type-prime">
                                                                <div class="aawp-tb-product-data-prime">-</div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-2 aawp-tb__data aawp-tb__data--type-prime aawp-tb__data--highlight">
                                                                <div class="aawp-tb-product-data-prime">-</div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-3 aawp-tb__data aawp-tb__data--type-prime">
                                                                <div class="aawp-tb-product-data-prime"><a  data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming  A Tour of Statistical Software Design" data-aawp-click-tracking="true" class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-4 aawp-tb__data aawp-tb__data--type-prime">
                                                                <div class="aawp-tb-product-data-prime"><a  data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook  Proven Recipes for Data Analysis Statistics and Graphics" data-aawp-click-tracking="true" class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a></div>                            </div>

                        
                    
                </div>

            
                

            
                
                <div class="aawp-tb__row">

                    <div class="aawp-tb__head">
                                            </div>

                    
                        
                        
                            <div class="aawp-tb-product-0 aawp-tb__data aawp-tb__data--type-price">
                                                                <div class="aawp-tb-product-data-price">&#36;37.08</div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-1 aawp-tb__data aawp-tb__data--type-price">
                                                                <div class="aawp-tb-product-data-price">&#36;26.27</div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-2 aawp-tb__data aawp-tb__data--type-price aawp-tb__data--highlight">
                                                                <div class="aawp-tb-product-data-price">&#36;38.68</div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-3 aawp-tb__data aawp-tb__data--type-price">
                                                                <div class="aawp-tb-product-data-price">&#36;31.99</div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-4 aawp-tb__data aawp-tb__data--type-price">
                                                                <div class="aawp-tb-product-data-price">&#36;14.80</div>                            </div>

                        
                    
                </div>

            
                
                <div class="aawp-tb__row">

                    <div class="aawp-tb__head">
                                            </div>

                    
                        
                        
                            <div class="aawp-tb-product-0 aawp-tb__data aawp-tb__data--type-button">
                                                                <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau  100 Tips Tutorials and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1491977310?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-1 aawp-tb__data aawp-tb__data--type-button">
                                                                <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1119560209" data-aawp-product-title="Visual Analytics with Tableau" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1119560209?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-2 aawp-tb__data aawp-tb__data--type-button aawp-tb__data--highlight">
                                                                <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1491910399" data-aawp-product-title="R for Data Science  Import Tidy Transform Visualize and Model Data" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1491910399?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-3 aawp-tb__data aawp-tb__data--type-button">
                                                                <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming  A Tour of Statistical Software Design" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1593273843?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                            </div>

                        
                    
                        
                        
                            <div class="aawp-tb-product-4 aawp-tb__data aawp-tb__data--type-button">
                                                                <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook  Proven Recipes for Data Analysis Statistics and Graphics" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0596809158?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                            </div>

                        
                    
                </div>

                    </div>

        <!-- Mobile -->
        <div class="aawp-tb aawp-tb--mobile aawp-tb--ribbon aawp-tb--hide-labels">

            
                <div class="aawp-tb__product aawp-tb-product-0">

                    
                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-thumb">
                                    <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1491977310?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" target="_blank" rel="nofollow" data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/51cViJWC9XL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" /></span></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-title">
                                    <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau  100 Tips Tutorials and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1491977310?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master" target="_blank" rel="nofollow">Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master</a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-prime">
                                    <div class="aawp-tb-product-data-prime"><a  data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau  100 Tips Tutorials and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true" class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a></div>                                </div>

                            
                        </div>

                    
                        

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-price">
                                    <div class="aawp-tb-product-data-price">&#36;37.08</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-button">
                                    <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1491977310" data-aawp-product-title="Practical Tableau  100 Tips Tutorials and Strategies from a Tableau Zen Master" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1491977310?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                                </div>

                            
                        </div>

                    
                </div>

            
                <div class="aawp-tb__product aawp-tb-product-1">

                    
                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-thumb">
                                    <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1119560209?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Visual Analytics with Tableau" target="_blank" rel="nofollow" data-aawp-product-id="1119560209" data-aawp-product-title="Visual Analytics with Tableau" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/41+I6HSsoUL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="Visual Analytics with Tableau" /></span></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-title">
                                    <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1119560209" data-aawp-product-title="Visual Analytics with Tableau" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1119560209?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Visual Analytics with Tableau" target="_blank" rel="nofollow">Visual Analytics with Tableau</a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-prime">
                                    <div class="aawp-tb-product-data-prime">-</div>                                </div>

                            
                        </div>

                    
                        

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-price">
                                    <div class="aawp-tb-product-data-price">&#36;26.27</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-button">
                                    <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1119560209" data-aawp-product-title="Visual Analytics with Tableau" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1119560209?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                                </div>

                            
                        </div>

                    
                </div>

            
                <div class="aawp-tb__product aawp-tb-product-2 aawp-tb__product--highlight">

                    <span class="aawp-tb-ribbon">Recommended</span>
                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-thumb">
                                    <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1491910399?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" target="_blank" rel="nofollow" data-aawp-product-id="1491910399" data-aawp-product-title="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/51eTKL+SkeL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" /></span></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-title">
                                    <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1491910399" data-aawp-product-title="R for Data Science  Import Tidy Transform Visualize and Model Data" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1491910399?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R for Data Science: Import, Tidy, Transform, Visualize, and Model Data" target="_blank" rel="nofollow">R for Data Science: Import, Tidy, Transform, Visualize, and Model Data</a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-prime">
                                    <div class="aawp-tb-product-data-prime">-</div>                                </div>

                            
                        </div>

                    
                        

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-price">
                                    <div class="aawp-tb-product-data-price">&#36;38.68</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-button">
                                    <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1491910399" data-aawp-product-title="R for Data Science  Import Tidy Transform Visualize and Model Data" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1491910399?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                                </div>

                            
                        </div>

                    
                </div>

            
                <div class="aawp-tb__product aawp-tb-product-3">

                    
                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-thumb">
                                    <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/1593273843?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="The Art of R Programming: A Tour of Statistical Software Design" target="_blank" rel="nofollow" data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming: A Tour of Statistical Software Design" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/41vY-ssLxIL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="The Art of R Programming: A Tour of Statistical Software Design" /></span></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-title">
                                    <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming  A Tour of Statistical Software Design" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/1593273843?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="The Art of R Programming: A Tour of Statistical Software Design" target="_blank" rel="nofollow">The Art of R Programming: A Tour of Statistical Software Design</a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-prime">
                                    <div class="aawp-tb-product-data-prime"><a  data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming  A Tour of Statistical Software Design" data-aawp-click-tracking="true" class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-reviews">
                                    <div class="aawp-tb-product-data-reviews">-</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-price">
                                    <div class="aawp-tb-product-data-price">&#36;31.99</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-button">
                                    <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="1593273843" data-aawp-product-title="The Art of R Programming  A Tour of Statistical Software Design" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/1593273843?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                                </div>

                            
                        </div>

                    
                </div>

            
                <div class="aawp-tb__product aawp-tb-product-4">

                    
                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-thumb">
                                    <div class="aawp-tb-product-data-thumb"><a href="https://www.amazon.com/dp/0596809158?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" target="_blank" rel="nofollow" data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" data-aawp-click-tracking="true"><span class="aawp-tb-thumb" style="background-image: url(https://m.media-amazon.com/images/I/51sKo-dTddL._SL160_.jpg);"><img src="https://d2py08v4b28rs4.cloudfront.net/wp-content/plugins/aawp/public/assets/img/thumb-spacer.png" alt="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" /></span></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-title">
                                    <div class="aawp-tb-product-data-title"><a  data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook  Proven Recipes for Data Analysis Statistics and Graphics" data-aawp-click-tracking="true" class="aawp-field-link" href="https://www.amazon.com/dp/0596809158?tag=nandeshwarinf-20&linkCode=ogi&th=1&psc=1" title="R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics" target="_blank" rel="nofollow">R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics</a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-prime">
                                    <div class="aawp-tb-product-data-prime"><a  data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook  Proven Recipes for Data Analysis Statistics and Graphics" data-aawp-click-tracking="true" class="aawp-check-prime" href="https://www.amazon.com/gp/prime/?tag=nandeshwarinf-20" title="Amazon Prime" rel="nofollow" target="_blank"></a></div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-reviews">
                                    <div class="aawp-tb-product-data-reviews">-</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-price">
                                    <div class="aawp-tb-product-data-price">&#36;14.80</div>                                </div>

                            
                        </div>

                    
                        
                        <div class="aawp-tb__row">

                            <div class="aawp-tb__head">
                                                            </div>

                            
                                <div class="aawp-tb__data aawp-tb__data--type-button">
                                    <div class="aawp-tb-product-data-button"><a  data-aawp-product-id="0596809158" data-aawp-product-title="R Cookbook  Proven Recipes for Data Analysis Statistics and Graphics" data-aawp-click-tracking="true" class="aawp-button aawp-button--buy aawp-button aawp-button--amazon aawp-button--icon aawp-button--icon-black" href="https://www.amazon.com/dp/0596809158?tag=nandeshwarinf-20&#038;linkCode=ogi&#038;th=1&#038;psc=1" title="Buy on Amazon" target="_blank" rel="nofollow">Buy on Amazon</a></div>                                </div>

                            
                        </div>

                    
                </div>

            
        </div>

    </div>

</div>


<span class="tve-leads-two-step-trigger tl-2step-trigger-2626"></span><span class="tve-leads-two-step-trigger tl-2step-trigger-0"></span><p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/">How to Automate Statistical Analysis using RMarkdown</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://nandeshwar.info/data-science-2/how-to-automate-statistical-analysis-using-rmarkdown/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>All Data Has A Story. Here’s How to Tell it</title>
		<link>https://nandeshwar.info/storytelling/all-data-has-a-story/</link>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Mon, 20 Apr 2020 16:19:28 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[storytelling]]></category>
		<category><![CDATA[data science]]></category>
		<guid isPermaLink="false">https://nandeshwar.info/?p=3357</guid>

					<description><![CDATA[<p>All Data Has A Story. Here’s How to Tell it&#160; As data scientists, we all know the power of data. It provides hard evidence for things that are working (or not), it allows us to test our assumptions, and it helps us make better decisions. But there’s one thing data can help with that often [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/storytelling/all-data-has-a-story/">All Data Has A Story. Here’s How to Tell it</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="thrv_wrapper thrv_text_element">
<h1 class=""><span data-preserver-spaces="true">All Data Has A Story. Here’s How to Tell it&nbsp;</span></h1>
<p><span data-preserver-spaces="true">As data scientists, we all know the power of data. It provides hard evidence for things that are working (or not), it allows us to test our assumptions, and it helps us make better decisions. But there’s one thing data can help with that often gets forgotten because of its scientific nature: storytelling.&nbsp;</span></p>
<p><span data-preserver-spaces="true">Sure, you can tell a story without data. But doing so makes it more fiction than fact. A story or presentation without data is all opinion. It is hard to sway people when you don’t have any facts, right? Data is a powerful tool in storytelling of all kinds. Good storytellers use data to back up their stories and&nbsp;</span><a href="https://ds4fr.nandeshwar.info/success-in-analytics.html#storytelling" target="_blank"><span data-preserver-spaces="true">good data scientists use storytelling techniques</span></a><span data-preserver-spaces="true">&nbsp;to tell stories obscured in their data.&nbsp;</span></p>
<h2 class=""><span data-preserver-spaces="true">Data Rarely Speaks for Itself</span></h2>
<p><span data-preserver-spaces="true">Here’s the thing about data. In its raw form, it’s not consumable or compelling to an average person. You can’t just put up a bunch of charts or data points and expect them to explain themselves. You’ve got to craft a story that will gain the reader's attention and draw her in.</span></p>
<p><span data-preserver-spaces="true">Storytelling is a natural human trait. From the time we're born until the time we die, we are told stories. We’re told stories at bedtime growing up, we’re told stories throughout the day, and we immerse ourselves in stories every time we turn on the television.&nbsp;</span></p>
<p><span data-preserver-spaces="true">Humans have been telling stories for thousands of years - even before the invention of writing. According to a study,&nbsp;</span><a href="https://www.scientificamerican.com/article/the-secrets-of-storytelling/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">65% of our daily communication is based on stories</span></a><span data-preserver-spaces="true">. So how do we use the power of storytelling in </span><a href="https://nandeshwar.info/books/what-is-data-mining-analytics-data-science-and-how-to-learn-them/" target="_blank"><span data-preserver-spaces="true">data science</span></a><span data-preserver-spaces="true">? Here are some methods to try.&nbsp;</span></p>
<h2 class=""><span data-preserver-spaces="true">Don’t just tell. Show.&nbsp;</span></h2>
<p><span data-preserver-spaces="true">You know the saying, “a picture is worth a thousand words?” A compelling data visualization can bring data to life and help show relationships between data points. We’re not just talking </span><a href="https://nandeshwar.info/data-visualization/waffle-chart-vs-dot-plot-vs-pie-charts/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">pie charts</span></a><span data-preserver-spaces="true">&nbsp;and line graphs here. Putting some thought and creativity into your </span><a href="https://nandeshwar.info/free-data-visualization-tips/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">visuals can have a huge impact</span></a><span data-preserver-spaces="true">&nbsp;on the way data is presented. Not to say </span><a href="https://nandeshwar.info/data-visualization/pie-chart-vs-bar-chart/" target="_blank"><span data-preserver-spaces="true">pie charts</span></a><span data-preserver-spaces="true"> and line graphs are irrelevant because they do have their place. But some other&nbsp;</span><a href="https://ds4fr.nandeshwar.info/data-visualization-1.html#improving-effectiveness-of-visuals" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">data visualizations excel</span></a><span data-preserver-spaces="true"> when dealing with complex sets of data.</span></p>
<p><span data-preserver-spaces="true">Tools like </span><a href="https://nandeshwar.info/data-science-2/tableau-vs-r/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">Tableau make visualization easier</span></a><span data-preserver-spaces="true">, as long as you identify the relationships. How does each point relate to another? What patterns do you see? Good data stories are hidden in data relationships.&nbsp;</span></p>
<p><span data-preserver-spaces="true">The example below is a </span><a href="https://nandeshwar.info/data-visualization/nyt-data-visualization-r/" target="_blank"><span data-preserver-spaces="true">visualization created</span></a><span data-preserver-spaces="true"> to help people in Japan evaluate the seismic risk of their area in the event of an earthquake. It’s not only easy to see the relationships and quickly understand, but it’s also interactive. Users can hover over individual neighborhoods to see where they fall in each category.</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-1719401d862"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3361" alt=" tableau-data-visualization-storytelling" width="999" height="899" title="tableau-data-visualization-storytelling" data-id="3361" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/tableau-data-visualization-storytelling.png" style="" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/tableau-data-visualization-storytelling.png 999w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/tableau-data-visualization-storytelling-300x270.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/tableau-data-visualization-storytelling-768x691.png 768w" sizes="(max-width: 999px) 100vw, 999px" /></span></div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">Data visualization is as much art as it is science. Tools like Tableau can help you whip up a visual in no time; however, choosing that visual and its colors, typography, and other elements is key. Here are some tips.</span></p>
<ul class="">
<li><span data-preserver-spaces="true">You <span data-preserver-spaces="true"><a href="https://ds4fr.nandeshwar.info/data-visualization-1.html#choosing-the-right-chart-types" target="_blank">don’t have to visualize everything.</a><span data-preserver-spaces="true"> What story are you telling? What points are you making? Think about the relationships and patterns you want to show.&nbsp;</span></span></span></li>
<li><span data-preserver-spaces="true">Keep it clean. Too much data can muddy your point. Use clear typography, simple labels, and stick to the data that matters.</span></li>
<li><span data-preserver-spaces="true">One size does not fit all. Choosing the right format is important to tell the story and answer key questions generated by data. For example, bar charts compare categories while bullet charts show progress. Histograms show <span data-preserver-spaces="true"><a href="https://nandeshwar.info/data-visualization/economist-data-visualization-us-map-using-r/" target="_blank" class="tve-froala fr-basic" style="outline: none;">data clusters while maps </a><span data-preserver-spaces="true">are for location-specific data.</span></span></span></li>
<li><span data-preserver-spaces="true">Use predictable patterns that are easy to understand at a glance: numerical, alphabetical, sequential, etc.</span></li>
<li><span data-preserver-spaces="true">Use color cues to highlight information. For example, reds and blues are great for visualizing temperature.&nbsp;</span></li>
<li><span data-preserver-spaces="true">Get creative by using contextual clues like shapes.&nbsp;</span></li>
</ul>
<p><span data-preserver-spaces="true">If all else fails, get design inspiration from infographics or from the Tableau blog itself. Tableau has a weekly learning project called&nbsp;</span><a href="https://www.makeovermonday.co.uk/" target="_blank"><span data-preserver-spaces="true">Makeover Monday</span></a><span data-preserver-spaces="true">&nbsp;that thousands of </span><a href="https://nandeshwar.info/data-science-2/data-scientist-training/" target="_blank"><span data-preserver-spaces="true">data scientists</span></a><span data-preserver-spaces="true"> join in on to practice their skills.</span></p>
<h2 class="">Data Visualization vs. Data Storytelling</h2>
<p><span data-preserver-spaces="true">Here’s the thing. Data visualization and data </span><a href="https://nandeshwar.info/guide-to-improve-your-speaking-instantly/" target="_blank"><span data-preserver-spaces="true">storytelling</span></a><span data-preserver-spaces="true"> are not the same thing. A picture may be worth 1,000 words, but a PowerPoint presentation with no story results in a room full of bored people.&nbsp;</span></p>
<p><span data-preserver-spaces="true">You may be good at data analysis, but if you’re not good at presenting the data, then you lose your audience. According to Thomas Goulding, a Northeastern professor,&nbsp;</span><a href="https://www.northeastern.edu/graduate/blog/blog-how-to-tell-stories-with-data/" target="_blank"><span data-preserver-spaces="true">presenting your data is one of the most important skills in data science</span></a><span data-preserver-spaces="true">&nbsp;because the data is useless if the audience doesn’t understand it. Reports and dashboards can be overwhelming. You have to get your audience there by adding a narrative - and that’s where data storytelling comes in.&nbsp;</span></p>
<h2 class=""><span data-preserver-spaces="true">Storytelling Techniques</span></h2>
<p><a href="https://amzn.to/2VkApWn" target="_blank"><span data-preserver-spaces="true">Data storytelling</span></a><span data-preserver-spaces="true"> humanizes your data. It’s where facts and figures make the connections easier.&nbsp;</span><a href="https://lifehacker.com/the-science-of-storytelling-why-telling-a-story-is-the-5965703" target="_blank"><span data-preserver-spaces="true">Storytelling science</span></a><span data-preserver-spaces="true">&nbsp;has proven that the language processing parts of our brain are activated when listening to stories. So how can data scientists incorporate storytelling techniques to tell more data stories? Here are some tips.</span></p>
<h3 class=""><span data-preserver-spaces="true">Use Storytelling Arcs&nbsp;</span></h3>
<p><a href="https://amzn.to/2xJBa2d" target="_blank"><span data-preserver-spaces="true">Joseph Campbell’s “Hero's Journey”</span></a><span data-preserver-spaces="true">&nbsp;is a tried-and-true method of storytelling. It’s used in fables, advertising, and some of our favorite movies (any Star Wars fans out there?). Campbell was a mythologist who studied the similarities of mythologies across cultures. He concluded that no matter the culture, the stories all had the same pattern. He called this the monomyth or hero's journey. The monomyth takes a character through a process in such a way that, by the time the journey ends, your audience understands the underlying meaning behind the tale. The journey includes 17 phases, but the basic structure is this: a hero goes on an adventure, wins a victory in a decisive crisis, and then comes home changed or transformed.&nbsp;</span></p>
<p><span data-preserver-spaces="true">There’s also ‘Freytag’s Pyramid,’ a dramatic structure that, in its commonly taught form, outlines seven key steps in successful storytelling: exposition, inciting incident, rising action, climax, falling action, resolution, and denouement.&nbsp;</span></p>
<p><span data-preserver-spaces="true">Either way, these storytelling arcs can be used to help structure your data story and take the audience through a transformation that gets them to your final point.</span></p>
<h3 class=""><span data-preserver-spaces="true">Don’t jump around</span></h3>
<p><span data-preserver-spaces="true">You need your audience to follow you through the entire presentation. As fun as flashbacks are in movies, when presenting data, our brains prefer linear storytelling. Be sure there is a clear beginning, middle, and end.&nbsp;</span></p>
<h3 class=""><span data-preserver-spaces="true">Create a cast of characters</span></h3>
<p><span data-preserver-spaces="true">Did you know that telling emotional and character-driven stories boosts our levels of oxytocin? Oxytocin, the “love hormone,” helps create a feeling of empathy. No story is complete without characters. Who are yours? Are you presenting data on literacy rates? Tell a story about someone who struggled with illiteracy and how it affected them. Are your data points related to sales? Tell a </span><a href="https://nandeshwar.info/success/milind-by-the-dam-short-story-inspirational/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">success story</span></a><span data-preserver-spaces="true"> from your sales team. Pulling in real-life examples brings your data to life. It gives it that relatable, human connection. We tend to dissociate from stats and numbers, but when it’s real people we’re talking about, the mood changes.</span></p>
<h3 class=""><span data-preserver-spaces="true">Create a world</span></h3>
<p><a href="https://worldbuilding.stackexchange.com/" target="_blank"><span data-preserver-spaces="true">Worldbuilding</span></a><span data-preserver-spaces="true"> is an important skill to have as a writer. Video game developers and fiction writers are excellent at building worlds for us to immerse ourselves in and escape for a bit. The same is true for data storytelling. Set the scene. Lay the groundwork. Why are we here? What problem are we trying to solve? Why does it matter?&nbsp;</span></p>
<h3 class=""><span data-preserver-spaces="true">Know your audience</span></h3>
<p><span data-preserver-spaces="true">Marketers don’t write a single word of copy until they’ve studied their audience. Before they develop campaigns and collateral, they know their audience's motivations, values, interests, demographics, and much more. The same is true for data storytelling. You need to know your audience and what it cares about. What level of understanding does it already have? You need to be able to create a rich framework where your findings will be understood. You can’t do that if you don't know your audience.&nbsp;</span></p>
<p><span data-preserver-spaces="true">Although storytelling is natural to all of us, we tend to forget it while using data and statistics. But if you can build a basic story structure around your data, you can win over audiences and bring them around to your perspective. Keep telling those stories!</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-17193f9cc7f">
<div class="tve-content-box-background" data-css="tve-u-17193f9cc82"></div>
<div class="tve-cb" data-css="tve-u-17193f9cc84">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-17193fa67b2">
<p data-css="tve-u-17193fb6f4e"><em><span data-css="tve-u-17193fb6f4f" data-preserver-spaces="true">This guest post was submitted by RTS Labs, a&nbsp;</span></em><span data-css="tve-u-17193fd6449" style="color: var(--tcb-color-0);"><a href="https://rtslabs.com/data-science/" target="_blank"><em><span data-css="tve-u-17193fb6f50" data-preserver-spaces="true">data science consultancy</span></em></a></span><em><span data-css="tve-u-17193fb6f51" data-preserver-spaces="true"><span data-css="tve-u-17193fcfc59" style="color: var(--tcb-color-0);">&nbsp;</span>and software development firm.</span></em></p>
</div>
</div>
</div>
<div class="tcb_flag" style="display: none"></div>
<span class="tve-leads-two-step-trigger tl-2step-trigger-2626"></span><span class="tve-leads-two-step-trigger tl-2step-trigger-0"></span><p>The post <a rel="nofollow" href="https://nandeshwar.info/storytelling/all-data-has-a-story/">All Data Has A Story. Here’s How to Tell it</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Use OpenAI&#8217;s GPT-2 to Create an AI Writer</title>
		<link>https://nandeshwar.info/data-science-2/use-openai-gpt-2-to-create-ai-writer/</link>
					<comments>https://nandeshwar.info/data-science-2/use-openai-gpt-2-to-create-ai-writer/#comments</comments>
		
		<dc:creator><![CDATA[n.ashutosh]]></dc:creator>
		<pubDate>Wed, 27 Nov 2019 02:59:27 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[deep learning]]></category>
		<category><![CDATA[deep neural network]]></category>
		<category><![CDATA[google cloud compute]]></category>
		<category><![CDATA[gpt-2]]></category>
		<category><![CDATA[natural language generation]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">http://nandeshwar.info/?p=3158</guid>

					<description><![CDATA[<p>In a recent article, I wrote about using Markov Chains and OpenAI's GPT-2 to generate text. After writing that article, I wondered: how can I use the GPT-2 models to train my own AI writer to mimic someone else's writing? &#160; I didn't have to search for too long. Using these fascinating and helpful articles [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/use-openai-gpt-2-to-create-ai-writer/">How to Use OpenAI&#8217;s GPT-2 to Create an AI Writer</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">In a recent article, I wrote about using Markov Chains and </span><a href="https://nandeshwar.info/data-science-2/natural-language-generation-with-r-python/" target="_blank" class="tve-froala" style="outline: none;"><span data-preserver-spaces="true">OpenAI's GPT-2 to generate text</span></a><span data-preserver-spaces="true">. After writing that article, I wondered: how can I use the GPT-2 models to train my own AI writer to mimic someone else's writing?</span></p>
<p><span data-preserver-spaces="true">I didn't have to search for too long. Using these fascinating and helpful articles on&nbsp;</span><a href="https://www.gwern.net/GPT-2" target="_blank"><span data-preserver-spaces="true">creating poetry&nbsp;</span></a><span data-preserver-spaces="true">and&nbsp;</span><a href="https://medium.com/@ngwaifoong92/beginners-guide-to-retrain-gpt-2-117m-to-generate-custom-text-content-8bb5363d8b7f" target="_blank"><span data-preserver-spaces="true">writing JPop lyrics</span></a><span data-preserver-spaces="true">, I was able to quickly set the scripts up to create an AI writer.&nbsp;</span></p>
<p><span data-preserver-spaces="true">With the help of this article, you will be able to create your own AI program to write articles. Of course, when I say AI, I am building upon the fantastic&nbsp;</span><a href="https://github.com/openai/gpt-2" target="_blank"><span data-preserver-spaces="true">work of the OpenAI</span></a><span data-preserver-spaces="true">&nbsp;team and&nbsp;</span><a href="https://github.com/nshepperd/gpt-2" target="_blank"><span data-preserver-spaces="true">nshepperd</span></a><span data-preserver-spaces="true">, an anonymous programmer who made it very easy to re-train the OpenAI models. Let me also clarify that we aren't building a new </span><a href="https://nandeshwar.info/data-science-2/deep-learning-tensorflow-r-tutorial/" target="_blank"><span data-preserver-spaces="true">deep learning</span></a><span data-preserver-spaces="true"> model, but re-training the GPT-2 models on our chosen text.&nbsp;</span></p>
<p><span data-preserver-spaces="true">For this article, I chose to work with Seth Godin's blog posts. I had found his books&nbsp;</span><a href="https://amzn.to/2XTHiOi" target="_blank"><span data-preserver-spaces="true">Linchpin: Are You Indispensable</span></a><span data-preserver-spaces="true">&nbsp;and&nbsp;</span><a href="https://amzn.to/2ONNSl5" target="_blank"><span data-preserver-spaces="true">Permission Marketing</span></a><span data-preserver-spaces="true">&nbsp;very insightful. I unsubscribed from his newsletter after it became repetitive. A few of my friends still enjoy Seth's blog posts, so I thought I would try to mimic his writing -- and generate wisdom. I call my writer:&nbsp;</span><strong><span data-preserver-spaces="true">godinator</span></strong><span data-preserver-spaces="true">.&nbsp;</span></p>
<p><span data-preserver-spaces="true">Let's get started then.</span></p>
</div>
<div class="thrv_wrapper thrv_text_element">
<h2 class="">Things you need</h2>
<ul class="">
<li><span data-preserver-spaces="true">R and RStudio to get the articles from Seth's blog</span></li>
<li><span data-preserver-spaces="true">Google Cloud Compute to run and re-train the GPT-2 models</span>
<ul class="">
<li><span data-preserver-spaces="true">This can get costly unless you qualify for the free tier.&nbsp;</span></li>
</ul>
</li>
<li><span data-preserver-spaces="true">Access to a powerful computer (if you want to run the programs locally)</span></li>
<li><span data-preserver-spaces="true">Familiarity with SSH and shell commands</span></li>
<li><span data-preserver-spaces="true">Some Python knowledge</span></li>
</ul>
</div>
<div class="thrv_wrapper thrv_text_element">
<h2 class=""><span data-preserver-spaces="true">Get Data</span></h2>
</div>
<div class="tcb-clear" data-css="tve-u-160c24adf60">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box tcb-global-contentbox-k3glqiax" data-ct-name="Tutorial 1" data-ct="stylebox-8983" data-css="tve-u-16eaa723964">
<div class="tve-content-box-background tcb-global-contentbox-k3glqiax-bg" data-css="tve-u-16eaa723966"></div>
<div class="tve-cb tve_empty_dropzone tcb-global-contentbox-k3glqiax-cb" data-css="tve-u-16eaa723968">
<div class="thrv_wrapper thrv_text_element tve_empty_dropzone" data-css="tve-u-16eaa7157e5">
<p data-css="tve-u-16eaa7157e6"><strong><strong><span data-preserver-spaces="true">Warning and disclaimer:</span></strong></strong></p>
</div>
<div class="thrv_wrapper thrv_text_element tve_empty_dropzone">
<p data-css="tve-u-16eaa71dfc8">I don't condone stealing anyone's intellectual property. I provide the following scripts to teach you how to code and become proficient. I am not liable for your misuse of Seth's writing.</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><a href="https://seths.blog" target="_blank"><span data-preserver-spaces="true">Seth's blog</span></a><span data-preserver-spaces="true">&nbsp;uses JavaScript to show&nbsp;</span><a href="https://seths.blog/archive/" target="_blank"><span data-preserver-spaces="true">archival content</span></a><span data-preserver-spaces="true">. I spent many hours trying to figure out the XPath or CSS selectors to get to the articles. I then decided to get the links from the blog's sitemap instead, which lists all the links on a site to help crawlers index the content.</span></p>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">I took these steps to get the articles and save them in text files:</span></p>
<ol class="">
<li><span data-preserver-spaces="true">Get all the links to articles on the blog</span></li>
<li><span data-preserver-spaces="true">Extract the article, date, and title from a blog post</span></li>
<li><span data-preserver-spaces="true">Save each article as a separate text file</span></li>
</ol>
</div>
<div class="thrv_wrapper thrv_text_element">
<h3 class="">Here's how:</h3>
<h4 class="">Load our favorite libraries</h4>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-ct-name="Modern 13" data-ct="stylebox-8933" data-css="tve-u-16eaa84ee2f">
<div class="tve-content-box-background" data-css="tve-u-16eaa788466"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaa788467">
<div class="thrv_wrapper thrv_text_element tve_empty_dropzone thrv-plain-text code" style="" data-css="tve-u-16eaa74ba57">
<div class="tcb-plain-text">library(xml2) # to deal with xml files<br />library(stringr) # to manipulate strings<br />library(dplyr) # for easier <a href="https://nandeshwar.info/ultimate-guide-r-data-manipulation/" target="_blank">data manipulation</a><br />library(readr) # to read and write files<br />library(rvest) # to get data from the web</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<h4 class="">Get the sitemap links</h4>
<p><span data-preserver-spaces="true">Since there are multiple sitemaps on the main sitemap page, I needed to extract all the links. Luckily, the sitemap XML files use "loc" as an identifier for the URLs.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-ct-name="Modern 13" data-ct="stylebox-8933" data-css="tve-u-16eaa788463">
<div class="tve-content-box-background" data-css="tve-u-16eaa788466"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaa788467">
<div class="thrv_wrapper thrv_text_element tve_empty_dropzone thrv-plain-text code" style="" data-css="tve-u-16eaa74ba57">
<div class="tcb-plain-text">sitemap_master_page &lt;- read_html('https://seths.blog/sitemap-index-1.xml')<br />
<br />sitemap_links &lt;- sitemap_master_page %&gt;% <br />&nbsp; &nbsp; html_nodes(xpath = "//loc") %&gt;% <br />&nbsp; &nbsp; html_text()</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Then we need to get all the links from all the sitemaps. This function will take a sitemap file and store the links in a data frame.</p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box tcb-global-contentbox-k3gmgms6" style="" data-ct-name="Modern 13" data-ct="stylebox-8933" data-css="tve-u-16eaad9a91a">
<div class="tve-content-box-background tcb-global-contentbox-k3gmgms6-bg" data-css="tve-u-16eaa88b62c"></div>
<div class="tve-cb tve_empty_dropzone tcb-global-contentbox-k3gmgms6-cb" data-css="tve-u-16eaa88b62d">
<div class="thrv_wrapper tve_wp_shortcode undefinedk3gmev0g undefinedk3gmev69 undefinedk3gmevb6 undefinedk3gmewrw undefinedk3gmfa6z undefinedk3gmfayp undefinedk3gmffdm undefinedk3gmffqh" data-css="tve-u-16eaa88b62e">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="rsplus">get_blog_post_links <- function(sitemap_url) {
  sitemap_html <- read_html(sitemap_url)
  
  sitemap_links <- sitemap_html %>% 
    html_nodes(xpath = "//loc") %>% 
    html_text() %>% 
    as.data.frame(stringsAsFactors = FALSE) %>% 
    setNames(., "link")
  
  sitemap_links %>% 
    # keep only post links, which contain a year/month segment such as 2019/03
    filter(str_detect(string = link, pattern = "[0-9]{4}/[0-9]{2}"))
}</pre>
</div>
</div>
</div>
</div>
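<div class="thrv_wrapper thrv_text_element">
<p>If you want to spot-check a year/month filter like this before running it on thousands of links, here is a minimal shell sketch. The function name is_post_link and the example.com URLs are made up for illustration; only the idea of matching a /YYYY/MM/ segment comes from the function above.</p>
</div>

```shell
# Hypothetical helper: succeeds when a URL contains a /YYYY/MM/ segment.
is_post_link() {
  printf '%s\n' "$1" | grep -Eq '/[0-9]{4}/[0-9]{2}/'
}

# Made-up sample URLs: one post-style link and one non-post link.
for url in "https://example.com/2019/03/some-post/" "https://example.com/archive/"; do
  if is_post_link "$url"; then
    echo "keep: $url"
  else
    echo "skip: $url"
  fi
done
```

<div class="thrv_wrapper thrv_text_element">
<p>Trying a handful of links from different months is a cheap way to confirm the pattern keeps every month, not just some of them.</p>
</div>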
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">We will then run the get_blog_post_links function through all the sitemap links we found in the index sitemap.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box tcb-global-contentbox-k3gmgms6" style="" data-ct-name="Modern 13" data-ct="stylebox-8933" data-css="tve-u-16eaad9d08a">
<div class="tve-content-box-background tcb-global-contentbox-k3gmgms6-bg" data-css="tve-u-16eaa88b62c"></div>
<div class="tve-cb tve_empty_dropzone tcb-global-contentbox-k3gmgms6-cb" data-css="tve-u-16eaa88b62d">
<div class="thrv_wrapper tve_wp_shortcode undefinedk3gmev0g undefinedk3gmev69 undefinedk3gmevb6 undefinedk3gmewrw undefinedk3gmfa6z undefinedk3gmfayp undefinedk3gmffdm undefinedk3gmffqh" data-css="tve-u-16eaa88b62e">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="rsplus">all_article_links &lt;- lapply(sitemap_links, get_blog_post_links) %&gt;% 
  bind_rows()</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<h4 class="">Get and save the articles</h4>
<p><span data-preserver-spaces="true">The blog post setup is very simple on this blog. The body of each post sits inside an element with the class has-content-area, so we need to find that element and get the text inside it.</span></p>
<p><span data-preserver-spaces="true">I wrote (ahem, copied from&nbsp;</span><a href="https://github.com/django/django/blob/master/django/utils/text.py" target="_blank"><span data-preserver-spaces="true">Django</span></a><span data-preserver-spaces="true">) a helper function to make sure the created files had proper and valid names.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaada078f">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="rsplus">
# https://github.com/django/django/blob/master/django/utils/text.py
get_valid_filename <- function(str) {
  str <- str_trim(string = str) %>%
    str_replace_all(pattern = ' ', replacement = '-') %>%
    str_replace_all(pattern = '(?u)[^-\\w.]', replacement = '') %>% 
    str_replace_all(pattern = "\\.", replacement = "")
  return(str)
}
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<h4 class="">Now to the article extractor.</h4>
<p><span data-preserver-spaces="true">This code is straightforward. For every URL, it looks for the link inside the h2 tag and saves its text as the blog title. It collects the paragraphs inside the has-content-area element as the body, and it reads the element with the date class for the article date. Finally, it generates a valid file name and saves the article. For some reason, I was getting an unnecessary empty line at the end of each document, so after writing each file I had to go back and trim it (the truncate call). I also added a two-second delay so that I would not be seen as an attacker on the blog's server.</span></p>
<p><span data-preserver-spaces="true">Then I let the program run overnight.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaada31e1">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="rsplus">
fetch_articles <- function(article_url) {
  
  indiv_article_html <- read_html(article_url)
  
  article_title <- indiv_article_html %>%
    html_node(xpath = "//h2//a") %>%
    html_text()
  
  article_body <- indiv_article_html %>%
    html_nodes(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "has-content-area", " " ))]//p') %>%
    html_text(trim = TRUE)
  
  article_date <- indiv_article_html %>% 
    html_node(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "date", " " ))]') %>% 
    html_text() %>% 
    as.Date(., format = "%B %d, %Y")
  
  file_name <- paste0('/Users/ashutosh/Documents/godin_articles/',
                      get_valid_filename(article_title), 
                      "_", 
                      article_date, 
                      ".txt")
  
  cat(
    c(article_title, article_body),
    file = file_name,
    sep = "\n\n"
  )
  
  system(command = paste("truncate -s -1", file_name))
  print(paste("Processed", article_url, "saved here", file_name))
  Sys.sleep(2)
  return(invisible())
}

lapply(all_article_links$link, 
       fetch_articles)
</pre>
</div>
</div>
</div>
</div>
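<div class="thrv_wrapper thrv_text_element">
<p>A note on the truncate call above: it removes the last byte no matter what that byte is, so running it twice on the same file eats a real character. If you need to clean the files again later, a gentler option is to rewrite each file so it ends with exactly one newline. This is only a sketch; the function name normalize_trailing_newline is my own, and it assumes plain text files and a system that has mktemp.</p>
</div>

```shell
# Hypothetical helper: rewrite a text file so it ends with exactly one newline.
# Command substitution drops every trailing newline; printf adds one back.
normalize_trailing_newline() {
  file=$1
  [ -s "$file" ] || return 0              # leave missing or empty files alone
  tmp=$(mktemp) || return 1
  printf '%s\n' "$(cat "$file")" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway file that ends with an extra blank line.
demo=$(mktemp)
printf 'Title\n\nBody text\n\n' > "$demo"
normalize_trailing_newline "$demo"
od -c "$demo" | tail -n 3                 # inspect the last bytes
rm -f "$demo"
```

<div class="thrv_wrapper thrv_text_element">
<p>Because the result is the same however many times you run it, it is safe to apply to a whole folder of saved articles.</p>
</div>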
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">The next morning, I saw the program had terminated: either I was kicked off the server, or my computer went into sleep mode. Fortunately, the program had saved more than 2,700 of the 2,900+ articles from the blog. That was good enough for me.</span></p>
</div>
<div class="thrv_wrapper thrv_text_element">
<h2 class=""><span data-preserver-spaces="true">Moving on to the cloud</span></h2>
<p><span data-preserver-spaces="true">There was no way my old computer would have handled the resource demands of these huge deep learning models. For the next part, I used&nbsp;</span><a href="https://cloud.google.com/" target="_blank"><span data-preserver-spaces="true">Google's Cloud Compute</span></a><span data-preserver-spaces="true">&nbsp;services. Check out this&nbsp;</span><a href="https://medium.com/@howkhang/ultimate-guide-to-setting-up-a-google-cloud-machine-for-fast-ai-version-2-f374208be43" target="_blank"><span data-preserver-spaces="true">wonderful article</span></a><span data-preserver-spaces="true">&nbsp;on getting started on Google Cloud.&nbsp;</span></p>
<p><span data-preserver-spaces="true">Here are some general steps to run a virtual machine instance on Google Cloud, which, as an aside, I found easier to use than Amazon's cloud services. &nbsp;</span></p>
<h3 class="">Create a VM Instance</h3>
<p><span data-preserver-spaces="true">Once you have created your Google Cloud account and logged in to the Compute Engine area, click on Create Instance.</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaa95d4f5"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3173" alt="google cloud compute create instance" width="830" height="191" title="google-cloud-compute-create-instance-1" data-id="3173" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-create-instance-1.png" style=""></span></div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">Get the <strong>Deep Learning VM</strong> deployment from the marketplace. This VM comes with all the Python and TensorFlow packages needed to train deep learning models.&nbsp;</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaa96a5d3"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3174" alt="deep learning vm instance deployment google cloud" width="922" height="241" title="deep-learning-vm-instance-deployment-google-cloud" data-id="3174" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/deep-learning-vm-instance-deployment-google-cloud.png" style=""></span></div>
<div class="thrv_wrapper thrv_text_element">
<p>Fi<span data-preserver-spaces="true">ll in the details in the next window and select "<strong>us-west1-b</strong>" as the zone. GPU resources aren't available in every zone. You will also have to request a GPU quota increase, because by default no one gets a GPU. You should get that quota in less than 24 hours.</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaa97d611"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3175" alt="google cloud compute create instance zone machine type" width="991" height="637" title="google-cloud-compute-create-instance-3-zone-machine-type" data-id="3175" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-create-instance-3-zone-machine-type.png" style="" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-create-instance-3-zone-machine-type.png 991w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-create-instance-3-zone-machine-type-300x193.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-create-instance-3-zone-machine-type-768x494.png 768w" sizes="(max-width: 991px) 100vw, 991px" /></span></div>
<div class="thrv_wrapper thrv_text_element">
<p>In<span data-preserver-spaces="true">itially, I had selected&nbsp;</span><em>n1-highmem-4</em><span data-preserver-spaces="true"> with 4 vCPUs and 26 GB of memory, but I got out-of-memory errors with that machine, so I changed it to <em>n1-highmem-8</em> with 8 vCPUs and 52 GB of memory, without losing my data! The wonders of modern technology!</span></p>
<p><span data-preserver-spaces="true">Once you hit create (or launch), you should see that Google will launch a new instance for you.</span></p>
<h3 class="">Get Google Cloud SDK</h3>
<p><span data-preserver-spaces="true">You will have an easier time if you have the </span><a href="https://cloud.google.com/sdk/" target="_blank"><span data-preserver-spaces="true">Google Cloud SDK</span></a><span data-preserver-spaces="true"> installed on your local machine. It makes connecting to the cloud VM easy as well as transferring files back and forth.&nbsp;</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadaaad0">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
curl https://sdk.cloud.google.com | bash

exec -l $SHELL

gcloud init

gcloud auth login
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Yo<span data-preserver-spaces="true">u will then sign in with your Google account and create SSH keys to connect to your VM and project.</span></p>
<h3 class="">Get the SSH connection command</h3>
<p><span data-preserver-spaces="true">You will see a small dropdown under SSH in your list of instances. Click on it to get the gcloud command.</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaa9c771d"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3180" alt="google cloud compute gcloud ssh command" width="456" height="310" title="google-cloud-compute-gcloud-ssh-command" data-id="3180" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-gcloud-ssh-command.png" style="" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-gcloud-ssh-command.png 456w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-compute-gcloud-ssh-command-300x204.png 300w" sizes="(max-width: 456px) 100vw, 456px" /></span></div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">The gcloud command will look something like:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadb25a8">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
gcloud beta compute --project &lt;your_project_name&gt; ssh --zone "us-west1-b" &lt;your_instance_name&gt;
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">Once you are authenticated to the VM from your local machine, you are ready to move on to the next steps of re-training the GPT-2 models on the VM.</span></p>
</div>
<div class="thrv_wrapper thrv_text_element">
<h2 class=""><span data-preserver-spaces="true">Get the scripts supporting GPT-2</span></h2>
<p><span data-preserver-spaces="true">Now that you are connected to your VM via SSH, you will run a few commands to get the scripts and the models.</span></p>
<p><span data-preserver-spaces="true">Get nshepperd's scripts and functions</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadb5809">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
wget https://github.com/nshepperd/gpt-2/archive/finetuning.zip
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Unzip the code files</p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadb74b8">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
unzip finetuning.zip
</pre>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Rename the extracted folder</p>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadb9b27">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
mv gpt-2-finetuning gpt-2
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Add<span data-preserver-spaces="true"> the local user path to the PATH environment variable</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadbadcb">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
PATH=/home/&lt;your_user_name&gt;/.local/bin:$PATH
</pre>
</div>
</div>
</div>
</div>
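<div class="thrv_wrapper thrv_text_element">
<p>If you end up running this step more than once, PATH collects duplicate entries. Here is a small sketch that adds a directory only when it is missing. The function name prepend_path is my own invention, not something the GPT-2 scripts provide.</p>
</div>

```shell
# Hypothetical helper: put a directory at the front of PATH unless it is already listed.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

prepend_path "$HOME/.local/bin"
echo "$PATH" | tr ':' '\n' | head -n 3   # show the first few PATH entries
```

<div class="thrv_wrapper thrv_text_element">
<p>Either way, the change only lasts for the current shell session. Assuming bash is your shell on the VM, adding the line to ~/.bashrc keeps it across logins.</p>
</div>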
<div class="thrv_wrapper thrv_text_element">
<p>D<span data-preserver-spaces="true">ownload all the requirements needed for GPT-2.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadbdd13">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
pip3 install --user -r gpt-2/requirements.txt 
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Y<span data-preserver-spaces="true">ou may get an error about a library or two. Read those carefully and reinstall if required. In my case, I had to update the <em>tqdm</em> library.</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaaa07123"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3187" alt="tqdm error pip3 version local user" width="628" height="167" title="tqdm-error-pip3-version-local-user" data-id="3187" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/tqdm-error-pip3-version-local-user.png" style=""></span></div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadbfb3b">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
pip3 install --upgrade --user tqdm
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>I&nbsp;<span data-preserver-spaces="true">also got an error later on regarding the <em>toposort</em> library. I installed it using the user argument.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadc0ffc">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
pip3 install --user toposort
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<h2 class="">Get the GPT-2 models</h2>
<p><span data-preserver-spaces="true">Change to the gpt-2 directory and download the GPT-2 models.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadc2b72">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
cd gpt-2/ 
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>I&nbsp;<span data-preserver-spaces="true">was very optimistic and very excited about the largest GPT-2 model OpenAI had&nbsp;</span><a href="https://openai.com/blog/gpt-2-1-5b-release/" target="_blank"><span data-preserver-spaces="true">released recently</span></a><span data-preserver-spaces="true">. This model has 1.5 billion hyperparameters and can generate very humanlike text.&nbsp;</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaaa8adea"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image" alt="" width="250" height="154" src="https://media.giphy.com/media/sEULHciNa7tUQ/giphy.gif" style=""></span>
<p class="thrv-inline-text wp-caption-text">One billion parameters!</p>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>I will save you some time: I couldn't fit this model in my 52 GB of memory. I thought I would try the next one, the 774M one. Nope. Fail.&nbsp;</p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaaa9ff46"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3192" alt="download gpt 2 model screenshot" width="625" height="374" title="download-gpt-2-model-screenshot" data-id="3192" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/download-gpt-2-model-screenshot.png" style="" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/download-gpt-2-model-screenshot.png 625w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/download-gpt-2-model-screenshot-300x180.png 300w" sizes="(max-width: 625px) 100vw, 625px" /></span></div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaaab6a27"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3193" alt="gpt 2 large model out of memory error" width="1015" height="147" title="gpt-2-large-model-out-of-memory-error" data-id="3193" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/gpt-2-large-model-out-of-memory-error.png" style="" srcset="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/gpt-2-large-model-out-of-memory-error.png 1015w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/gpt-2-large-model-out-of-memory-error-300x43.png 300w, https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/gpt-2-large-model-out-of-memory-error-768x111.png 768w" sizes="(max-width: 1015px) 100vw, 1015px" /></span></div>
<div class="thrv_wrapper thrv_text_element thrv-plain-text">
<div class="">"OP_REQUIRES failed at cast_op.cc:109 : Resource exhausted: OOM when allocating tensor with shape[1,20,1024,1024] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu"</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>F<span data-preserver-spaces="true">inally, I had success with the 355M model.&nbsp;</span></p>
<p>D<span data-preserver-spaces="true">ownload the 355M model with this command:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadc8c53">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
python3 download_model.py 355M
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<h3 class="">Encode and train the 355M model</h3>
<p><span data-preserver-spaces="true">The first step before we re-train the GPT-2 355M model is to encode the articles. We do so with this command:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadca776">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
PYTHONPATH=src ./encode.py --model_name='355M' src/articles src/articles_encoded355.npz
</pre>
</div>
</div>
</div>
</div>
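<div class="thrv_wrapper thrv_text_element">
<p>Before encoding, it can help to check that the text files actually made it to the VM. The sketch below assumes the articles were copied into src/articles, the folder used in the command above, and that they kept their .txt extension. The two function names are made up for this example.</p>
</div>

```shell
# Hypothetical helpers for a quick look at the training folder before encoding.
count_articles() {
  # Number of .txt files inside or below the given directory.
  echo $(( $(find "$1" -type f -name '*.txt' | wc -l) ))
}

count_empty_articles() {
  # Number of zero-byte .txt files, which add nothing to the training data.
  echo $(( $(find "$1" -type f -name '*.txt' -size 0 | wc -l) ))
}

articles_dir="src/articles"   # the folder passed to encode.py above
if [ -d "$articles_dir" ]; then
  echo "articles: $(count_articles "$articles_dir")"
  echo "empty:    $(count_empty_articles "$articles_dir")"
else
  echo "no $articles_dir folder here; copy the text files to the VM first"
fi
```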
<div class="thrv_wrapper thrv_text_element">
<p>T<span data-preserver-spaces="true">he next step is to re-train the 355M model with Godin's articles. Since this process will go on for a long time, I advise you to start a screen session, so you can quit out of SSH and log in again whenever you want to check the progress. I did not do this in the beginning, and my computer would sleep and kill the SSH session.&nbsp;</span></p>
<p>Y<span data-preserver-spaces="true">ou start a screen with this command:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadcc0c5">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
screen -S godinator
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>N<span data-preserver-spaces="true">ow, you will be on a separate screen (session), which you can detach and resume.</span></p>
<p><span data-preserver-spaces="true">We are now ready to start the re-training.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadcd940">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
PYTHONPATH=src ./train.py --dataset src/articles_encoded355.npz --model_name='355M'
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>W<span data-preserver-spaces="true">hen you start the training, you will see some messages about deprecated TensorFlow functions, but thankfully, they are warnings and not errors.</span></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaaaf5d0e"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3202" alt="gpt 2 tensorflow errors deprecated" width="631" height="419" title="gpt-2-tensorflow-errors-deprecated" data-id="3202" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/gpt-2-tensorflow-errors-deprecated.png" style=""></span></div>
<div class="thrv_wrapper thrv_text_element">
<p>I<span data-preserver-spaces="true">n 30 seconds or so, you will see the first training step complete. You will keep seeing updates after every step, but the model won't be saved until 1,000 steps. We will be able to read the samples after every 100 steps, though, so it's awesome to see how the text progresses.&nbsp;</span></p>
<p>N<span data-preserver-spaces="true">ow, you can detach the session by hitting <strong>Control + A</strong> and pressing <strong>d</strong>. You will see a message that you are detached from the screen.</span></p>
<p>Y<span data-preserver-spaces="true">ou can check all the screens with the <strong>screen -ls</strong> command.&nbsp;</span></p>
<p>T<span data-preserver-spaces="true">o go back to the screen, you have to copy the exact screen name, which will be in this format: digits.godinator</span></p>
<p>I<span data-preserver-spaces="true">n my case, it was <strong>20371.godinator,</strong> so to resume, I used this command:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadd11f7">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
screen -r 20371.godinator
</pre>
</div>
</div>
</div>
</div>
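<div class="thrv_wrapper thrv_text_element">
<p>If you would rather not copy the session id by hand, you can pull it out of the screen -ls output. This is a rough sketch: the function name screen_session_id is hypothetical, and the demo runs on canned text shaped like the listing instead of a live session.</p>
</div>

```shell
# Hypothetical helper: read "screen -ls" output on stdin and print the first
# session id of the form digits.NAME for the given session name.
screen_session_id() {
  sed -n "s/^[[:space:]]*\([0-9][0-9]*\.$1\)[[:space:]].*/\1/p" | head -n 1
}

# Demo with canned text shaped like "screen -ls" output (not a live session).
printf 'There is a screen on:\n\t20371.godinator\t(Detached)\n1 Socket in /run/screen/S-user.\n' |
  screen_session_id godinator

# On the VM this could be combined with the resume command:
#   screen -r "$(screen -ls | screen_session_id godinator)"
```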
<div class="thrv_wrapper thrv_text_element">
<p>O<span data-preserver-spaces="true">nce the model has trained for at least 100 steps, you should see a sample generated in the samples/run1 directory. You don't need to resume your screen session to read these samples. Just connect to the VM using the SSH command and navigate to the directory. I used vim to read the samples like this:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadd2645">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
vi samples/run1/samples-100
</pre>
</div>
</div>
</div>
</div>
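<div class="thrv_wrapper thrv_text_element">
<p>As the training goes on, new sample files keep appearing. Here is a small sketch for finding the newest one without guessing the step number. The function name latest_file is my own, and it assumes simple file names like samples-100 with no spaces or newlines in them.</p>
</div>

```shell
# Hypothetical helper: print the most recently modified file in a directory.
# Returns a non-zero status when the directory is missing or empty.
latest_file() {
  newest=$(ls -t "$1" 2>/dev/null | head -n 1)
  [ -n "$newest" ] && printf '%s/%s\n' "$1" "$newest"
}

samples_dir="samples/run1"   # directory named in the text above
if latest=$(latest_file "$samples_dir"); then
  echo "newest sample: $latest"
  # less "$latest"           # open it in a pager; press q to quit
else
  echo "no samples in $samples_dir yet"
fi
```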
<div class="thrv_wrapper thrv_text_element">
<p>T<span data-preserver-spaces="true">o get out of vim, you have to enter "<strong>:q!</strong>"</span></p>
<p><span data-preserver-spaces="true">This was my first sample at 100 steps:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-css="tve-u-160c1e88a23" data-ct="stylebox-30600" data-ct-name="FAQ Box">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f1"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaab405f2">
<div class="tcb-clear" data-css="tve-u-16eaab405f3">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaab405f4" data-value-type="percent">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f5" data-clip-id="9578cf84d1f71"><svg width="0" height="0" class="tve-decoration-svg"><defs><clipPath id="clip-right-9578cf84d1f71" class="decoration-clip clip-path-right" clipPathUnits="objectBoundingBox" data-screen="" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-tablet-right-9578cf84d1f71" class="decoration-clip clip-path-tablet-right" clipPathUnits="objectBoundingBox" data-screen="tablet-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-mobile-right-9578cf84d1f71" class="decoration-clip clip-path-mobile-right" clipPathUnits="objectBoundingBox" data-screen="mobile-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath></defs></svg></div>
<div class="tve-cb" data-css="tve-u-16eaab405f6">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-16eaab405f7">
<p data-css="tve-u-16eaab405f8">First Sample</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv-columns" data-css="tve-u-16eaab405f9" style="--tcb-col-el-width:759.188;">
<div class="tcb-flex-row v-2 tcb--cols--1" data-css="tve-u-16eaab405fa">
<div class="tcb-flex-col">
<div class="tcb-col">
<div class="thrv_wrapper thrv_toggle_shortcode" data-hover-color="rgb(236, 238, 241)" data-css="tve-u-16eaab405fb">
<div class="tve_faq">
<div class="tve_faqI">
<div class="tve_faqB"><span class="tve_not_editable tve_toggle"></span>
<h4 class="tve_editable" data-css="tve-u-16eaab405fc">"You know you're going too bad,"</h4>
</div>
<div class="tve_faqC" style="display: none;">
<div class="thrv_wrapper thrv_text_element">
<p data-css="tve-u-16eaab405fd">""You know you're going too bad," he said to me when his boss told him to stop putting on the hat.</p>
<p>"The problem with getting out in a hurry is that you'll never be a success. You can't get it right."</p>
<p>And the other thing is that to be a success in my business, you have to be prepared. And if you're not ready, then you're a failure. Because you can't get the product wrong if you're not ready. The question is whether you do the work necessary to make it right.</p>
<p>I don't know about you, but as my business grows there might be an issue of the first principle of my own life and my life in the field of sales, and the last of ours.</p>
<p>It starts with you.</p>
<p>If you're on your own. As a kid, and at the age of thirty. As you get older. You don't just lose it all, and we'll be okay. I've tried to teach that in a series of lectures that I did with Chris Moberly. At thirty years old, I'm in a position to make a difference.</p>
<p>That change is going to need an organization. It might be something with leaders, but it's going to require a lot of people to get on board.</p>
<p>And it's going to require a lot of work—even work I'm uncomfortable doing. But it has to happen, and it needs to happen now. It depends on you.</p>
<p>I've heard that when we get there, the first thing we're going to do is fix the problems that are already there. That's the most important question… the fact that they're there. And that doesn't mean that nothing will change.</p>
<p>If the first thing that I do now is to fix the problems we've made by fixing the issues we haven't. Then once the next crisis hits, when it becomes clear what we need to change, I'll change.</p>
<p>The next crisis can be very frightening. If you're not in danger… a little bit, and a lot.</p>
<p>On the other hand, I'm not afraid to start something that isn't at all clear cut. In a recession, for example, I used to do a lot of the work. I made sure that our payroll system had a decent enough amount of payroll. I also did what I could for the people who had been laid off."</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Right away, you can see that GPT-2 is creating mostly coherent sentences. It got the opening quote right, as many of Godin's articles start with a quote. It also got a very short, one-sentence paragraph right, an inspirational punch line: "It starts with you."</p>
<p>I let the program run for a few days. As you can see from the CPU chart below, the scripts kept the CPU at almost 100% most of the time. The dips you see are when GPT-2 is generating a new sample. I had set a billing alert on Google Cloud to let me know when the bill approached or went over $100. I spent about $60 on this training. <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f641.png" alt="🙁" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
</div>
<div class="thrv_wrapper tve_image_caption" data-css="tve-u-16eaab7bcaf"><span class="tve_image_frame" style="width: 100%;"><img loading="lazy" class="tve_image wp-image-3214" alt="google cloud gpt 2 vm cpu resources chart" width="1009" height="435" title="google-cloud-gpt-2-vm-cpu-resources-chart" data-id="3214" src="https://d2py08v4b28rs4.cloudfront.net/wp-content/uploads/google-cloud-gpt-2-vm-cpu-resources-chart.png" style=""></span></div>
<div class="thrv_wrapper thrv_text_element">
<p>Here's a sample after 5,000 steps:</p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-css="tve-u-160c1e88a23" data-ct="stylebox-30600" data-ct-name="FAQ Box">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f1"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaab405f2">
<div class="tcb-clear" data-css="tve-u-16eaab405f3">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaab98aa8" data-value-type="percent">
<div class="tve-content-box-background" data-css="tve-u-16eaab95f1b" data-clip-id="86f6ce0d6b85b"><svg width="0" height="0" class="tve-decoration-svg"><defs><clipPath id="clip-right-86f6ce0d6b85b" class="decoration-clip clip-path-right" clipPathUnits="objectBoundingBox" data-screen="" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-tablet-right-86f6ce0d6b85b" class="decoration-clip clip-path-tablet-right" clipPathUnits="objectBoundingBox" data-screen="tablet-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-mobile-right-86f6ce0d6b85b" class="decoration-clip clip-path-mobile-right" clipPathUnits="objectBoundingBox" data-screen="mobile-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath></defs></svg></div>
<div class="tve-cb" data-css="tve-u-16eaab405f6">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-16eaab405f7">
<p data-css="tve-u-16eaab405f8">5,000 Steps Sample</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv-columns" data-css="tve-u-16eaab405f9" style="--tcb-col-el-width:759.188;">
<div class="tcb-flex-row v-2 tcb--cols--1" data-css="tve-u-16eaab405fa">
<div class="tcb-flex-col">
<div class="tcb-col">
<div class="thrv_wrapper thrv_toggle_shortcode" data-hover-color="rgb(236, 238, 241)" data-css="tve-u-16eaab405fb">
<div class="tve_faq">
<div class="tve_faqI">
<div class="tve_faqB"><span class="tve_not_editable tve_toggle"></span>
<h4 class="tve_editable" data-css="tve-u-16eaab405fc">The three riffs</h4>
</div>
<div class="tve_faqC" style="display: none;">
<div class="thrv_wrapper thrv_text_element">
<p data-css="tve-u-16eaab405fd">Question: What are the three elements of success?</p>
<p>Profit, equity and hard work. Clearly, the craftsmanship, accountability and profitability of the three elements are critical. It's where you get there that really matters, not if you want people to respect and admire you. Trust, of course.</p>
<p>Question: How do you teach cash to be profitable?</p>
<p>Same answer: Challenge people, make them think, challenge them emotionally and step up and say something. And do it again when they have the tools to make something even bigger.</p>
<p>Question: How do you scale?</p>
<p>Make it work. And then add something new when it doesn't. And then again.</p>
<p>What you're selling is something you sell, something bigger. Something bigger because you're giving people a new lease of life, giving them permission to do things they thought were off limits, to take initiative, to make connections, to fly without a permit, to make a difference.</p>
<p>Successful companies grow, they’re built on breakthroughs, on leaps forward and on breaking new ground. Cash isn’t the same as equity, but it’s the same thing.</p>
<p>I get asked all the time, "where do you get your ideas about marketing and success." There is no single answer. I’d like to point you in the right direction.</p>
<p>There. Now, pick up a book.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p>Some passages make sense and could even pass as profound, like this sentence: "<strong>Something bigger because you're giving people a new lease of life, giving them permission to do things they thought were off limits, to take initiative, to make connections, to fly without a permit, to make a difference</strong>." But there are quite a few misses.</p>
<p><span data-preserver-spaces="true">At 6,000 steps:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-css="tve-u-160c1e88a23" data-ct="stylebox-30600" data-ct-name="FAQ Box">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f1"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaab405f2">
<div class="tcb-clear" data-css="tve-u-16eaab405f3">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaab98aa8" data-value-type="percent">
<div class="tve-content-box-background" data-css="tve-u-16eaab95f1b" data-clip-id="86f6ce0d6b85b"><svg width="0" height="0" class="tve-decoration-svg"><defs><clipPath id="clip-right-86f6ce0d6b85b" class="decoration-clip clip-path-right" clipPathUnits="objectBoundingBox" data-screen="" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-tablet-right-86f6ce0d6b85b" class="decoration-clip clip-path-tablet-right" clipPathUnits="objectBoundingBox" data-screen="tablet-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-mobile-right-86f6ce0d6b85b" class="decoration-clip clip-path-mobile-right" clipPathUnits="objectBoundingBox" data-screen="mobile-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath></defs></svg></div>
<div class="tve-cb" data-css="tve-u-16eaab405f6">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-16eaab405f7">
<p data-css="tve-u-16eaab405f8">6,000 Steps Sample</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv-columns" data-css="tve-u-16eaab405f9" style="--tcb-col-el-width:759.188;">
<div class="tcb-flex-row v-2 tcb--cols--1" data-css="tve-u-16eaab405fa">
<div class="tcb-flex-col">
<div class="tcb-col">
<div class="thrv_wrapper thrv_toggle_shortcode" data-hover-color="rgb(236, 238, 241)" data-css="tve-u-16eaab405fb">
<div class="tve_faq">
<div class="tve_faqI">
<div class="tve_faqB"><span class="tve_not_editable tve_toggle"></span>
<h4 class="tve_editable" data-css="tve-u-16eaab405fc">Is everyone an artist?</h4>
</div>
<div class="tve_faqC" style="display: none;">
<div class="thrv_wrapper thrv_text_element">
<p data-css="tve-u-16eaab405fd">For most people, sure they are.</p>
<p>They’re the ones who figure out how to do things better, or who are good at exploiting available opportunities.</p>
<p>But mostly, people are just average people.</p>
<p>Even people who think they’re very good musicians might not be great at being great at music.</p>
<p>Even people with natural talents like singing or jazz or piano might have field days at their current skill.</p>
<p>Playing the piano might be good at what you do, but hardly good at the music.</p>
<p>Being great at one niche can get you mediocre at another one.</p>
<p>One thing that most people have in common is that they’re organized to put up with a lot less than everyone else is willing to devote the time and emotional labor to becoming great at.</p>
<p>Organized religion is like organized politics–you get us when we’re willing to put up with a lot less.</p>
<p>Organized money is like organized organized sport–you get us when we put up with less.</p>
<p>Organized technology is like organized tribal affiliation–you get us when we put up with less.</p>
<p>Organized mythmaking is like organized folk wisdom, time travel and repeating ourselves–you can add to that list Tobias Folks.</p>
<p>And there aren’t that many people who are self-motivated enough to do something new every day.</p>
<p>At the top of our list is changing the world we see every day. Problem solving and bravery are rewarded and we all win.</p>
<p>The rest is average.</p>
<p>People who are good at one thing get better at another thing. Yes, everyone is capable of being great at something. It’s just a matter of getting better at something that’s more difficult for most of us.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">LOL. I love the snide comments: "<strong>people are just average people</strong>." I think the program is picking up Godin's way of explaining things, but this text feels off.</span></p>
<p><span data-preserver-spaces="true">I stopped the program after 8,100 steps. I downloaded the trained models to my local computer so that I could generate more samples without incurring more charges from Google.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadd61aa">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
gcloud beta compute scp --recurse --zone=us-west1-b &lt;your_instance_name&gt;:~/gpt-2/checkpoint/run1 ~/Documents/godinator-learnedmodels
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">After downloading these trained models from the Google Cloud VM, I stopped the instance.</span></p>
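<p>For reference, stopping the VM from the command line might look like the sketch below. It assumes the same <strong>us-west1-b</strong> zone as the download command; the function name and the instance name are placeholders for illustration, not part of the original setup.</p>
<pre lang="bash">
# Hypothetical helper: stop a Compute Engine VM so it no longer accrues compute charges.
# Usage: stop_training_vm your_instance_name [zone]
stop_training_vm() {
  local instance_name="$1"
  local zone="${2:-us-west1-b}"   # assumed default, matching the download command
  gcloud compute instances stop "$instance_name" --zone="$zone"
}
</pre>
<p>A stopped instance still bills for its persistent disk, so delete the instance once you no longer need it.</p>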
<p><span data-preserver-spaces="true">On my local computer, I also repeated the steps from the beginning to download the GPT-2 355M model as well as nshepperd's scripts.</span></p>
<p><span data-preserver-spaces="true">Then I created a new folder called <strong>godinator</strong> inside my <strong>/gpt-2/models</strong> folder.</span></p>
<p><span data-preserver-spaces="true">I moved all the files from the <strong>godinator-learnedmodels</strong> directory to <strong>/gpt-2/models/godinator</strong>.</span></p>
<p><span data-preserver-spaces="true">I also copied the following files from <strong>/gpt-2/models/355M</strong> to <strong>/gpt-2/models/godinator</strong>:</span></p>
<ol class="">
<li><span data-preserver-spaces="true">encoder.json</span></li>
<li><span data-preserver-spaces="true">hparams.json</span></li>
<li><span data-preserver-spaces="true">vocab.bpe</span></li>
</ol>
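<p>The folder setup above can be sketched as a small shell function. This is only a sketch: the function name is made up, the paths are passed in as arguments, and it copies the downloaded checkpoint files rather than moving them so that the download stays intact.</p>
<pre lang="bash">
# Hypothetical helper: assemble a custom model folder for the GPT-2 scripts.
# Usage: install_finetuned_model GPT2_DIR CHECKPOINT_DIR MODEL_NAME
install_finetuned_model() {
  local gpt2_dir="$1"        # local clone of the gpt-2 repository
  local checkpoint_dir="$2"  # folder with the downloaded fine-tuned checkpoint files
  local model_name="$3"      # name of the new folder under models/
  local target="$gpt2_dir/models/$model_name"

  mkdir -p "$target" || return 1
  # Copy every checkpoint file into the new model folder.
  cp -R "$checkpoint_dir"/. "$target"/ || return 1
  # Copy the tokenizer and hyperparameter files from the base 355M model.
  local file
  for file in encoder.json hparams.json vocab.bpe; do
    cp "$gpt2_dir/models/355M/$file" "$target/" || return 1
  done
}

# Example call, using the paths from this walkthrough:
# install_finetuned_model ~/gpt-2 ~/Documents/godinator-learnedmodels godinator
</pre>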
<p><span data-preserver-spaces="true">Now, I was ready to run a local natural language generator, a.k.a. my AI writer.</span></p>
<p><span data-preserver-spaces="true">There are two ways of letting the AI write articles: 1)&nbsp;</span><a href="https://github.com/nshepperd/gpt-2/blob/finetuning/src/generate_unconditional_samples.py" target="_blank"><span data-preserver-spaces="true">unconditional writing</span></a><span data-preserver-spaces="true">, meaning the program keeps writing samples until we stop it (with Control + C) or until it reaches the number of samples we specify, and 2)&nbsp;</span><a href="https://github.com/nshepperd/gpt-2/blob/finetuning/src/interactive_conditional_samples.py" target="_blank"><span data-preserver-spaces="true">interactive prompt</span></a><span data-preserver-spaces="true">, in which we type some text and the program generates new text that continues from it.</span></p>
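<p>For the interactive prompt, here is a rough sketch. It assumes the interactive script accepts the same <strong>--temperature</strong>, <strong>--top_k</strong>, and <strong>--model_name</strong> options as the unconditional one and is run from the <strong>gpt-2</strong> folder; the function name is made up for illustration.</p>
<pre lang="bash">
# Hypothetical wrapper: start an interactive prompt against a fine-tuned model.
# Usage (from the gpt-2 folder): start_interactive_writer [model_name]
start_interactive_writer() {
  local model_name="${1:-godinator}"   # assumed default: the folder under models/
  python3 ./src/interactive_conditional_samples.py \
    --temperature 0.8 --top_k 40 --model_name "$model_name"
}
</pre>
<p>The script then waits for a prompt; type an opening sentence, and it prints a continuation.</p>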
<p><span data-preserver-spaces="true">I</span><span data-preserver-spaces="true"> generated unconditional samples with this command:</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaadd8407">
<div class="tve-content-box-background"></div>
<div class="tve-cb">
<div class="thrv_wrapper tve_wp_shortcode" data-css="tve-u-16eaa91c9e4">
<div class="tve_shortcode_raw" style="display: none"></div>
<div class="tve_shortcode_rendered">
<pre lang="bash">
python3 ./src/generate_unconditional_samples.py --temperature 0.8 --top_k 40 --nsamples 5 --model_name godinator > gen_samples.txt 
</pre>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><span data-preserver-spaces="true">The command created five samples and saved them to the <strong>gen_samples.txt</strong> text file. You may hear your computer heating up while it runs, but fortunately, the program finished successfully and generated new stories.</span></p>
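<p>To check the output file, a sketch like the following can help. It assumes the script writes a separator line such as <strong>==== SAMPLE 1 ====</strong> before each sample; verify this against your own <strong>gen_samples.txt</strong>. Both function names are made up for illustration.</p>
<pre lang="bash">
# Count the samples in a file written by the generation script.
# Usage: count_samples gen_samples.txt
count_samples() {
  grep -Ec '^=+ SAMPLE [0-9]+ =+$' "$1"
}

# Print the text of sample number N, without its separator line.
# Usage: show_sample gen_samples.txt 3
show_sample() {
  awk -v wanted="$2" '
    /^=+ SAMPLE [0-9]+ =+$/ { current = $3; next }  # remember which sample we are in
    current == wanted { print }                      # print only lines of the wanted sample
  ' "$1"
}
</pre>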
<p><span data-preserver-spaces="true">The way the deep layers of the neural network generate new information and write (mostly) coherent text surprises and impresses me. In one of the samples, the program invented a fake YouTube star, complete with a pseudonym and a real name. It even gave a backstory for how this person became a star! I had to search for his name to make sure it wasn't mentioned in the training data. Google found nothing! Nothing!</span></p>
<p>H<span data-preserver-spaces="true">ere are some other samples I found intriguing.</span></p>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-css="tve-u-160c1e88a23" data-ct="stylebox-30600" data-ct-name="FAQ Box">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f1"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaab405f2">
<div class="tcb-clear" data-css="tve-u-16eaab405f3">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaab98aa8" data-value-type="percent">
<div class="tve-content-box-background" data-css="tve-u-16eaab95f1b" data-clip-id="86f6ce0d6b85b"><svg width="0" height="0" class="tve-decoration-svg"><defs><clipPath id="clip-right-86f6ce0d6b85b" class="decoration-clip clip-path-right" clipPathUnits="objectBoundingBox" data-screen="" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-tablet-right-86f6ce0d6b85b" class="decoration-clip clip-path-tablet-right" clipPathUnits="objectBoundingBox" data-screen="tablet-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-mobile-right-86f6ce0d6b85b" class="decoration-clip clip-path-mobile-right" clipPathUnits="objectBoundingBox" data-screen="mobile-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath></defs></svg></div>
<div class="tve-cb" data-css="tve-u-16eaab405f6">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-16eaab405f7">
<p data-css="tve-u-16eaab405f8">The two sides</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv-columns" data-css="tve-u-16eaab405f9" style="--tcb-col-el-width:759.188;">
<div class="tcb-flex-row v-2 tcb--cols--1" data-css="tve-u-16eaab405fa">
<div class="tcb-flex-col">
<div class="tcb-col">
<div class="thrv_wrapper thrv_toggle_shortcode" data-hover-color="rgb(236, 238, 241)" data-css="tve-u-16eaab405fb">
<div class="tve_faq">
<div class="tve_faqI">
<div class="tve_faqB"><span class="tve_not_editable tve_toggle"></span>
<h4 class="tve_editable" data-css="tve-u-16eaab405fc">The two sides</h4>
</div>
<div class="tve_faqC" style="display: none;">
<div class="thrv_wrapper thrv_text_element">
<p data-css="tve-u-16eaab405fd">Two sides, opposite of what we want.</p>
<p>We like to believe that our side is motivated by greed and passion and that our side is focused on creating fairness and equality.</p>
<p>But is it?</p>
<p>Organizations that work only on the foundation are fragile. They easily fall apart, they take a lot of heat and they're not particularly trustworthy.</p>
<p>The other side, the good side, the organization that scales and gets to where they care about doing great work (compared to the other side), is built on shared goals, shared vision and a mutual attraction to doing the work.</p>
<p>Opposites attract.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-css="tve-u-160c1e88a23" data-ct="stylebox-30600" data-ct-name="FAQ Box">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f1"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaab405f2">
<div class="tcb-clear" data-css="tve-u-16eaab405f3">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaab98aa8" data-value-type="percent">
<div class="tve-content-box-background" data-css="tve-u-16eaab95f1b" data-clip-id="86f6ce0d6b85b"><svg width="0" height="0" class="tve-decoration-svg"><defs><clipPath id="clip-right-86f6ce0d6b85b" class="decoration-clip clip-path-right" clipPathUnits="objectBoundingBox" data-screen="" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-tablet-right-86f6ce0d6b85b" class="decoration-clip clip-path-tablet-right" clipPathUnits="objectBoundingBox" data-screen="tablet-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-mobile-right-86f6ce0d6b85b" class="decoration-clip clip-path-mobile-right" clipPathUnits="objectBoundingBox" data-screen="mobile-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath></defs></svg></div>
<div class="tve-cb" data-css="tve-u-16eaab405f6">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-16eaab405f7">
<p data-css="tve-u-16eaab405f8">Agenda</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv-columns" data-css="tve-u-16eaab405f9" style="--tcb-col-el-width:759.188;">
<div class="tcb-flex-row v-2 tcb--cols--1" data-css="tve-u-16eaab405fa">
<div class="tcb-flex-col">
<div class="tcb-col">
<div class="thrv_wrapper thrv_toggle_shortcode" data-hover-color="rgb(236, 238, 241)" data-css="tve-u-16eaab405fb">
<div class="tve_faq">
<div class="tve_faqI">
<div class="tve_faqB"><span class="tve_not_editable tve_toggle"></span>
<h4 class="tve_editable" data-css="tve-u-16eaab405fc">Agenda</h4>
</div>
<div class="tve_faqC" style="display: none;">
<div class="thrv_wrapper thrv_text_element">
<p data-css="tve-u-16eaab405fd">"I don't have an agenda," is not a useful excuse in our age of agenda mining.</p>
<p>No, the real question is, "do you care?"</p>
<p>We care about the truth, we care about the integrity of the process, and we care about the ability of those with different worldviews to reach agreements about how to achieve those goals.</p>
<p>The sooner we know this and act on it, the sooner you'll be on the same page as everyone else about the opportunities and the challenges of modern collaboration.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" style="" data-css="tve-u-160c1e88a23" data-ct="stylebox-30600" data-ct-name="FAQ Box">
<div class="tve-content-box-background" data-css="tve-u-16eaab405f1"></div>
<div class="tve-cb tve_empty_dropzone" data-css="tve-u-16eaab405f2">
<div class="tcb-clear" data-css="tve-u-16eaab405f3">
<div class="thrv_wrapper thrv_contentbox_shortcode thrv-content-box" data-css="tve-u-16eaab98aa8" data-value-type="percent">
<div class="tve-content-box-background" data-css="tve-u-16eaab95f1b" data-clip-id="86f6ce0d6b85b"><svg width="0" height="0" class="tve-decoration-svg"><defs><clipPath id="clip-right-86f6ce0d6b85b" class="decoration-clip clip-path-right" clipPathUnits="objectBoundingBox" data-screen="" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-tablet-right-86f6ce0d6b85b" class="decoration-clip clip-path-tablet-right" clipPathUnits="objectBoundingBox" data-screen="tablet-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath><clipPath id="clip-mobile-right-86f6ce0d6b85b" class="decoration-clip clip-path-mobile-right" clipPathUnits="objectBoundingBox" data-screen="mobile-" decoration-type="slanted" slanted-angle="9" style="" data-inverted="true"><polygon points="0 0, 0 1, 0.8416 1, 1 0"></polygon></clipPath></defs></svg></div>
<div class="tve-cb" data-css="tve-u-16eaab405f6">
<div class="thrv_wrapper thrv_text_element" data-css="tve-u-16eaab405f7">
<p data-css="tve-u-16eaab405f8">Dreams</p>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv-columns" data-css="tve-u-16eaab405f9" style="--tcb-col-el-width:759.188;">
<div class="tcb-flex-row v-2 tcb--cols--1" data-css="tve-u-16eaab405fa">
<div class="tcb-flex-col">
<div class="tcb-col">
<div class="thrv_wrapper thrv_toggle_shortcode" data-hover-color="rgb(236, 238, 241)" data-css="tve-u-16eaab405fb">
<div class="tve_faq">
<div class="tve_faqI">
<div class="tve_faqB"><span class="tve_not_editable tve_toggle"></span>
<h4 class="tve_editable" data-css="tve-u-16eaab405fc">Dreams</h4>
</div>
<div class="tve_faqC" style="display: none;">
<div class="thrv_wrapper thrv_text_element">
<p data-css="tve-u-16eaab405fd">So, one day, I got an email from a college friend who was accepting applications for a job at her local bookstore.</p>
<p>What a wonderful way to spend a weekend. My friend and I spent an hour at the local supermarket, talking books and discussing what we would learn from each other as we delved deeper into our obsessions.</p>
<p>The best part of the weekend was when we all got together and talked about our dreams. Here were people who were making plans and taking action to make those plans a reality.</p>
<p>It was a stark reminder to me every time I went to meet someone that our dreams were just that, dreams. I could always check them off as I went.</p>
<p>There are more than a billion people on this planet (and counting in real time) who have neither been to college nor found themselves working toward a college degree. More than a billion people who, like my friend, are working to level up and make a difference.</p>
<p>If you want to get ahead, level up. If you want to make a difference, go take a level up.</p>
<p>Shared dreams aren't just a comforting notion, they're a vital tool for survival.</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="thrv_wrapper thrv_text_element">
<h2 class="">Conclusion</h2>
<ul class="">
<li><span data-preserver-spaces="true">Training and re-training are resource intensive. Fine-tuning even the "smaller" model with 355M parameters took a few days of near-constant CPU use and about $60 in cloud charges.</span></li>
<li><span data-preserver-spaces="true">This smaller model generates impressive, surprising, and sometimes profound text. But none of the essays I read are publication-ready. A good editor, however, can turn these articles into gold. Perhaps the larger model is as scary as&nbsp;</span><a href="https://openai.com/blog/better-language-models/" target="_blank"><span data-preserver-spaces="true">OpenAI initially claimed</span></a><span data-preserver-spaces="true">.</span></li>
<li><span data-preserver-spaces="true">The best part: everyone, even beginners, can take part in re-training and fine-tuning these powerful natural language generators.</span></li>
</ul>
</div>
<div class="thrv_wrapper thrv_text_element">
<p><strong>What do you think? What uses of this technology do you see? </strong></p>
</div>
<div class="tcb_flag" style="display: none"></div>
<span class="tve-leads-two-step-trigger tl-2step-trigger-2626"></span><span class="tve-leads-two-step-trigger tl-2step-trigger-0"></span><p>The post <a rel="nofollow" href="https://nandeshwar.info/data-science-2/use-openai-gpt-2-to-create-ai-writer/">How to Use OpenAI&#8217;s GPT-2 to Create an AI Writer</a> appeared first on <a rel="nofollow" href="https://nandeshwar.info">nandeshwar.info</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://nandeshwar.info/data-science-2/use-openai-gpt-2-to-create-ai-writer/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
	</channel>
</rss>
