<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>The Iteration Station</title>
	
	<link>http://www.Iterationstation.com</link>
	<description>Drinking from the Analytics Fire Hose</description>
	<pubDate>Wed, 06 May 2009 16:46:37 +0000</pubDate>
	<generator>http://wordpress.org/?v=</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/TheIterationStation" type="application/rss+xml" /><item>
		<title>Facebook Ad Optimization: Optimal Targeting and Comparison</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/WNVswzJjGM8/</link>
		<comments>http://www.Iterationstation.com/2009/05/06/facebook-ad-optimization-optimal-targeting-and-comparison/#comments</comments>
		<pubDate>Wed, 06 May 2009 16:25:56 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[facebook]]></category>

		<category><![CDATA[ad optimization]]></category>

		<category><![CDATA[Chi-Squared]]></category>

		<category><![CDATA[demographic selection]]></category>

		<category><![CDATA[Logistic Regression]]></category>

		<category><![CDATA[predictive analytics]]></category>

		<category><![CDATA[social networking modeling]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=388</guid>
		<description><![CDATA[Here&#8217;s the third installment of my series on Facebook ad demographic optimization.  In the first post, we defined the problem of Facebook advertisers misspecifying demographics.  In the second post, we examined the results of Facebook ad clusters derived from text mining. This installment picks up where we left off by showing a method [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Here&#8217;s the third installment of my series on Facebook ad demographic optimization.  In the first post, we defined the problem of <a href="http://www.iterationstation.com/2009/05/03/facebook-ad-optimization/">Facebook advertisers misspecifying demographics</a>.  In the second post, we examined the results of <a href="http://www.iterationstation.com/2009/05/04/facebook-ad-optimization-ad-clustering/">Facebook ad clusters derived from text mining</a>. This installment picks up where we left off by showing a method for determining the optimal demographics within a homogeneous ad cluster.  Once the overall distribution of responses has been charted for each demographic, we explore a way to determine whether a single ad shares the same response rate distributions.  Only then can we determine whether an ad has been misspecified.</p>
<h3 style="text-align: justify;">Optimal Targeting</h3>
<p style="text-align: justify;">Once clusters of similar ads are identified, the relationship between the probability of a click and user demographics can be explored.  Certain levels of the demographic predictors will exhibit a higher click probability than others; men are more likely than women to click on a &#8216;get six-pack abs&#8217; ad.  Candidate variables for a click model would include, but not be limited to: location, age, sex, education, relationship status and &#8216;interested in&#8217;.  If the probability of a click by demographic can be modeled for each ad group, Facebook would be able to identify the target range for an ad group based on the number of clicks per day the advertiser wanted to receive.  If the advertiser’s budget were small, Facebook could serve the ad to the most targeted range.  If the budget were large, they would have to widen the range, serving ads to less and less targeted user groups.</p>
<p>Logistic regression with a variable selection method can be employed to determine the significance and contribution of each demographic in predicting the probability of a click.  The &#8216;interested in&#8217; variable may need to be dropped to prevent quasi-separation, because frequency counts are insufficient (a small number of LGBT responders) at some levels of the other demographic predictors.  Other variables such as keywords and workplaces have too many response levels to be considered in the model without binning or clustering the data.  Additionally, location will need to be binned by region or clustered using census data to reduce the number of predictor levels.  Age will need to be plotted against the logit to determine whether it should be entered as a linear, quadratic or cubic predictor.</p>
<p>Below is the equation for the proposed logistic model:</p>
<p><img title="Demographics Logistic Model" src="http://iterationstation.com/wp-content/uploads/2009/05/picture-12.png" alt="This model can be used to determine the most important Facebook demographics" width="576" height="70" /></p>
<p><span id="more-388"></span></p>
<p>Because a binary response variable exhibits non-constant variance, coefficients in logistic regression are estimated by maximum likelihood rather than by least squares.  An examination of the distribution of the predictor variables by the response, coupled with the coefficient estimates and contrast statements, can be used to understand how the probability of clicking changes at different levels of each predictor.  The results of the contrasts will reveal which demographics to select to obtain an optimal click-through rate for a specific ad cluster.</p>
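<p>As a rough sketch of the general form such a model takes (the exact predictors in the figure above are not reproduced here; the symbols p, x, y and &beta; are notation assumed purely for illustration), the logit of the click probability is linear in the coded demographics, and the coefficients are the values that maximize the Bernoulli log-likelihood:</p>

```latex
\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
\qquad
\ell(\beta) = \sum_{i=1}^{n} \Bigl[ y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \Bigr],
\qquad
\operatorname{Var}(y_i) = p_i (1 - p_i)
```

<p>Here y is 1 for a click and 0 otherwise.  The variance term depends on p, and therefore on the predictors, which is the non-constant variance that rules out ordinary least squares.</p>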
<p>Ultimately, the optimal demographic range depends on the number of clicks an advertiser wants per day.  The larger the number of clicks desired, the larger and less targeted the optimal range would become.</p>
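<p>One way to picture the trade-off is a greedy pass over demographic segments ranked by predicted click probability.  This is only a minimal sketch: the function name <code>choose_target_range</code>, the segment labels and every number below are hypothetical, and a real system would have to account for budgets and auction dynamics that are ignored here.</p>

```python
# Hypothetical segments: (label, predicted click probability, daily impressions).
# Every value here is made up for illustration.
def choose_target_range(segments, clicks_wanted):
    """Add segments from highest to lowest predicted CTR until the
    expected clicks per day reach the advertiser's goal."""
    chosen = []
    expected_clicks = 0.0
    ranked = sorted(segments, key=lambda s: s[1], reverse=True)
    for label, ctr, impressions in ranked:
        if expected_clicks >= clicks_wanted:
            break
        chosen.append(label)
        expected_clicks += ctr * impressions
    return chosen, expected_clicks


segments = [
    ("men 18-24", 0.020, 10000),
    ("men 25-34", 0.015, 20000),
    ("women 18-24", 0.005, 10000),
    ("women 25-34", 0.004, 20000),
]

# A small click goal stays in the best segment; a larger one widens the range.
small, small_clicks = choose_target_range(segments, 150)
large, large_clicks = choose_target_range(segments, 520)
print(small, small_clicks)
print(large, large_clicks)
```

<p>With these invented figures, the smaller goal is met by the single highest-CTR segment, while the larger goal pulls in progressively lower-CTR segments.</p>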
<h2>Ad Demographic Comparison</h2>
<p style="text-align: justify;">After the optimal and overall response ranges have been recorded for an ad cluster, Facebook could determine whether a single ad’s demographics are over targeted by comparing distributions.  Before recommending that a test ad change its demographic range, one would first use statistical methods to compare the distribution of responders to the test ad with a similar set of responders to all ads in the same cluster.  If the tests indicated that the responders in each group came from the same population, it would be reasonable to assume that the test ad could expect response rates similar to those its parent cluster achieved for a given demographic range.  If the demographic range exhibiting the highest response rate was not captured by the test ad’s specification, the test ad would be deemed over targeted.  To ensure an apples-to-apples comparison, the sampling would control for all other demographic characteristics between the test ad and its parent ad cluster.</p>
<h3>Continuous Demographics</h3>
<p style="text-align: justify;">Assuming the ability to take a large sample, the distribution of responses for continuous variables could be compared using the Kolmogorov-Smirnov test.  Age data was simulated for responders to a test credit score (CS) ad and for responders to all other CS ads using SAS 9.1.  The distribution of response frequency by age is below:</p>
<p><img title="Facebook Demographic Comparisons" src="http://iterationstation.com/wp-content/uploads/2009/05/picture-13.png" alt="Using the Kolmogorov Smirnov test to examine facebook demographic distributions" width="582" height="339" /></p>
<p>The Kolmogorov-Smirnov test was used to check whether the distribution of responses within the test range differed significantly between the test ad and all ads.  A non-significant p-value (alpha = .05) meant we could not reject the hypothesis that the two groups of responders share the same distribution.  This result suggests that the test ad could improve its response rate by widening or shifting the specified age range to include the top of the response distribution for all CS ads.</p>
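<p>The original simulation was run in SAS and is not reproduced here.  As a stand-in, the following is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic with the standard large-sample approximation for the p-value.  The age distribution (mean 35, standard deviation 8), the sample sizes and the 25 to 30 test range are all assumptions invented for the example.</p>

```python
import math
import random


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two
    # empirical cumulative distribution functions.
    while n > i and m > j:
        x = min(a[i], b[j])
        while n > i and a[i] == x:
            i += 1
        while m > j and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Large-sample Kolmogorov approximation; not reliable for small samples.
    lam = math.sqrt(n * m / (n + m)) * d
    if 0.05 > lam:
        return d, 1.0  # the series below is numerically useless this close to zero
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


# Simulated responder ages (assumed parameters, same population for both groups).
random.seed(0)
all_ads = [random.gauss(35, 8) for _ in range(5000)]
test_ad = [random.gauss(35, 8) for _ in range(5000)]

# Compare only responders inside the test ad's specified age range.
lo, hi = 25, 30
d, p = ks_two_sample([x for x in test_ad if hi >= x >= lo],
                     [x for x in all_ads if hi >= x >= lo])
print("D =", round(d, 4), "p =", round(p, 4))
```

<p>A p-value above the chosen alpha would mean no evidence against a shared distribution within the test range, which is the precondition the comparison above relies on.</p>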
<h3>Categorical Demographics</h3>
<p style="text-align: justify;">Response frequencies of categorical demographics can also be examined to determine whether the distribution of responders for a test ad is the same as the distribution for all ads in the same cluster.  As with continuous variables, samples would need to be carefully taken to ensure a proper comparison.  Response data was generated for a test CS ad and for responders to all other CS ads.  Relationship status values were assigned based on two samples of one thousand values chosen at random from the binomial distribution.  Below is a graph of the frequency of responders by relationship status:</p>
<p><img title="Facebook relationship status Chi Squared results graph" src="http://iterationstation.com/wp-content/uploads/2009/05/picture-14.png" alt="Use of Chi Squared Statistic to show the similarity of ad group demographics" width="554" height="369" /></p>
<p>A chi-square statistic was computed to determine whether the response counts were significantly different for the test CS ad vs. all CS ads within the test range.  A non-significant p-value (alpha = .05) meant failing to reject the null hypothesis that the samples are drawn from the same population.  These results suggest that the test ad group is over targeted and could increase response rates by including married users.</p>
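<p>For readers who want to see the arithmetic, here is a minimal sketch of a Pearson chi-square test of homogeneity on a two-row table of counts.  The three relationship-status categories and all counts are invented for illustration (they are not the simulated data behind the graph above), and the decision uses tabulated 5% critical values rather than an exact p-value.</p>

```python
def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k
    table of counts (assumes every column has at least one observation)."""
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(row_a) - 1


# Upper 5% critical values of the chi-square distribution, 1 to 4 degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Made-up responder counts by relationship status inside the test ad's range:
# single, in a relationship, engaged.
test_ad = [520, 300, 180]
all_ads = [500, 310, 190]

stat, df = chi_square_homogeneity(test_ad, all_ads)
# True when we fail to reject the hypothesis of a common population at alpha = .05.
same_population_plausible = CRITICAL_05[df] > stat
print(round(stat, 3), df, same_population_plausible)
```

<p>If the statistic stays below the critical value, the test ad&#8217;s responders look like the cluster&#8217;s responders within the shared range, so the cluster&#8217;s response rates outside that range become a plausible guide for widening the ad&#8217;s targeting.</p>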
<p style="text-align: justify;">In the next post, we&#8217;ll discuss the implications, extensions and pitfalls of the method laid out in the previous three posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/05/06/facebook-ad-optimization-optimal-targeting-and-comparison/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/05/06/facebook-ad-optimization-optimal-targeting-and-comparison/</feedburner:origLink></item>
		<item>
		<title>Facebook Ad Optimization: Ad Clustering</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/TvctME1Npq4/</link>
		<comments>http://www.Iterationstation.com/2009/05/04/facebook-ad-optimization-ad-clustering/#comments</comments>
		<pubDate>Mon, 04 May 2009 14:10:25 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[facebook]]></category>

		<category><![CDATA[ad clustering]]></category>

		<category><![CDATA[demographic targeting]]></category>

		<category><![CDATA[Enterprise Miner]]></category>

		<category><![CDATA[SAS]]></category>

		<category><![CDATA[SVD scores]]></category>

		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=370</guid>
		<description><![CDATA[This is the second post of my Facebook ad serving optimization series.  If you haven&#8217;t read my first post in the series, check it out here for a fleshed out version of the problem statement. To recap, we are looking at the problem of identifying Facebook ads that are misspecified, then giving marketers a [...]]]></description>
			<content:encoded><![CDATA[<p>This is the second post of my Facebook ad serving optimization series.  If you haven&#8217;t read my <a href="http://www.iterationstation.com/2009/05/03/facebook-ad-optimization/">first post in the series</a>, check it out for a fleshed out version of the problem statement. To recap, we are looking at the problem of identifying Facebook ads that are misspecified, then giving marketers a gentle nudge to help them get the highest CTR.  This post employs text mining to discover similar ad groups.  This would provide Facebook with a basis of comparison to determine whether an ad is not optimally targeted.</p>
<h3>Ad Text Clustering</h3>
<p style="text-align: justify;">The first step to correcting over targeting would entail identifying similar ad groups.  Once these groups are identified, Facebook can then amass demographic and response data into a large data set for later analysis.</p>
<p style="text-align: justify;">Ads are composed of text and pictures describing a product or service.  To evaluate the worth of ad clustering based on textual attributes, a sample of 47 distinct Facebook ads was copied from search result pages.  SAS Enterprise Miner (EM) was used to create and describe ad clusters.  Below is an outline of the text clustering process flow:</p>
<p><img title="Text Mining Process flow" src="http://iterationstation.com/wp-content/uploads/2009/05/picture-8.png" alt="This is an upper level overview of the text mining process" width="527" height="262" /></p>
<p><span id="more-370"></span></p>
<p style="text-align: justify;">Initial clustering was performed using Singular Value Decomposition (SVD).  SVD is most easily conceived of as a dimension reduction technique for text mining.  SVD extracts scores that represent as much of the latent structure of terms as possible between different documents.  The process is analogous to what principal component analysis does for a set of predictor variables.</p>
<p style="text-align: justify;">In the first round of clustering, only a few words (and, of, the, etc.) were excluded from the list of considered terms.  All document terms minus the excluded terms constituted the start list.  Only the terms in the start list were used to extract the SVD scores used to form the document clusters.</p>
<p style="text-align: justify;">EM offers a number of tools to interact with and explore resulting cluster sets.  It is common to iterate on the clustering process by refining the start list through further term exclusion and the stemming of similar terms.  Additional terms were removed from the Facebook ad start list (free, most, etc.) to further refine the ability of the algorithm to differentiate between ad groups.  The resulting ad clusters are shown below with a few descriptive terms listed for each cluster:</p>
<p><img title="Facebook Ad Clusters" src="http://iterationstation.com/wp-content/uploads/2009/05/picture-9.png" alt="This image shows the breakdown and description of facebook ad clusters" width="564" height="396" /></p>
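<p>Enterprise Miner handles this step internally, so the following is only a rough pure-Python illustration of the idea, not the EM procedure: build a start list by dropping excluded words, then count each start-list term in each document.  The function name <code>term_document_matrix</code> and the two sample ads are made up for the example.</p>

```python
import re
from collections import Counter


def term_document_matrix(docs, excluded):
    """Return (start_list, rows): one row of term counts per document,
    using every term that is not in the excluded set."""
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in docs]
    counts = [Counter(t for t in tokens if t not in excluded)
              for tokens in tokenized]
    start_list = sorted(set().union(*counts))
    rows = [[c[term] for term in start_list] for c in counts]
    return start_list, rows


# Two invented ad texts and the first-round exclusions mentioned above.
ads = ["Credit score and credit report", "The insurance quote"]
start_list, rows = term_document_matrix(ads, {"and", "of", "the"})
print(start_list)
print(rows)
```

<p>A count matrix of this shape is what a decomposition routine such as <code>numpy.linalg.svd</code> would then reduce to a handful of scores per document before clustering.</p>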
<h2>Clustering Results</h2>
<p style="text-align: justify;">Clustering was performed on both ad body text and a combination of ad body and ad headline text.  The clusters resulting from only ad body text were more homogeneous and are described below.</p>
<p style="text-align: justify;">As a whole, the final start list did a fairly good job of separating different ad groups.  The words report, credit and score drive cluster two membership.  This cluster contains every credit score ad as well as ads for graduate school tests and continuing legal education.  Cluster seven is represented by the terms quote, insurance and work.  It contains both insurance ads from the sample as well as ads for stock quotes and health and wellness.  Cluster nine contains three out of four ads for home buying or improvement.  Cluster five lacks SVD values and is composed of all unclassifiable ads.</p>
<p style="text-align: justify;">If a larger sample were obtained, the purity of the ad clusters might improve. Additionally, with a larger sample, methods such as decision tree classification could be applied to the members of an ad cluster to increase the purity of the final ad segments.  For example, if after text mining a cluster contained ads for home, car and life insurance, a decision tree could be trained to use a small set of categorical values (home, life, car, protection) as the splitting criteria, resulting in smaller, purer segments.</p>
<p style="text-align: justify;">These initial results based on a small corpus are encouraging and suggest that a more refined method with a larger sample size could prove to be quite effective to identify ad groups within Facebook’s inventory.   If pure clusters could be obtained, the next task would be to find the optimal demographic range for each ad group.  Once the optimal range were known, it could be determined if a single ad’s demographic specifications were in line with the optimal range or otherwise over targeted.</p>
<p style="text-align: justify;">In the next post, we look at a method for determining the optimal demographic ranges for Facebook&#8217;s ad groups.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/05/04/facebook-ad-optimization-ad-clustering/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/05/04/facebook-ad-optimization-ad-clustering/</feedburner:origLink></item>
		<item>
		<title>Facebook Ad Optimization</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/SG5F3pdC2b4/</link>
		<comments>http://www.Iterationstation.com/2009/05/03/facebook-ad-optimization/#comments</comments>
		<pubDate>Sun, 03 May 2009 16:56:24 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[facebook]]></category>

		<category><![CDATA[ad optimization]]></category>

		<category><![CDATA[ad serving]]></category>

		<category><![CDATA[demographic selection]]></category>

		<category><![CDATA[demographic targeting]]></category>

		<category><![CDATA[iab]]></category>

		<category><![CDATA[online advertising]]></category>

		<category><![CDATA[social networking]]></category>

		<category><![CDATA[SVD clustering]]></category>

		<category><![CDATA[Text mining]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=344</guid>
		<description><![CDATA[The Setup&#8230;
There&#8217;s been quite a bit of hubbub about Facebook as of late:  unexpected privacy updates and quick rollbacks, homepage redesign riots, the list goes on.  This series of posts presents a method to address a much more serious question.  How does Facebook ascend to be the crown prince of the advertising world?  In my [...]]]></description>
			<content:encoded><![CDATA[<h1>The Setup&#8230;</h1>
<p style="text-align: justify;">There&#8217;s been quite a bit of hubbub about Facebook as of late:  unexpected privacy updates and quick rollbacks, homepage redesign riots, the list goes on.  This series of posts presents a method to address a much more serious question.  How does Facebook ascend to be the crown prince of the advertising world?  In my opinion, the way forward is to leverage the mass amount of data they collect about their users to become a behavioral targeting GOD.  The monetization team at Facebook is chock full of really smart, talented guys working on this very thing.  I know- I&#8217;ve spoken with 3 of them over the last few months, yet with all the data about their users and a really talented team it&#8217;s clear that I&#8217;m not being served compelling advertising.  Here&#8217;s a few short examples&#8230;</p>
<p style="text-align: justify;"><img style="margin: 15px; float: right;" src="http://iterationstation.com/wp-content/uploads/2009/03/picture-15.png" alt="Mission trips" width="218" height="110" /> Here&#8217;s a little ditty asking me to join a Christian Mission trip.  I haven&#8217;t specified a religion as part of my demographic profile.  Don&#8217;t you think this ad may enjoy more success when served to an expressly Christian viewer?</p>
<p style="text-align: justify;"><br class="spacer_" /></p>
<p style="text-align: justify;"><br class="spacer_" /></p>
<p style="text-align: justify;"><span id="more-344"></span> <br class="spacer_" /></p>
<p style="text-align: justify;"><img style="margin: 15px; float: left;" src="http://iterationstation.com/wp-content/uploads/2009/03/picture-16.png" alt="" width="210" height="117" /></p>
<p style="text-align: justify;">I just had to include this ad for comic relief.  I don&#8217;t know if I can blame Facebook for this one.  If there&#8217;s a good demographic segment for the incinerator market, I know I&#8217;m not a part of it.  You almost had me with the comment about no mid cycle stirring.  Very close to the click, so close.</p>
<p style="text-align: justify;"><br class="spacer_" /></p>
<p style="text-align: justify;"><br class="spacer_" /></p>
<p style="text-align: justify;"><br class="spacer_" /></p>
<p style="text-align: justify;">In the following series of posts, I lay out a possible path to optimal ad serving based on demographic targeting.  The gist of my method involves Facebook training its advertisers to specify demographics optimally.  As a result, Facebook can expect a higher click-through rate and increased ad serving revenue.  This first post will explore the business problem in more depth.</p>
<h2 style="text-align: justify;">Problem Statement:</h2>
<p style="text-align: justify;">Facebook serves ads based on demographic targeting.  If an advertiser creates a target range that is too restrictive (over targeting), they may lose valuable clicks and impressions.  If advertisers create too broad a range (under targeting), Facebook risks serving irrelevant ads that will yield lower click-through rates (CTR). This study employs a combination of text mining and statistical methods to identify and address the problem of over targeting for Facebook pay-per-click campaigns.<br />
 To that end, two questions must be addressed:</p>
<p style="text-align: justify;">•    How do we identify when an ad is over targeted?</p>
<p><img src="http://iterationstation.com/wp-content/uploads/2009/03/picture-11.png" alt="" width="494" height="338" /></p>
<p style="text-align: justify;">•    What action can we take to help the advertiser improve demographic targeting for the ad group?</p>
<h2 style="text-align: justify;">Have You Targeted too Much?</h2>
<p style="text-align: justify;">An operational definition of over targeting is required to investigate this question.  An ad is over targeted if the advertiser has failed to capture user segments that will produce the desired number of clicks while maintaining the highest click-through rate possible.  This could happen if an advertiser selects a demographic range that is within the optimal range but is too restrictive, or if an advertiser selects a range that is outside the optimal range altogether.  If advertisers were made aware of the optimal range for demographics, they could specify demographics to achieve their desired number of clicks with the fewest impressions.</p>
<p style="text-align: justify;">Ignoring real world constraints, Facebook could run a designed experiment serving each ad to all its users to determine the distributions of CTR by demographic.  This approach would come with a huge opportunity cost because Facebook would be displaying millions of ads for free.</p>
<p style="text-align: justify;">A more realistic and cost-effective approach would be to look at an aggregate set of demographics for responders to ‘similar’ ads from a comparable time period.  Response data for the combined set would be analyzed to determine the optimal ad demographics to maximize yield.  Once the optimal range is determined, Facebook can compare that range to the specified range of recently created similar ads to determine whether they are over targeted.  Armed with this data, Facebook can recommend that advertisers modify their demographic specifications, resulting in a higher performing campaign.  This series of posts addresses each step of the above approach in detail by applying analytical methods to real world and simulated data.</p>
<p style="text-align: justify;">I hope you readers will jump into the discussion and propose ways to improve the initial approach.  In the next post I&#8217;ll <a href="http://www.iterationstation.com/2009/05/04/facebook-ad-optimization-ad-clustering/">describe the text mining process I used to cluster documents</a> &#8230;</p>
<p style="text-align: justify;">Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/05/03/facebook-ad-optimization/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/05/03/facebook-ad-optimization/</feedburner:origLink></item>
		<item>
		<title>Are YOU a Predictably Irrational Analyst?</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/-k-_PNr4pwo/</link>
		<comments>http://www.Iterationstation.com/2009/02/21/are-you-a-predictably-irrational-analyst/#comments</comments>
		<pubDate>Sat, 21 Feb 2009 02:31:04 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[Analytics]]></category>

		<category><![CDATA[YouTube]]></category>

		<category><![CDATA[behavioral economics]]></category>

		<category><![CDATA[cognitive psychology]]></category>

		<category><![CDATA[Dan Ariely]]></category>

		<category><![CDATA[linkedin]]></category>

		<category><![CDATA[marketing tips]]></category>

		<category><![CDATA[Predictably Irrational]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=239</guid>
		<description><![CDATA[Let&#8217;s begin the conversation with a little attention test.  Please count the number of passes the white team makes in the video below.
IT IS VERY IMPORTANT THAT YOU COUNT THE NUMBER OF PASSES!!!!!!!!   


Yep, it&#8217;s easy to miss what&#8217;s right in front of your face.  If you can&#8217;t trust your finely tuned visual [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s begin the conversation with a little attention test.  Please count the number of passes the white team makes in the video below.</p>
<p><span style="font-size: small;"><span style="text-decoration: underline;"><span style="color: #ff0000;"><strong>IT IS VERY IMPORTANT THAT YOU COUNT THE NUMBER OF PASSES!!!!!!!!</strong> </span> </span> </span></p>
<p><object width="425" height="344" data="http://www.youtube.com/v/2pK0BQ9CUHk&amp;hl=en&amp;fs=1" type="application/x-shockwave-flash"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/2pK0BQ9CUHk&amp;hl=en&amp;fs=1" /><param name="allowfullscreen" value="true" /></object></p>
<p><span id="more-239"></span></p>
<p>Yep, it&#8217;s easy to miss what&#8217;s right in front of your face.  If you can&#8217;t trust your finely tuned visual perception system to spot bears with an affinity for Michael Jackson, perhaps you&#8217;re overlooking other important things in your day-to-day life as a data analyst as well.</p>
<p>Last night I attended a talk by <a href="http://www.predictablyirrational.com/">Dan Ariely</a>, author of the New York Times best seller Predictably Irrational.  Throughout the talk, Dan offered evidence that the human brain is hardwired to make irrational choices that seem intuitively logical.  Dan provided a multitude of examples drawn from cognitive and social psychology and connected them to the seemingly irrational decisions made in economics.  Seemingly irrational, that is&#8230;</p>
<p>once the listener was made aware of the biological and environmental factors that work against the voice of reason in subtle and undetectable ways.</p>
<p>I&#8217;m going to extend Dan&#8217;s ideas to the sphere of data analysis to explore how some of our innate biases influence results.</p>
<h2>Brain Jam</h2>
<p>Dan spoke about a study conducted in a grocery store.  People were given the opportunity to sample jam from two tasting booths.  Customers were presented with a booth sampling 6 or 24 types of jam depending on which store entrance they walked in through.   The booth with 24 different samples did a great job of drawing people in.  60% of those who entered the store stopped to sample the jam vs. 40% for the booth with 6 flavors.  At both booths, the people who stopped for a taste-test tried around two different types of jam before continuing to shop.</p>
<p>Which group do you think bought jam from the store shelves more often after leaving the booth?  The group that sampled from the 6 jam table bought jam off the shelf 30% of the time.  The group that sampled from the table of 24&#8230; 3%.  On average, 3% of customers <span style="text-decoration: underline;"><img style="margin: 10px; float: right;" src="http://iterationstation.com/wp-content/uploads/2009/02/picture-6.png" alt="" width="166" height="212" /> </span> have jam on the list when they walk in, so the 3% doesn&#8217;t really count anyway.</p>
<p>So, why the 27-point difference?  The more complicated a decision, the more likely someone is to go with the default; in this case, not buying jam.  As analysts, we value parsimonious, intuitive models.  We like one solid number to go off of, be it P-values or R<sup>2</sup>, so we become locked in tunnel vision. We have to make a decision about which model is better, so there&#8217;s a little voice whispering in the background, softly telling us to ignore crucial factors and make things &#8216;easy&#8217;.</p>
<p>Did you check the assumptions?</p>
<p>Did you look at the distributions of the input variables?</p>
<p>Many practitioners fall into the trap of model first, ask questions later.  Once we have found a significant set of predictors, we then go back and say&#8230; well, that distribution looks &#8216;pretty&#8217; normal, or &#8216;It&#8217;s significant if we impute missing values with the mean&#8217;.  Of course we all know this is the wrong approach.  Don&#8217;t let yourself fall victim to irrational decisions in the face of a complex model (sorry neural nets <img src='http://www.Iterationstation.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> , not this time!)</p>
<h2>Abstraction = Cheating</h2>
<p>Dan was asked to explain the prevalence of stock brokers using *fuzzy math* to inflate their quarterly bonuses.  Now it&#8217;s easy to see the conflict of interest that arises from performance based bonuses.  There are, however, a number of more subtle factors that lead to an increase or decrease in the tendency of people to cheat.</p>
<p>Dan set up a study where subjects were asked to complete 20 pen and paper math problems in a short period of time.  After the time elapsed, subjects scored their own quiz.  Here&#8217;s the twist, for each correct answer, participants either received a dollar or a token that they exchanged for a dollar three feet away from where they received the token.  I concluded that it shouldn&#8217;t make a difference if they were in the token or the dollar group, essentially they are the same thing.  Wrong!  Those who got tokens for right answers had MUCH higher self-reported scores than subjects who got dollars.  Does the presence of tokens increase mathematical ability?  I&#8217;m afraid not.  The token group was cheating, plain and simple.</p>
<p>It turns out that people are more willing to cheat if a level of abstraction is involved.  The more steps that distance a person from the money or goal they seek, the more likely they are to cheat the system given the opportunity.  People were willing to cheat in the study because they were getting tokens, not dollars.  Perhaps even more interesting is that a great number of subjects in the study were willing to cheat just a little bit. It was not the case, as many people assume, that a few people cheated by a wide margin.</p>
<p>In data analysis, we take real world components of a business problem such as money or time and add a level of abstraction.  Dollars become probabilities and people turn into ROC curves.  With each added step of abstraction, it becomes easier for an analyst to justify fudging the numbers.  After all, it&#8217;s not <strong>real people</strong> being turned down for a loan by your credit scoring model, it&#8217;s just a slightly more significant set of predictor coefficients.</p>
<p>Moral of the story: test your newfound prediction system on a validation set!  Analysts have a tendency to build predictive models that are shaped and biased toward achieving the end goal.  If you want to ensure that you haven&#8217;t tweaked the process to get a significant outcome, expose your model to a new set of similar observations and see if you come out with the same result.</p>
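<p>As a rough sketch of that advice in SAS, the steps below set aside 30% of a data set for validation before any modeling starts.  The data set name <em>scored</em>, the 70/30 split and the seed are assumptions made for illustration; they are not details from Dan&#8217;s study.</p>
<pre>
/* Hypothetical example: 'scored' stands in for your modeling data */
proc surveyselect data=scored out=split method=srs
                  samprate=0.7 seed=2009 outall;
run;

/* OUTALL keeps every row and adds a 0/1 flag named Selected */
data train valid;
  set split;
  if selected = 1 then output train;  /* 70% used to fit the model      */
  else output valid;                  /* 30% held back for the final check */
run;
</pre>
<p>The point of the split is discipline: fit and tweak on <em>train</em> only, then score <em>valid</em> once at the end.</p>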
<p>People who cheat are not born with a mental defect.  Instead, they are created, in part, by abstract situations.  Would you deflate a P-value?  Given the right set of circumstances, it&#8217;s more probable than you&#8217;d like to believe.</p>
<p>How do you stop people from cheating?  Inflate their awareness of a value system.  People were much less likely to cheat in a previous study if they swore on a Bible at the beginning.  Yes, this method reduced the likelihood of cheating for all subjects, from born-again Christians to self-reported atheists.</p>
<p>Similarly, remember that your model has real-world consequences for the process or people it predicts.  That may curb your hardwired inclination to act like a hedge fund manager.</p>
<h2>Free Marketing Tip</h2>
<p>Although somewhat unrelated, this last paragraph outlines a cool marketing tip that Dan shared and I can&#8217;t help but pass along.  For those of you who aren&#8217;t in marketing, this tip can be adapted for the task of picking a wingman for Saturday night.</p>
<p>Consider the following subscription options from an Economist Magazine landing page.</p>
<p><img src="http://iterationstation.com/wp-content/uploads/2009/02/picture-16.png" alt="" width="460" height="45" /></p>
<p>When Dan asked a class of grad students what they would buy, how many do you think chose print access for $125?  Nobody of course; grad students can count!  Here&#8217;s the buying breakdown:</p>
<p><img src="http://iterationstation.com/wp-content/uploads/2009/02/picture-17.png" alt="" width="444" height="46" /></p>
<p>Nice work, Economist: most people went for the costly option.  So what happened when the &#8216;silly&#8217; print-only choice was excluded?</p>
<p><img src="http://iterationstation.com/wp-content/uploads/2009/02/picture-18.png" alt="" width="337" height="45" /></p>
<p>Wow!  Looks like the previous addition of a seemingly inconsequential choice tipped the scales in your favor.  Without the print only option, the grad students overwhelmingly prefer the low cost online access plan.  Hey, what do you expect- they&#8217;re living on a stipend.</p>
<p>Here&#8217;s the key takeaway.  If you want to make an option more attractive, include a similar option that is slightly less desirable.</p>
<p>You may be saying, &#8220;Gee, that&#8217;s great, but how do I increase my chances of picking up the girl of my dreams Saturday night?&#8221;  Easy: simply bring along a slightly less desirable version of yourself.  OK - I admit it - that may be pretty difficult to pull off.  But honestly people, if you want dating advice, you&#8217;ve come to the wrong blog.</p>
<p>I recommend you read Dan&#8217;s book &#8216;Predictably Irrational&#8217; or any of my recent favorite books listed below.<br />
<object width="600" height="200" data="http://ws.amazon.com/widgets/q?ServiceVersion=20070822&amp;MarketPlace=US&amp;ID=V20070822%2FUS%2Fiterastati-20%2F8010%2F4bd3a4a8-845e-4fd0-a722-82fd1309217f&amp;Operation=GetDisplayTemplate" type="application/x-shockwave-flash"><param name="id" value="Player_4bd3a4a8-845e-4fd0-a722-82fd1309217f" /><param name="quality" value="high" /><param name="bgcolor" value="#FFFFFF" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://ws.amazon.com/widgets/q?ServiceVersion=20070822&amp;MarketPlace=US&amp;ID=V20070822%2FUS%2Fiterastati-20%2F8010%2F4bd3a4a8-845e-4fd0-a722-82fd1309217f&amp;Operation=GetDisplayTemplate" /><param name="name" value="Player_4bd3a4a8-845e-4fd0-a722-82fd1309217f" /><param name="align" value="middle" /></object> <noscript><a href="http://ws.amazon.com/widgets/q?ServiceVersion=20070822&amp;MarketPlace=US&amp;ID=V20070822%2FUS%2Fiterastati-20%2F8010%2F4bd3a4a8-845e-4fd0-a722-82fd1309217f&amp;Operation=NoScript">Amazon.com Widgets</a></noscript></p>
<img src="http://feeds.feedburner.com/~r/TheIterationStation/~4/-k-_PNr4pwo" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/02/21/are-you-a-predictably-irrational-analyst/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/02/21/are-you-a-predictably-irrational-analyst/</feedburner:origLink></item>
		<item>
		<title>Creating a Cartesian Product in SAS</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/aubUdeHnPFY/</link>
		<comments>http://www.Iterationstation.com/2009/02/13/creating-a-cartesian-product-in-sas/#comments</comments>
		<pubDate>Fri, 13 Feb 2009 06:49:48 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[SAS]]></category>

		<category><![CDATA[Cartesian Product]]></category>

		<category><![CDATA[Join all rows]]></category>

		<category><![CDATA[Proc SQL]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=253</guid>
		<description><![CDATA[Here&#8217;s the first of several question/answer posts.
Feel free to write me with your SAS question and I&#8217;ll provide the answer in a post.
Question
Suppose I have two SAS datasets, set A contains 5 variables having 3000 unique records and set B contains just 1 variable having 500 unique records (just a bunch of ID numbers).  [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s the first of several question/answer posts.</p>
<p>Feel free to write me with your SAS question and I&#8217;ll provide the answer in a post.</p>
<h1>Question</h1>
<p>Suppose I have two SAS datasets: set A contains 5 variables and 3000 unique records, and set B contains just 1 variable and 500 unique records (just a bunch of ID numbers).  I would like to create a new dataset in which each record from set A is repeated 500 times, with each copy paired with a unique ID from set B.  The goal is a new dataset with 6 variables (the five from A and the one from B) and a total of 1,500,000 records.</p>
<p>Any help or suggestions you can provide is greatly appreciated.</p>
<p><span id="more-253"></span></p>
<h1>Answer</h1>
<p>What you are asking for is called a Cartesian product: every row of one table paired with every row of the other.</p>
<p>This can be accomplished with the following SAS code:</p>
<p><span style="color: #0000ff;">PROC</span> SQL noprint;</p>
<p><span style="color: #0000ff;">Create Table <span style="color: #000000;">&lt;new data set&gt; as</span> </span></p>
<p><span style="color: #0000ff;">select</span> *<br />
<span style="color: #0000ff;">from</span> &lt;data set A&gt;, &lt;data set B&gt;;<br />
<span style="color: #0000ff;">quit</span> ;</p>
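<p>To see the pattern on a small scale, here is a minimal sketch with made-up data.  The data sets A and B, their variables and their values are all hypothetical stand-ins for the ones in the question.</p>
<pre>
/* Hypothetical stand-in for data set A (3 records) */
data A;
  input name $ score;
  datalines;
Ann 90
Bob 85
Cat 70
;
run;

/* Hypothetical stand-in for data set B (2 ID numbers) */
data B;
  input id;
  datalines;
101
102
;
run;

proc sql;
  create table AB as
  select *
  from A, B;   /* no join condition: every row of A meets every row of B */
quit;
</pre>
<p>The result AB holds 3 x 2 = 6 records with all three variables, the same way 3000 x 500 gives 1,500,000.  SAS will typically write a note to the log pointing out that the query involves a Cartesian product; here that is exactly what we asked for.</p>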
<img src="http://feeds.feedburner.com/~r/TheIterationStation/~4/aubUdeHnPFY" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/02/13/creating-a-cartesian-product-in-sas/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/02/13/creating-a-cartesian-product-in-sas/</feedburner:origLink></item>
		<item>
		<title>Musing on Twitter monetization strategies…</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/I0rNz4a68po/</link>
		<comments>http://www.Iterationstation.com/2009/02/08/thoughts-on-twitter-monetization-strategy/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 06:25:57 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[Business Models]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[Monetization Strategy]]></category>

		<category><![CDATA[privacy]]></category>

		<category><![CDATA[Rants and Raves]]></category>

		<category><![CDATA[Text mining]]></category>

		<category><![CDATA[Twitter]]></category>

		<category><![CDATA[YouTube]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=188</guid>
		<description><![CDATA[ 
The last few years have proved this bird can fly .  But can Twitter
dodge arrows?  Only time will tell.

I&#8217;ve been on Twitter for a few years now but it&#8217;s never managed to capture my full attention.  What has been taking up my attention is reading the seemingly never ending stream of  recent posts [...]]]></description>
			<content:encoded><![CDATA[<address> </address>
<address>The last few years have proved <a rel="nofollow" href="http://www.alleyinsider.com/2009/1/twitters-web-traffic-growing-like-a-weed/page/1#comment-49771f1d14b9b9b50075e9ef">this bird can fly</a> .  But can Twitter</address>
<address>dodge arrows?  Only time will tell.</address>
<p><img style="margin: 10px; float: right;" src="http://iterationstation.com/wp-content/uploads/2009/02/picture-2.png" alt="" width="332" height="201" /></p>
<p>I&#8217;ve been on Twitter for a few years now but it&#8217;s never managed to capture my full attention.  What <em>has</em> been taking up my attention is reading the seemingly never-ending stream of recent posts where <a rel="nofollow" href="http://www.alleyinsider.com/2009/1/announcing-the-create-a-twitter-revenue-model-contest"><em>everyone</em> knows the best way to monetize Twitter</a>.  One idea that&#8217;s been bandied about is the potential to leverage Twitter&#8217;s massive amount of personal data as a monetization strategy through targeted ad serving or third-party partnerships.  Yes, sites like Twitter and Facebook probably have more data about us than our own mothers, but their ability to turn this data into dollars is hampered by three factors:</p>
<p><span id="more-188"></span></p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Privacy Concerns</span> – Would you still use Twitter if they were directly providing your tweets to advertisers?  The answer for me is maybe, since the microblogs are public at present.  Be that as it may, Twitter is a darling in the public eye, so there would be some serious negative press associated with any such move.</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">A lack of tools</span> - Textual data mining, more specifically semantic analysis, has quite a ways to go before it can be used to successfully understand our thoughts and feelings at mass scale.  Look no further than the difference in meaning between the phrase &#8216;I love this weather!&#8217; uttered by a sarcastic Seattle resident on a rainy day and the same words stated by a Mountain View resident when it&#8217;s sunny and in the mid 70s.</p>
<p style="padding-left: 30px;"><span style="text-decoration: underline;">Visitor goals</span> - Contrast Google&#8217;s AdWords model with Facebook&#8217;s targeted advertising.  Facebook has the most targeted ad-serving platform in the world, yet last time I checked, it was still operating in the red.  Not that it matters, because VCs seem to offer such companies ridiculous valuations, as I&#8217;m sure Twitter recently enjoyed.  Still, people do not log in to Facebook to buy or research products; context is key!  That’s the distinction between Facebook and the search engines that have enjoyed such robust advertising revenues.</p>
<p>I&#8217;m not going to offer my thoughts on potential ways for Twitter to monetize because it seems like everything&#8217;s been said and hey, isn&#8217;t that <a rel="nofollow" href="http://blog.twitter.com/2009/01/its-business-time.html">Kevin Thau&#8217;s job</a>?  If you want to call me a negative Nancy for just being critical, I guess I&#8217;m guilty as charged.  For the time being, I think it&#8217;s OK that Twitter is on a cash burn.  They should be focused on building a sustainable infrastructure that can grow with their traffic instead of trying to squeeze a buck out of their new loyal fanbase.  Once the halo dust settles, Twitter can get down to the business of converting its audience into dollars.</p>
<p>Google struck gold with AdWords but still flounders to make money off YouTube.  On that note, I&#8217;ll coin the adage &#8216;an eyeball does not a dollar make&#8217;.  Speaking of eyeballs, keep one or two on the success of Facebook Connect for future directions of social media revenue models.  That could be the goose that lays 2.0&#8217;s golden egg.</p>
<img src="http://feeds.feedburner.com/~r/TheIterationStation/~4/I0rNz4a68po" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/02/08/thoughts-on-twitter-monetization-strategy/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/02/08/thoughts-on-twitter-monetization-strategy/</feedburner:origLink></item>
		<item>
		<title>SAS Resume Tools</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/15xY3aC7DAQ/</link>
		<comments>http://www.Iterationstation.com/2009/02/04/analytics-resume-tools/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 22:00:37 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[Data Visualisation]]></category>

		<category><![CDATA[SAS]]></category>

		<category><![CDATA[Analytics Resume tools]]></category>

		<category><![CDATA[linkedin]]></category>

		<category><![CDATA[SAS jobs]]></category>

		<category><![CDATA[SAS programming]]></category>

		<category><![CDATA[SQL]]></category>

		<category><![CDATA[Visualization tools]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/?p=141</guid>
		<description><![CDATA[Although you may consider yourself an Analyst, during your job search you need to think like a salesman . Every interaction with a prospective employer is a golden opportunity to sell yourself.  Submitting a resume is equivalent to cold calling a client so stick your foot in the door by connecting quickly with the [...]]]></description>
			<content:encoded><![CDATA[<p>Although you may consider yourself an Analyst, during your job search you need to think like a <em>salesman</em> . Every interaction with a prospective employer is a golden opportunity to sell yourself.  Submitting a resume is equivalent to cold calling a client so stick your foot in the door by connecting quickly with the hiring manager.  At a glance, this person needs to see what they&#8217;re looking for or else your resume will hit the bottom of the pile, or worse, the paper shredder.</p>
<p>The simple tools provided below turn any SAS Job resume into a targeted sales weapon.   The first visualization displays the words that occur most commonly among 450 SAS job postings from <a rel="nofollow" href="http://http://sasjobs.blogspot.com/">SASJobs.BlogSpot.com</a> and <a title="SAS Jobs" rel="nofollow" href="http://sasjobs.com/" title="SAS Jobs">SASJobs.com</a> .  Make sure you address each of these terms appropriately in your resume and personal statement.  The bigger the word, the more commonly it was used- act accordingly.  If you want to drill down even further, click on the layout drop down to increase the maximum number of terms shown.</p>
<p style="text-align: center;"><span style="font-size: x-small;"><strong><span style="text-decoration: underline;"><span style="color: #ff0000;">These visualizations work best in Firefox.  Make sure to have Java enabled.</span> </span> </strong> </span></p>
<p><script src="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/579d12bef2ea11dd84db000255111976/comments/57a7107af2ea11dd84db000255111976.js?width=400&amp;height=350" type="text/javascript"></script> <span id="more-141"></span> So, you know what terms to target; now the question becomes, what do I say about them?  Well, have no fear.  The next tool displays exactly how keywords were used in our set of job descriptions.  Take an important term and type it into the search bar.  Hit enter and a tree will appear that shows you how the phrase was used in the job postings.  As an example, a search for &#8216;SQL&#8217; yields a tree that indicates companies are looking for people with knowledge of PROC SQL, SQL programming and SQL Server.  If I click on the next term in a branch, the tree will again expand and I can see that job posts also reference Oracle databases in the same sentence.  This tool provides many examples of the related skills that companies are looking for and can help job searchers represent their experience in a way that will have HR salivating. <script src="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/4463e1f6f2e911dd9fae000255111976/comments/448b3b2af2e911dd9fae000255111976.js" type="text/javascript"></script></p>
<img src="http://feeds.feedburner.com/~r/TheIterationStation/~4/15xY3aC7DAQ" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/02/04/analytics-resume-tools/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/02/04/analytics-resume-tools/</feedburner:origLink></item>
		<item>
		<title>YouTube Cashes in on Rampant Piracy</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/8tUXvo6j9eg/</link>
		<comments>http://www.Iterationstation.com/2009/01/26/youtube-cashes-in-on-rampant-piracy/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 01:54:38 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[YouTube]]></category>

		<category><![CDATA[Amazon]]></category>

		<category><![CDATA[Monetization Strategy]]></category>

		<category><![CDATA[Mp3]]></category>

		<category><![CDATA[Piracy]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/2009/01/26/youtube-cashes-in-on-rampant-piracy/</guid>
		<description><![CDATA[
 
Rumors of a Google - Amazon mp3 partnership surfaced mid last year (Googlezon! ).  Apparently, the relationship was cemented because Youtube now serves its visitors offers to buy Amazon mp3s related to the content they are viewing.  Let me be clear, you should read &#8216;related to the content&#8217; in the previous sentence to mean [...]]]></description>
			<content:encoded><![CDATA[
<p style="text-align: center"><a title="Googlezon unite!!" href="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-11.png"><img src="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-11.png" alt="Googlezon unite!!" width="249" height="91" /> </a></p>
<p>Rumors of a <a rel="nofollow" href="http://blogs.telegraph.co.uk/claudine_beaumont/blog/2008/02/15/youtube_amazon_mp3_and_itunes_in_mnage_a_trois">Google - Amazon mp3 partnership</a> surfaced mid last year (<a rel="nofollow" href="http://robinsloan.com/epic/" target="_blank">Googlezon!</a>).  Apparently, the relationship was cemented, because YouTube now serves its visitors offers to buy Amazon mp3s related to the content they are viewing.  Let me be clear: you should read &#8216;related to the content&#8217; in the previous sentence to mean <strong>the exact same content</strong> you are viewing.  To put it bluntly, YouTube is making a pitch for you to buy the song you&#8217;re listening to, the song that someone illegally uploaded.</p>
<p><span id="more-103"></span></p>
<p>For example:</p>
<p><a title="Amazon youtube affiliate relationship in action" rel="nofollow" href="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-10.png"><img src="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-10.png" alt="Amazon youtube affiliate relationship in action" width="483" height="278" /> </a></p>
<p>Problem?  If YouTube is smart enough to know which MP3 to recommend for a pirated piece of its content, it should ALSO be smart enough to flag the content for removal.  So, in this case, YouTube&#8217;s business model is in direct conflict with its copyright policy: the best moment to recommend a song for purchase is precisely when someone is viewing the pirated video for that song.</p>
<p>Here&#8217;s what the <em>Tube</em> has to say&#8230;</p>
<blockquote>
<h2 class="yt-static">What Will Happen If You Upload Infringing Content</h2>
<p class="yt-static" style="margin-top: 0pt"><span style="color: #ff0000;">Anytime YouTube becomes aware that a video or any part of a video on our site infringes the copyrights of a third party, we will take it down from the site. We are required to do so by law.</span> If you believe that a video on the site infringes your copyright, send us a copyright notice and we will take it down. If you believe that we have removed a video that you uploaded in error and that you are the copyright owner or have permission, you can file a counter notice and let us know. Accounts determined to be repeat infringers may be subject to termination. Users with suspended or terminated accounts are prohibited from creating new accounts or accessing YouTube&#8217;s community features.</p>
</blockquote>
<p class="yt-static" style="margin-top: 0pt">In the interest of comedy, let&#8217;s step back for a second and consider the sales pitch here.</p>
<p class="yt-static" style="margin-top: 0pt">Tube:  Hi I&#8217;m YouTube, do you like the song you&#8217;re listening to for free?  Well, now you can buy it.</p>
<p class="yt-static" style="margin-top: 0pt">User:  Ahhh&#8230; I&#8217;m OK thanks, I&#8217;d prefer to keep enjoying it for free.  I mean come on YouTube, if I really wanted it, I&#8217;d just <a title="Download video from youtube" rel="nofollow" href="http://www.techcrunch.com/get-youtube-movie/" target="_blank">download it from you</a> <img src='http://www.Iterationstation.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p class="yt-static" style="margin-top: 0pt">There is certainly the appeal to quality.  Quite a bit of YouTube&#8217;s copyrighted content is not at the quality level of a purchased product.  Still, are people who are willing to listen to a pirated video really unwilling to download a high-quality MP3 from a file-sharing site?</p>
<p class="yt-static" style="margin-top: 0pt">I applaud YouTube for putting forth this innovative monetization strategy.  Yet, the irony of the approach is not lost on the Iteration Station.</p>
<img src="http://feeds.feedburner.com/~r/TheIterationStation/~4/8tUXvo6j9eg" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/01/26/youtube-cashes-in-on-rampant-piracy/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/01/26/youtube-cashes-in-on-rampant-piracy/</feedburner:origLink></item>
		<item>
		<title>Sorting Data in SAS with Proc Sort</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/LUvZD6Be6P8/</link>
		<comments>http://www.Iterationstation.com/2009/01/18/sorting-data-in-sas-with-proc-sort/#comments</comments>
		<pubDate>Sun, 18 Jan 2009 19:54:52 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[SAS]]></category>

		<category><![CDATA[ascending]]></category>

		<category><![CDATA[by=]]></category>

		<category><![CDATA[descending]]></category>

		<category><![CDATA[Proc sort]]></category>

		<category><![CDATA[SAS NODUPKEY]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/2009/01/18/sorting-data-in-sas-with-proc-sort/</guid>
		<description><![CDATA[The SAS Sort Procedure
Sorting data in SAS is an everyday task accomplished for a variety of reasons.


Certain types of reports require that data be presorted to present results accurately.
Programs that include a where statement can execute more quickly on a sorted data set.
Unlike SQL, SAS requires data sets to be sorted by a common variable [...]]]></description>
			<content:encoded><![CDATA[<h2>The SAS Sort Procedure</h2>
<p>Sorting data in SAS is an everyday task accomplished for a variety of reasons.</p>
<ol>
<li>Certain types of reports require that data be presorted to present results accurately.</li>
<li>Programs that include a where statement can execute more quickly on a sorted data set.</li>
<li>Unlike a PROC SQL join, a DATA step merge with a BY statement requires the data sets to be sorted (or indexed) by the common variable before they can be combined.</li>
</ol>
<p>These examples are just a few situations where a presorted data set is helpful or essential.  Luckily, PROC SORT is a relatively simple procedure.  The basic syntax is below:</p>
<p><span style="color: #000080;">Proc Sort</span> <span style="color: #0000ff;">data</span> =&lt;data set&gt;;</p>
<p><span style="color: #0000ff;">By</span> &lt;variable-1 &#8230; variable-n&gt;<span style="color: #0000ff;">;</span></p>
<p><span style="color: #000080;">Run</span> ;</p>
<p><span id="more-98"></span></p>
<p>In the code above, &lt;data set&gt; is both the input and output data set of the procedure.  Essentially, SAS replaces your old data set with the new sorted version.  If data=&lt;data set&gt; is not specified, SAS will sort the most recently created data set.</p>
<h2>By Statement</h2>
<p>The <span style="color: #0000ff;">By</span> statement specifies the variables to use as sort keys.  You may include as many by variables as are contained in the data set.  If you include more than one by variable, SAS will sort the data by the first <span style="color: #0000ff;">by</span> variable, then sort the observations within each group of equal values by the second variable, and so on.  By default, SAS sorts in ascending order, meaning it orders values from A to Z or, in the case of numbers, from lowest to highest.  To sort in descending order (from Z to A), specify the keyword <span style="color: #0000ff;">Descending</span> before each variable that you would like sorted that way.</p>
<p>This sample code produces the example below:</p>
<p><span style="color: #000080;">Proc Sort</span> <span style="color: #0000ff;">data</span> =example;</p>
<p><span style="color: #0000ff;">By</span> Cost <span style="color: #0000ff;">Descending</span> Class<span style="color: #0000ff;">;</span></p>
<p><span style="color: #000080;">Run</span> ;</p>
<p style="text-align: center"><a title="Data set sorted ascending and descending by proc sort" href="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-3.png"><img src="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-3.png" alt="Data set sorted ascending and descending by proc sort" /> </a></p>
<p>Notice how within each group of ascending cost values, the class values are listed in descending order.  Also note that the missing value for the last row is considered a low value and is placed after C.  SAS treats missing values for character and numeric data as very small values.</p>
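<p>If you would like to try this yourself, here is a small self-contained sketch.  The rows below are invented for illustration and are not the exact data in the picture; only the variable names Cost and Class come from the example.</p>
<pre>
/* Hypothetical data; the lone period is read as a missing Class */
data example;
  input Cost Class $;
  datalines;
43 D
12 B
43 A
12 C
43 D
12 .
;
run;

/* Ascending Cost, then descending Class within each Cost value */
proc sort data=example;
  by Cost descending Class;
run;

proc print data=example noobs;
run;
</pre>
<p>The Cost 12 rows come first, ordered C, B and then the missing Class; the Cost 43 rows follow as D, D, A.</p>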
<h2>OUT=</h2>
<p>If you want to retain the original unsorted data while creating a new sorted data set, you can include the <span style="color: #0000ff;">out</span>=&lt;output data set&gt; option as shown below:</p>
<p><span style="color: #000080;">Proc Sort</span> <span style="color: #0000ff;">data</span> =&lt;original data set&gt; <span style="color: #000000;"><span style="color: #0000ff;">out</span> =&lt;output data set&gt;</span> ;</p>
<p><span style="color: #0000ff;">By</span> Cost <span style="color: #0000ff;">Descending</span> Class<span style="color: #0000ff;">;</span></p>
<p><span style="color: #000080;">Run</span> ;</p>
<p>This option saves the sorted output in a new data set and leaves the original unsorted data set unchanged.</p>
<h2>NODUPKEY</h2>
<p>If you want only one observation for each combination of by-variable values in your output data set, add the NODUPKEY option.  It keeps the first observation in each by group and eliminates the rest during the sort.  Let us return to our example, this time incorporating all that we have learned.  We will sort the data set &#8216;Old&#8217; by the variables cost and descending class to produce an output data set &#8216;New&#8217;.  The code below produces the example picture:</p>
<p><span style="color: #000080;">Proc Sort</span> <span style="color: #0000ff;">data</span> =&lt;original data set&gt; <span style="color: #000000;"><span style="color: #0000ff;">out</span> =&lt;output data set&gt; <span style="color: #0000ff;">NODUPKEY</span> </span> ;</p>
<p><span style="color: #0000ff;">By</span> Cost <span style="color: #0000ff;">Descending</span> Class<span style="color: #0000ff;">;</span></p>
<p><span style="color: #000080;">Run</span> ;</p>
<p style="text-align: center"><a title="Proc Sort with NODUPKEY option" href="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-4.png"><img src="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-4.png" alt="Proc Sort with NODUPKEY option" /> </a></p>
<p>Notice how the duplicate observation (43, D) from the previous picture has been removed.  A message was printed to the SAS log indicating that the duplicate record was removed:</p>
<p><a title="SAS log message" href="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-5.png"><img src="http://www.Iterationstation.com/wp-content/uploads/2009/01/picture-5.png" alt="SAS log message" width="643" height="31" /> </a></p>
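<p>If you would rather inspect the removed rows than rely on the log message alone, one possible approach is the DUPOUT= option, which is not used elsewhere in this post.  The sketch below is a hedged illustration; the data in &#8216;Old&#8217; is made up and &#8216;Dropped&#8217; is a hypothetical data set name.</p>
<pre>
/* Hypothetical data containing one duplicate Cost/Class pair */
data Old;
  input Cost Class $;
  datalines;
43 D
12 B
43 D
;
run;

/* NODUPKEY keeps the first row per by group; DUPOUT= saves the rest */
proc sort data=Old out=New dupout=Dropped nodupkey;
  by Cost descending Class;
run;
</pre>
<p>Here &#8216;New&#8217; ends up with two observations and &#8216;Dropped&#8217; holds the second (43, D) row, so you can check exactly what NODUPKEY threw away.</p>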
<p>That wraps things up for the first entry of the SAS tip series.  Please feel free to leave comments if you have anything to add about Proc Sort.</p>
<img src="http://feeds.feedburner.com/~r/TheIterationStation/~4/LUvZD6Be6P8" height="1" width="1"/>]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2009/01/18/sorting-data-in-sas-with-proc-sort/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2009/01/18/sorting-data-in-sas-with-proc-sort/</feedburner:origLink></item>
		<item>
		<title>GeoSpatial Analytics Gallery</title>
		<link>http://feedproxy.google.com/~r/TheIterationStation/~3/iA7OnwgDVgA/</link>
		<comments>http://www.Iterationstation.com/2008/12/01/geospatial-analytics-gallary/#comments</comments>
		<pubDate>Mon, 01 Dec 2008 18:54:18 +0000</pubDate>
		<dc:creator>bkloss</dc:creator>
		
		<category><![CDATA[Geospatial Analytics]]></category>

		<category><![CDATA[ArcGIS]]></category>

		<category><![CDATA[ARCMAP]]></category>

		<category><![CDATA[Excel]]></category>

		<category><![CDATA[SAS]]></category>

		<guid isPermaLink="false">http://www.Iterationstation.com/2008/12/01/geospatial-analytics-gallary/</guid>
		<description><![CDATA[One of the most difficult, yet rewarding, parts of my job is communicating complex analysis results in a compelling, intuitive manner.   Often times, numbers are not enough.  Instead, you need a visual representation to hammer home your main points and insights to upper level stakeholders.
GeoSpatial analytics is on a meteoric rise because of its [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most difficult, yet rewarding, parts of my job is communicating complex analysis results in a compelling, intuitive manner.  Often, numbers are not enough.  Instead, you need a visual representation to hammer home your main points and insights to upper-level stakeholders.</p>
<p>GeoSpatial analytics is on a meteoric rise because of its ability to do just that. Look no further than the flashy touch-screen maps CNN used during the election to convey numeric data in a memorable way.</p>
<p>Please enjoy a gallery of my most recent visualizations below.  Feel free to ask questions if you would like to know how any of these maps were produced. (Click any image to see a slideshow.)</p>
<div class="ngg-galleryoverview" id="ngg-gallery-2">


	
	<!-- Thumbnails -->
		
	<div id="ngg-image-1" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/picture-2.png" title="The RailHawk soccer team distributed a marketing survey throughout NC.  The response frequency by county is displayed." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="RailHawk.png" alt="RailHawk.png" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_picture-2.png" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-2" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/HSNG_mortgage_NY.jpg" title="The frequency of home ownership in New York is displayed.  Dark red indicates more mortgages.  Unsurprisingly, New York City has the highest frequency." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="HSNG_mortgage_NY.jpg" alt="HSNG_mortgage_NY.jpg" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_HSNG_mortgage_NY.jpg" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-6" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/cotton_production.jpg" title="Cotton is an essential export for many nations around the world. Its main centers of production are displayed above." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="cotton_production.jpg" alt="cotton_production.jpg" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_cotton_production.jpg" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-3" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/Picture 1.png" title="The cotton map was reproduced using ESRI&#039;s SAS Bridge technology as a way to experiment with their data integration platform." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="SAS_Bridge_cotton_join.png" alt="SAS_Bridge_cotton_join.png" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_Picture 1.png" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-4" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/Picture 3.png" title="Area computations were used to determine the number of horses that could be sustained in each of several fields within the Petersburg, VA National Park." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="Hay_Field.png" alt="Hay_Field.png" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_Picture 3.png" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-7" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/Picture 5.png" title="A similar horse field map is positioned within satellite imagery of its urban surroundings." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="Hay_field_city.png" alt="Hay_field_city.png" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_Picture 5.png" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-5" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/Pennathur.jpg" title="During a six-week span in late summer of 2001, dengue fever spread through Pennathur, India. The disease spread from its epicenter in the east.  The color coding represents hot-spot clustering during the outbreak." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="Pennathur_Disease.jpg" alt="Pennathur_Disease.jpg" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_Pennathur.jpg" width="100" height="75" />
			</a>
		</div>
	</div>
	 		
	<div id="ngg-image-8" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/Picture 8.png" title="The conservation valuation for 41 NC counties is displayed.  Darker greens indicate more valuable property." class="thickbox" rel="geospatial-analytics-pictures" >
				<img title="Conservation_map.png" alt="Conservation_map.png" src="http://www.Iterationstation.com/wp-content/gallery/geospatial-analytics-pictures/thumbs/thumbs_Picture 8.png" width="100" height="75" />
			</a>
		</div>
	</div>
	 	 	
	<!-- Pagination -->
 	<div class='ngg-clear'></div>
 	
</div>

<h3><u>Tools Used</u></h3>
<p>ArcGIS, SAS/Base, Excel, SAS Bridge</p>
]]></content:encoded>
			<wfw:commentRss>http://www.Iterationstation.com/2008/12/01/geospatial-analytics-gallary/feed/</wfw:commentRss>
		<feedburner:origLink>http://www.Iterationstation.com/2008/12/01/geospatial-analytics-gallary/</feedburner:origLink></item>
	</channel>
</rss>
