<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0"><channel><description>The world is drowning in data.  Learn to swim in it.

This blog is a dive into the world of data science and analytics. A look at the fascinating things people are doing with data to remake the world in exciting new ways.

Curated by Ed Goodwin.</description><title>Data Dive</title><generator>Tumblr (3.0; @datadive)</generator><link>http://datadive.tumblr.com/</link><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/tumblr/ddFeed" /><feedburner:info uri="tumblr/ddfeed" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://tumblr.superfeedr.com/" /><item><title>Soda vs. Pop vs. Coke on Twitter</title><description>&lt;a href="http://flowingdata.com/2012/07/09/soda-versus-pop-on-twitter/"&gt;Soda vs. Pop vs. Coke on Twitter&lt;/a&gt;: &lt;p&gt;&lt;img src="http://flowingdata.com/wp-content/uploads/2012/07/Soda-vs-pop-on-Twitter-625x336.png" size="500"/&gt;&lt;/p&gt;

&lt;p&gt;Via &lt;a href="http://flowingdata.com/2012/07/09/soda-versus-pop-on-twitter/" target="_blank"&gt;Nathan Yau at FlowingData&lt;/a&gt;, we find Edwin Chen using Twitter to gain insight into current cultural trends.  “Coke” is the dominant term in the rest of the world (ROW) but has limited penetration in the US.  This makes sense, since Coke dominates international markets with 50%+ market share versus the 20%+ that PepsiCo has.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/tumblr/ddFeed/~4/ydjodOenqmA" height="1" width="1"/&gt;</description><link>http://feedproxy.google.com/~r/tumblr/ddFeed/~3/ydjodOenqmA/27349745798</link><guid isPermaLink="false">http://datadive.tumblr.com/post/27349745798</guid><pubDate>Mon, 16 Jul 2012 14:47:00 -0500</pubDate><category>soda</category><category>twitter</category><category>flowingdata</category><feedburner:origLink>http://datadive.tumblr.com/post/27349745798</feedburner:origLink></item><item><title>Venture Capital Returns by Vintage</title><description>&lt;p&gt;&lt;img src="http://www.iaventures.com/wp-content/uploads/2011/06/calpers_plot1.png" size="500"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.drewconway.com/zia/" target="_blank"&gt;Drew Conway&amp;#8217;s blog&lt;/a&gt; is worth following if you&amp;#8217;re interested in the intersection of politics and data science.  But I came across this &lt;a href="http://www.iaventures.com/venture-and-pe-fund-performance-by-vintage-year" target="_blank"&gt;old post he did of VC returns by vintage for a publicly released Calpers data set&lt;/a&gt; and thought I would link to it here.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/tumblr/ddFeed/~4/mKVFuTcdGYw" height="1" width="1"/&gt;</description><link>http://feedproxy.google.com/~r/tumblr/ddFeed/~3/mKVFuTcdGYw/27128403261</link><guid isPermaLink="false">http://datadive.tumblr.com/post/27128403261</guid><pubDate>Fri, 13 Jul 2012 11:07:00 -0500</pubDate><category>VC</category><category>dataviz</category><feedburner:origLink>http://datadive.tumblr.com/post/27128403261</feedburner:origLink></item><item><title>Exploring the Tour de France with R and ggplot2</title><description>&lt;a href="http://www.statisticalskier.com/tag/tour-de-france/"&gt;Exploring the Tour de France with R and ggplot2&lt;/a&gt;: &lt;p&gt;&lt;img src="http://www.statisticalskier.com/wp-content/uploads/2012/07/stage9_yr.png" width="500"/&gt;&lt;/p&gt;

&lt;p&gt;Some stunning visualizations of the Tour de France.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/tumblr/ddFeed/~4/TwpbTQvpzlM" height="1" width="1"/&gt;</description><link>http://feedproxy.google.com/~r/tumblr/ddFeed/~3/TwpbTQvpzlM/27088972081</link><guid isPermaLink="false">http://datadive.tumblr.com/post/27088972081</guid><pubDate>Thu, 12 Jul 2012 19:57:00 -0500</pubDate><category>ggplot</category><category>rstat</category><category>tourdefrance</category><feedburner:origLink>http://datadive.tumblr.com/post/27088972081</feedburner:origLink></item><item><title>The Banking Industry Returns Thru the Financial Crisis</title><description>&lt;p&gt;I recently took some high level data from the companies covered in the
Value Line Investment Survey (both the Standard edition and the Small
and Mid-Cap edition) and decided to plot it out to see how bank
valuations have flexed over the last several years.   All data (2003-2011) is
courtesy of &lt;a href="http://valueline.com" target="_blank"&gt;Value Line&lt;/a&gt;, and all plots were made
with &lt;a href="http://had.co.nz/ggplot2/" target="_blank"&gt;ggplot2&lt;/a&gt; and
&lt;a href="http://r-project.org" target="_blank"&gt;R&lt;/a&gt;.  This isn&amp;#8217;t the most rigorous analysis,
but it was pretty useful for some projects I&amp;#8217;m working on. Since it
only covers banks that are &lt;em&gt;currently&lt;/em&gt; in the survey, it suffers
from survivorship bias: banks that have since dropped out of the survey are
missing from every year of the data.  Given the state of the banking sector today
and its recent history, this is a real issue.&lt;/p&gt;

&lt;p&gt;The first graph charts Price to Book versus Return on Equity for all
the banks in the survey.  I&amp;#8217;ve stratified the banks into large cap,
midcap, and smallcap.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://farm8.staticflickr.com/7138/7558973024_d23f257b1f_c.jpg" alt="PB vs ROE"/&gt;&lt;/p&gt;

&lt;p&gt;You can see that a large number of companies trade
above or below the trend line, but the trend is pretty strong
regardless of market cap size.&lt;/p&gt;

&lt;p&gt;Next we look at Price to Book versus Return on Equity by year.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://farm8.staticflickr.com/7127/7558972616_7a95b5f5e7_b.jpg" alt="PB vs ROE by Year"/&gt;&lt;/p&gt;

&lt;p&gt;Again, the trend is very apparent, but the slope
 flattens as valuations get compressed post-crisis.  You can see that as we went through the financial crisis, the
 market began paying less of a premium for superior performance. As
 investors became less trusting of bank balance sheets, they started
 discounting the price.  The idea of paying 4x book value for a bank
 is laughable today, but in 2003 or 2004 it seemed entirely rational.&lt;/p&gt;

&lt;p&gt;The next chart is a histogram of Price to Book by year.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://farm8.staticflickr.com/7117/7558972948_cf017c9f3b_c.jpg" alt="PB by Year"/&gt;&lt;/p&gt;

&lt;p&gt;This gives you a good sense of the intense clustering of valuations in the
banking industry.  In short, regardless of capabilities, most banks
trade at similar multiples today.  This could mean there are opportunities for
savvy investors to play a long-short strategy effectively here.&lt;/p&gt;

&lt;p&gt;As further evidence of this, check out a boxplot of Price to Book
values by year, stratified by small, mid, and large cap.  Not only
has the spread compressed, but so has the number of outliers.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://farm8.staticflickr.com/7110/7558972752_45592605a2_c.jpg" alt="PB Boxplot by Year and Market Cap"/&gt;&lt;/p&gt;

&lt;p&gt;Finally, let&amp;#8217;s see how much ROEs have flexed through the crisis.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://farm8.staticflickr.com/7252/7558972806_910942c5bc_c.jpg" alt="ROE vs MktCap by Year"/&gt;&lt;/p&gt;

&lt;p&gt;A couple of interesting points to make here.  First, large caps seem to
have better ROE performance than midcaps and smallcaps, as evidenced
by the tighter variation and higher median.  ROEs were in the
mid-teens for all banks from 2003 to 2005, and then large caps
really outperformed in 2006.  2007 was the year when write-downs and
capital impairments began to surface, and you see the deviations for all
market cap sizes explode.  By 2008 you begin to see the large cap
banks coming to grips with reality before everyone else and taking huge
write-downs.  The 2009-2011 period looks like the beginning of a return
to normal performance.  If you believe in the loan quality of the
banks, they would be a steal here.  Obviously, the Street doesn&amp;#8217;t
agree with that assessment.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/tumblr/ddFeed/~4/P2L2HZJVEbI" height="1" width="1"/&gt;</description><link>http://feedproxy.google.com/~r/tumblr/ddFeed/~3/P2L2HZJVEbI/27087507957</link><guid isPermaLink="false">http://datadive.tumblr.com/post/27087507957</guid><pubDate>Thu, 12 Jul 2012 19:35:00 -0500</pubDate><category>ggplot</category><category>rstat</category><category>banks</category><feedburner:origLink>http://datadive.tumblr.com/post/27087507957</feedburner:origLink></item><item><title>Analysis of the court strength of the Miami Heat vs. the OKC Thunder</title><description>&lt;a href="http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html"&gt;Analysis of the court strength of the Miami Heat vs. the OKC Thunder&lt;/a&gt;: &lt;p&gt;Via the New York Times &lt;a href="http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html" target="_blank"&gt;comes this stunning visual analysis of the courtside strengths of the respective teams&lt;/a&gt;.  I think what’s most amazing is how close the teams are statistically (47% accuracy for the Heat vs. 47.1% accuracy for the Thunder) on a team-wide basis.  However, when you start diving into the stats you see that the Heat plays the midrange more than the Thunder does.&lt;/p&gt;

&lt;p&gt;Also, Durant and Westbrook have taken almost 50% of the shots for the Thunder. The Heat seems more well-rounded as a team, offensively, though LeBron has taken almost 20% of this season’s shots.  Given his 53% accuracy, this makes sense.&lt;/p&gt;&lt;img src="http://feeds.feedburner.com/~r/tumblr/ddFeed/~4/_JIZdH3P_1I" height="1" width="1"/&gt;</description><link>http://feedproxy.google.com/~r/tumblr/ddFeed/~3/_JIZdH3P_1I/24974929585</link><guid isPermaLink="false">http://datadive.tumblr.com/post/24974929585</guid><pubDate>Tue, 12 Jun 2012 15:57:16 -0500</pubDate><category>dataviz</category><category>rstats</category><feedburner:origLink>http://datadive.tumblr.com/post/24974929585</feedburner:origLink></item></channel></rss>
