<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">
    <title>Datawocky</title>
    
    <link rel="hub" href="http://hubbub.api.typepad.com/" />
    <link rel="alternate" type="text/html" href="http://anand.typepad.com/datawocky/" />
    <id>tag:typepad.com,2003:weblog-294063</id>
    <updated>2009-04-15T19:02:00-07:00</updated>
    <subtitle>On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising</subtitle>
    <generator uri="http://www.typepad.com/">TypePad</generator>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/Datawocky" /><feedburner:emailServiceId>Datawocky</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry>
        <title>Creating a Culture of Innovation: Why 20% Time is not Enough</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/jLutROajikM/creating-a-culture-of-innovation-why-20-time-is-not-enough.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2009/04/creating-a-culture-of-innovation-why-20-time-is-not-enough.html" thr:count="12" thr:updated="2009-06-22T09:44:47-07:00" />
        <id>tag:typepad.com,2003:post-65524673</id>
        <published>2009-04-15T19:02:00-07:00</published>
        <updated>2009-04-15T19:02:00-07:00</updated>
        <summary>Google has garnered a lot of attention and some success with its "20% time" idea, which enables every engineer to spend one day a week working on projects that don't fit in their job description. In my observation, just announcing...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Entrepreneurship: views from the trenches" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="kosmix" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Google has garnered a lot of attention and some success with its &lt;a title="&amp;quot;20% time&amp;quot; idea" href="http://googleblog.blogspot.com/2006/05/googles-20-percent-time-in-action.html" id="y977"&gt;"20% time" idea&lt;/a&gt;,
which enables every engineer to spend one day a week working on
projects that don't fit in their job description. In my observation,
just announcing that every engineer is expected to spend a certain
fraction of their time on innovative ideas won't magically lead to
innovation. Plus, it's very hard to implement the 20% time model at a
startup: most startups just don't have the luxury of 20% excess
engineering capacity. &lt;/p&gt;&lt;p&gt;At my (startup) company &lt;a title="Kosmix" href="http://www.kosmix.com" id="ap9i"&gt;Kosmix&lt;/a&gt;, we take a somewhat different approach to create a culture of innovation, which I described to Taylor Buley of Forbes in a &lt;a title="recent video interview" href="http://video.forbes.com/fvn/cio/technically-chaotic" id="hz1_"&gt;recent video interview&lt;/a&gt;. I think the video is terrific, and
encourage you to watch it (it's also embedded at the bottom of this post), but there's only so much that can be said in
a 90-second video. So I collected together some of my thoughts into
this blog post.&lt;/p&gt;At Kosmix, we don't specify a set fraction of time for people to spend
on new ideas. Instead, we have focused on creating a culture that
engenders new ideas and rewards innovators, encouraging them to tackle
new projects &lt;strong&gt;above and beyond&lt;/strong&gt; their 100% contribution to mainline
company execution. The three key building blocks that we've used to create a culture of innovation at Kosmix are &lt;strong&gt;Team&lt;/strong&gt;, &lt;strong&gt;Environment&lt;/strong&gt;, and &lt;strong&gt;Incentives&lt;/strong&gt;.&lt;strong&gt;&lt;br&gt;&lt;br&gt;Team. &lt;/strong&gt;It always starts with the people. At Kosmix we are
fortunate to have a team of rock-star Computer Science graduates from
top universities; it's hard to throw a brick without hitting a PhD.
Since CS skills are taken for granted, the interview process emphasizes
creativity, problem-solving skills, and teamwork skills.&lt;br&gt;&lt;br&gt;Very importantly, many Kosmixers are multi-dimensional people with
interests and passions that extend well beyond work. For example, one
of our Operations gurus has a deep interest in (hold your breath)
knitting, and runs knitting classes at work (they're called Knitting
Knights). Our office manager also happens to teach Art History. &lt;strong&gt;&lt;br&gt;&lt;br&gt;Environment.&lt;/strong&gt; There's something about the graduate school
environment that seems to bring out great ideas. Many of the great
technology companies (e.g., Yahoo and Google) have been created by
graduate students. We have strived to maintain a grad school
environment at Kosmix. Walk around and you'll hear plenty of heated
hallway discussions and intellectual free-for-alls; Nerf gun fights
erupt over details of relevance algorithms. &lt;br&gt;&lt;br&gt;When I was a grad student, I used to get ideas for whole new lines of
research by attending talks by other students and faculty. The &lt;a title="Infolab" href="http://infolab.stanford.edu/" id="alxy"&gt;Infolab&lt;/a&gt;,
the research group I was part of at Stanford, has a tradition of Friday
lunches where a student leads a discussion on their ongoing work. We
have copied this model at Kosmix: every Friday, we have a communal
lunch gathering, and a Kosmixer leads a discussion -- either on
something cool they've been working on, or on some topic that's just
cool but completely unrelated to Kosmix -- such as muscle cars,
alternative fuels, or astronomy.&lt;strong&gt;&lt;br&gt;&lt;br&gt;Incentives.&lt;/strong&gt; Given the right environment, the next piece is
incentives for people to go above and beyond the call of duty. At
Kosmix the biggest reward is peer recognition through a system of
awards:&lt;br&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;Kosmix Kreed &lt;/em&gt;award is peer recognition at its purest.
Any Kosmixer can nominate any other for doing something interesting and
inventive that helps Kosmix users, or for going out of their way to
help out another team or person working on a different project.
Giving this award is as easy as sending an email to HR, with a clear
description of the achievement that merits the award.&lt;br&gt;
 &lt;/li&gt;
&lt;li&gt;The &lt;em&gt;Just Do It!&lt;/em&gt; award is given by management, and
recognizes an individual who did a substantial project that goes above
and beyond their job description. We stole this idea from Amazon.com,
where some of us used to work. For example, one recent awardee dreamt
up, designed, and implemented the feature that allows users to
customize the Kosmix homepage, without any directive from management. Another implemented the ability to edit any topic page on Kosmix.&lt;/li&gt;
&lt;/ul&gt;
We also have other awards that recognize teams that execute really well
on their core priorities. These awards are read out at monthly company
meetings, to warm applause in front of the entire Kosmix team. Each
award is also posted on the internal Kosmix blog, which is read by
everyone at Kosmix. Awards carry nominal prizes, such as gift
certificates; but the real prize is the peer recognition, which acts as
a terrific incentive in a high-octane team.&lt;br&gt;&lt;br&gt;One of the big successes of the Kosmix culture of innovation has been &lt;a title="Meehive" href="http://www.meehive.com" id="cvm4"&gt;Meehive&lt;/a&gt;.
A while back, a Kosmix developer thought it would be cool to take
Kosmix's core categorization technology and apply it to the problem of
filtering news and blogs. He worked on it for a bit to create a first
version, which convinced management that this was important enough to
create a full team around. We then staffed an official Kosmix project
to create Meehive, a personalized newspaper, which we launched last
month. You can specify your interests very easily (I have over 40,
including technology and cricket), and Meehive scours thousands of
newspapers and millions of blogs to create your own personalized
newspaper. Early adopters love Meehive; I now use it as my main source
of news every day. Check out &lt;a title="what people are saying about Meehive" href="http://search.twitter.com/search?q=meehive" id="nf5o"&gt;what people are saying about Meehive&lt;/a&gt; on Twitter.&lt;br&gt;&lt;br&gt;Oh, and by the way, the most recent Just Do It! Award went to a
developer on the Meehive team who took it upon himself to create the
Meehive iPhone app. It's now rising in popularity among News
applications in the app store, and has been a bigger success than any
of us imagined. Best of all, no one told the developer to do it.&lt;br&gt;
&lt;br&gt;
&lt;p&gt;
&lt;iframe src='http://www.forbes.com/video/embed/embed.html?show=77&amp;format=frame&amp;height=496&amp;width=336&amp;video=fvn/cio/technically-chaotic&amp;mode=render' width='336px' height='496px' frameborder='0' scrolling='no' marginwidth='0' marginheight='0'&gt;&lt;/iframe&gt;&lt;/div&gt;
</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2009/04/creating-a-culture-of-innovation-why-20-time-is-not-enough.html</feedburner:origLink></entry>
    <entry>
        <title>Reboot: How to Reinvent a Technology Startup </title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/zhUIPoW7oTs/reboot-how-to-reinvent-a-technology-startup-.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2009/02/reboot-how-to-reinvent-a-technology-startup-.html" thr:count="7" thr:updated="2009-03-14T00:20:52-07:00" />
        <id>tag:typepad.com,2003:post-63299651</id>
        <published>2009-02-24T13:34:31-08:00</published>
        <updated>2009-02-24T17:24:45-08:00</updated>
        <summary>Three years ago, Odeo was a struggling startup on a path to nowhere. Odeo's core offering--a set of tools for users to create, record and share podcasts--was facing serious competition from Apple and other heavyweights. The management team made a...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Entrepreneurship: views from the trenches" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Venture Capital" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;

&lt;p class="MsoNormal"&gt;Three years ago, &lt;a href="http://en.wikipedia.org/wiki/Odeo"&gt;Odeo&lt;/a&gt;
was a struggling startup on a path to nowhere. Odeo&amp;#39;s core offering--a set of
tools for users to create, record and share podcasts--was facing serious
competition from Apple and other heavyweights. The management team made a
radical decision to &amp;quot;reboot&amp;quot; the company, and Twitter was born.&lt;br /&gt;
&lt;br /&gt;
As I read the &lt;a href="http://www.140characters.com/2009/01/30/how-twitter-was-born/" title="Twitter story"&gt;Twitter story&lt;/a&gt;, narrated eloquently by Dom Sagolla, I
can&amp;#39;t help but look back over the many startups that I&amp;#39;ve been associated with
over the past twelve years.&amp;#0160; In my various roles as a founder, an
investor, a board member, and an advisor to startups in Silicon
 Valley,&amp;#0160;I&amp;#39;m constantly fascinated by the mechanics of
reinvention. Which approaches to reinvention succeed and which ones fail?&lt;br /&gt;
&lt;br /&gt;
Startups flounder for countless reasons. Perhaps the market opportunity is not
as big as imagined, or perhaps there is a mismatch between the technology and
the market. Maybe the world changed in some significant way, invalidating the
key assumptions on which the startup was based. For example, an established
company such as Google or Microsoft might enter the market. Or perhaps the
deepest recession in recent history dried up demand for the original product or
service. In these cases, the founders and management team have to ask
themselves the question: should we push ahead, assuming superior execution will
win the day against long odds? Or should we change what we&amp;#39;re doing?&amp;#0160;&lt;br /&gt;
&lt;br /&gt;
Companies that decide to reinvent need to acknowledge
the bad news first: most startups fail, even the reincarnated ones.&amp;#0160;
Those are just the odds. The good news is that certain approaches to reinvention
work better than others, and companies can increase their chance of success by
carefully calculating their reboot strategy.&lt;br /&gt;
&lt;br /&gt;
Every technology startup has four core components: team, technology/product,
market, and business model. Rebooting involves changing at least one of these
components, while leaving the other factors unchanged. Let us look at each
component in turn:&lt;br /&gt;
&lt;br /&gt;
1. &lt;strong&gt;Team&lt;/strong&gt;. Reinvention usually leads to changes in the team. To qualify as
a reboot rather than an entirely new company, however, there must be at least
part of the team -- and usually at least one of the founding members -- who
continues to remain with the company through the transition. In my experience,
one model that usually does &lt;em&gt;not &lt;/em&gt;work is when VC investors replace the
entire founding team with new management. I&amp;#39;ve never seen a startup with none
of its founders remaining succeed.&lt;br /&gt;
&lt;br /&gt;
2. &lt;strong&gt;Market.&lt;/strong&gt; Many startups try the most tempting option: to keep the same
technology/product and look for a new market.&amp;#0160; After all, the investment
in product development has already been made.&amp;#0160; Unfortunately, while this
approach seems the most logical, it is also the &lt;em&gt;least&lt;/em&gt; likely to succeed.
Why? The hardest part of a startup is understanding the requirements of the
market, not building the product. After the dot-com bust in 2000, many consumer
internet startups tried to reinvent themselves as enterprise technology
providers (remember &lt;a href="http://en.wikipedia.org/wiki/Chemdex"&gt;Chemdex&lt;/a&gt;?).
The startup junkyard is littered with the carcasses of dot-coms that took this
route and failed.&lt;br /&gt;
&lt;br /&gt;
3. &lt;strong&gt;Business Model.&lt;/strong&gt; A very attractive strategy is to keep the same
product and market, but change the business model. In my experience, this is
the most likely option to succeed. For example, enterprise software companies
can reinvent themselves by open-sourcing their software and providing
consulting services, or a premium version. A software vendor can reboot as a
software as a service (SaaS) provider on the Web. Consumer websites can move to
a subscription model from an advertising model, or vice versa.&lt;br /&gt;
&lt;br /&gt;
4. &lt;strong&gt;Product. &lt;/strong&gt;Another smart reinvention approach is to address the same
market (or a closely related one), but change the product or the business
model. This option works best when the market need is real, but the product
does not adequately address the opportunity. I&amp;#39;ve found that the key to success
is to throw away the old product completely and start from scratch, using the
hard-won learnings about the market acquired from the first iteration. In some
cases, it makes sense to move the old product to &amp;quot;maintenance mode&amp;quot;
and reassign the bulk of the team to developing the new product.&lt;br /&gt;
&lt;br /&gt;
I&amp;#39;ve applied this particular model of reinvention to both companies where I
have been a founder -- &lt;a href="http://norvig.com/junglee/"&gt;Junglee&lt;/a&gt;
in 1997 and &lt;a href="http://www.kosmix.com/"&gt;Kosmix&lt;/a&gt; ten years later, in
2007. &lt;br /&gt;
&lt;br /&gt;
We started Junglee in 1996 to create virtual databases that integrated data
from multiple websites. Although we had some initial success, we quickly
realized that the architecture of our first product limited our ability to deal
with rapidly-changing information, a key success factor in certain markets. We
completely rebuilt the product from scratch in 1997, and created the world&amp;#39;s
first comparison shopping service.&amp;#0160; This service was enormously popular
and led to &lt;a href="http://news.cnet.com/Amazon-to-buy-two-companies/2100-1001_3-214057.html"&gt;Junglee&amp;#39;s acquisition by Amazon.com&lt;/a&gt; in 1998.&lt;br /&gt;
&lt;br /&gt;
We introduced Kosmix as a vertical search engine, initially in the health
sector.&amp;#0160; Our idea was to find a better way to help users understand
open-ended queries such as &amp;quot;diabetes&amp;quot;, which have no single right
answer; that is, explore topics rather than find the &lt;a href="http://anand.typepad.com/datawocky/2008/06/searching-for-a-needle-or-exploring-the-haystack.html"&gt;needle
in the haystack&lt;/a&gt;. We&amp;#39;d planned to take a vertical-by-vertical strategy,
launching sites named RightHealth, RightAutos and RightTrips. Very soon,
however, we realized that the vertical approach carries severe limitations,
because it&amp;#39;s hard for consumers to remember to go to different sites for
different topics of interest. We decided to rewrite the product from scratch,
and we &lt;a href="http://anand.typepad.com/datawocky/2008/12/kosmix-adds-rocketfuel-to-power-voyage-of-exploration.html"&gt;relaunched
Kosmix.com&lt;/a&gt; as a horizontal site.&amp;#0160; Kosmix lets you explore any topic
and gives you a 360-degree view of anything that interests you -- including information from the &lt;a href="http://www.nytimes.com/2009/02/23/technology/internet/23search.html"&gt;Deep Web&lt;/a&gt; that is inaccessible to the usual search engines.
This transition from vertical to horizontal was much harder than it sounds; it
required us to rewrite our technology from scratch. But we did it because of
our passionate belief that the problem is real and the market opportunity is
vast.&lt;br /&gt;
&lt;br /&gt;
While most startup reboots involve rethinking only one or two of the four core
components, in some rare cases it makes sense to go the whole hog. Sometimes it
pays to be bold: go after an entirely new market opportunity, create a new
product, find a new business model, and make large-scale team changes. This
approach is fraught with risk; but there have been a couple of spectacular
successes. One clear example is Twitter. Another is Twitter&amp;#39;s cousin &lt;a href="http://anand.typepad.com/datawocky/2008/06/why-sms-gupshup-is-bigger-than-twitter.html"&gt;SMS
GupShup&lt;/a&gt;, a similar service in India. SMS GupShup was born as &lt;a href="http://en.wikipedia.org/wiki/Webaroo"&gt;Webaroo&lt;/a&gt;, a company that wanted
to create offline copies of large parts of the web so you could browse while
offline. A couple of engineers there launched the SMS GupShup service as a lark
and it took off; once the management team saw the traction of GupShup, they
re-oriented the company around the new idea.&lt;br /&gt;
&lt;br /&gt;
Some startups are born great: the right team starts with the right idea at the
right time, and the rest is history. Some have greatness thrust upon them: the
right conjunction of market forces propels an unlikely startup to dizzying
heights. Other startups, not so lucky as those in the first two categories,
need to earn their greatness. And sometimes that requires a reboot.&lt;/p&gt;&lt;/div&gt;
</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2009/02/reboot-how-to-reinvent-a-technology-startup-.html</feedburner:origLink></entry>
    <entry>
        <title>Oscar Halo: Academy Awards and the Matthew Effect</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/x9Q2YOO1MCo/oscar-halo-academy-awards-and-the-matthew-effect.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2009/02/oscar-halo-academy-awards-and-the-matthew-effect.html" thr:count="13" thr:updated="2009-02-24T13:20:51-08:00" />
        <id>tag:typepad.com,2003:post-63177423</id>
        <published>2009-02-21T17:15:54-08:00</published>
        <updated>2009-02-22T23:14:43-08:00</updated>
        <summary>Slumdog Millionaire is one of my favorite movies of all time. And I have followed the career of A.R. Rahman, who composed the movie's music, for several years ever since his debut in 1992. So I was quite thrilled when Slumdog...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Mining" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;a href="http://www.kosmix.com/topic/slumdog_millionaire" id="amhg" title="Slumdog Millionaire"&gt;Slumdog Millionaire&lt;/a&gt; is one of my favorite movies of all time. And I have followed the career of &lt;a href="http://www.kosmix.com/topic/A._R._Rahman" id="i1-j" title="A.R. Rahman"&gt;A.R. Rahman&lt;/a&gt;,&#xD;
who composed the movie's music, for several years ever since his debut&#xD;
in 1992. So I was quite thrilled when Slumdog was nominated for 10&#xD;
Academy Awards -- and Rahman in two categories, Original Score and&#xD;
Original Song. Thrilled, and a little surprised: while I like Rahman's&#xD;
work in Slumdog, I don't think it's his best work. There is of course&#xD;
nothing wrong with that, as long as Rahman's work is better than that&#xD;
of his competitors this year.&lt;/p&gt;&lt;p&gt;But it got me to thinking: if&#xD;
Rahman had composed the same music for an obscure film this year,&#xD;
rather than for Slumdog Millionaire, would he have been nominated? And&#xD;
even if he had been nominated, what would his chances of winning have been? In&#xD;
other words, is there a &lt;a href="http://en.wikipedia.org/wiki/Matthew_effect" id="vqs_" title="Matthew Effect"&gt;Matthew Effect&lt;/a&gt; &#xD;
in Oscar nominations -- to them that have, more shall be given? And, once&#xD;
nominated, is there a halo surrounding movies with many nominations&#xD;
that improves the odds of winning across many award categories? I&#xD;
thought it might be fun to run the numbers based on past years'&#xD;
nominees and winners to see if I could find answers to these questions;&#xD;
it turned out to be somewhat instructive as well, since it required an&#xD;
extension of the standard &lt;a href="http://stanford.edu/class/cs345a/assoc-rules1.pdf" id="ibuu" title="Market Basket analysis"&gt;Market Basket analysis&lt;/a&gt; from the world of data mining.&lt;/p&gt;&lt;p&gt;To get the data, I went straight to the source: the &lt;a href="http://awardsdatabase.oscars.org/ampas_awards/BasicSearchInput.jsp" id="e8mf" title="official Academy Awards database"&gt;official Academy Awards database&lt;/a&gt;,&#xD;
which lists all the nominations and winners for the past 80 years.&#xD;
Unfortunately, no single page lists all this&#xD;
information, but it was fairly straightforward to write Python scripts&#xD;
that queried the website a few times and collated the data in tabular&#xD;
form. The result: a table that lists every nomination and winner in&#xD;
every category between 1927 and 2007. There were 8616 nominations in the&#xD;
period, representing 4215 distinct movies, so each movie was nominated&#xD;
on average in about 2 award categories. &lt;/p&gt;&lt;p&gt;&#xD;
Let's start first with the nominations, to see if there is any evidence&#xD;
of the Matthew Effect. Let's say N(k) is the number of movies with&#xD;
exactly k nominations. The table below shows k and N(k) for k between 1&#xD;
and 10. If we ignore two outliers (k=1 and k=7), it appears that&#xD;
N(k+1)/N(k) is close to 0.6 for k between 2 and 10; the decay is&#xD;
certainly much slower than exponential. This indicates that the number&#xD;
of nominations roughly follows a &lt;a href="http://www-personal.umich.edu/%7Emejn/courses/2006/cmplxsys899/powerlaws.pdf" id="dq92" title="power-law"&gt;power-law&lt;/a&gt;;&#xD;
and a power-law is the classic embodiment of the Matthew Effect,&#xD;
arising in contexts such as income and wealth distribution.&lt;span&gt; The table below summarizes the data. &lt;/span&gt;&lt;/p&gt;&#xD;
&lt;div align="center"&gt;&lt;table border="1" cellpadding="3" cellspacing="0" class="zeroBorder" id="aq57"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td width="50%"&gt;&lt;strong&gt;Nominations&lt;br&gt;&lt;/strong&gt;&lt;/td&gt;&lt;td width="50%"&gt;&lt;strong&gt;Movies&lt;br&gt;&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;1&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;2796&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;2&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;513&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;3&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;260&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;4&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;195&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;5&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;128&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;6&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;81&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;7&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;87&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;8&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;50&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;9&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;31&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td width="50%"&gt;10&lt;br&gt;&lt;/td&gt;&lt;td width="50%"&gt;29&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;&lt;br&gt;&#xD;
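As a quick sanity check on those ratios, here is a minimal Python sketch that recomputes N(k+1)/N(k) from the table above. It is only an illustration, not the script used to collect the data; the names movies_by_nominations and successive_ratios are made up for this example.&lt;br&gt;
&lt;pre&gt;
```python
# Movie counts N(k) copied from the table above: k nominations maps to N(k) movies.
movies_by_nominations = {1: 2796, 2: 513, 3: 260, 4: 195, 5: 128,
                         6: 81, 7: 87, 8: 50, 9: 31, 10: 29}


def successive_ratios(counts):
    """Return {k: N(k+1)/N(k)} for every k whose successor is also in the table."""
    return {k: counts[k + 1] / counts[k] for k in sorted(counts) if k + 1 in counts}


ratios = successive_ratios(movies_by_nominations)
for k, ratio in ratios.items():
    print(f"N({k + 1})/N({k}) = {ratio:.2f}")
```
&lt;/pre&gt;
The two outliers stand out in the output: the step from 1 to 2 nominations has a ratio of about 0.18, and the step from 6 to 7 is slightly above 1 because more movies received 7 nominations than 6.&lt;br&gt;&lt;br&gt;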
The next step is to enquire whether there are Oscar categories for&#xD;
which the effect is much stronger than for other categories. To study&#xD;
this, we divide the nominated movies into two groups: movies with 4 or&#xD;
fewer nominations (the "poor" group) and movies with 5 or more&#xD;
nominations (the "rich" group). Overall, &lt;span&gt;5382 nominations, or&#xD;
62.5%, went to movies in the poor group and 3234 nominations, or 37.5%,&#xD;
went to movies in the rich group. Now, let's look at the major Oscar&#xD;
categories. The major outliers are Best Picture and Best Director: in both,&#xD;
nominations went overwhelmingly to movies in the rich group (70% and&#xD;
73%, respectively, compared to the average of 37.5%). This is not&#xD;
surprising, because the best picture is typically one that is strong in&#xD;
many disciplines. There is some bias in the acting categories as well,&#xD;
but the big surprise is Film Editing: 68% of the nominations in this&#xD;
category went to "rich" movies. At the other extreme are Music and Special&#xD;
Effects: approximately 70% of the nominated movies are in the "poor"&#xD;
group. So it appears that in these categories at least, talent gets&#xD;
its due without help from Matthew.&lt;br&gt;&#xD;
&lt;br&gt;&#xD;
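The 62.5% and 37.5% shares quoted above follow from the nomination table together with the total of 8616 nominations. Here is a small Python sketch of that arithmetic; the variable names are illustrative, and because the table stops at 10 nominations, the "rich" total is obtained by subtraction rather than by summing rows.&lt;br&gt;
&lt;pre&gt;
```python
# Total nominations reported for 1927 to 2007.
total_nominations = 8616

# N(k) for the "poor" group (4 or fewer nominations), taken from the table.
poor_counts = {1: 2796, 2: 513, 3: 260, 4: 195}

# A movie with k nominations contributes k nominations to its group.
poor_nominations = sum(k * n for k, n in poor_counts.items())
rich_nominations = total_nominations - poor_nominations

poor_share = poor_nominations / total_nominations
rich_share = rich_nominations / total_nominations

print(f"poor: {poor_nominations} nominations ({poor_share:.1%})")
print(f"rich: {rich_nominations} nominations ({rich_share:.1%})")
```
&lt;/pre&gt;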
Moving from nominations to actual winners, the obvious question is:&#xD;
does being nominated in many categories boost the chances of winning in&#xD;
a disproportionate manner? To study this, I used the Market Baskets&#xD;
approach from Data Mining. In a classic Market Baskets scenario, we ask&#xD;
which items are often purchased together, such as milk and eggs. In&#xD;
this case, we model each movie as a basket: the contents of a movie's&#xD;
basket are its nominations and wins. Do movies with many nominations in&#xD;
their baskets have a disproportionate number of wins? &lt;br&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;span&gt;We must first deal with a technicality. In a normal&#xD;
market basket scenario, the contents of each basket are independent of&#xD;
every other basket, but in this case there are dependencies. Consider&#xD;
the set of market baskets of the movies that have all been nominated in&#xD;
a single award category in a particular year; clearly, one of these has&#xD;
to be the winner in that category, and so the basket of that movie will&#xD;
also contain a win in that category. &lt;br&gt;&#xD;
&lt;br&gt;It's easy to extend the Market Baskets model to capture&#xD;
this idea. I'll call the new model Constrained Market Baskets. Consider&#xD;
a subset S of market baskets; say, the set of market baskets&#xD;
corresponding to the "rich" movies with 5 or more nominations. Suppose&#xD;
movie M is in this set, and has been nominated in award category C. If&#xD;
there are (say) a total of 5 nominees in this category, then the prior&#xD;
probability of movie M's basket containing a win is 1/5 or 0.2. We can&#xD;
repeat this for all the categories M is nominated in, and add up the&#xD;
priors; this gives the "prior expected value" of the number of wins in&#xD;
M's basket. We add up the expected wins for all the movies in set S to&#xD;
get the total number of wins we expect the set S of movies to have;&#xD;
call this EW. Now, if OW is the actual number of "Observed Wins" across&#xD;
the movies in set S, we want to see if there is a discrepancy between&#xD;
EW and OW. In particular, we define the "win boost" of set S to be&#xD;
OW/EW. If the win boost is higher than 1, then the set S of market&#xD;
baskets has a disproportionate number of wins, and if it's much less&#xD;
than 1, then it has fewer wins than expected.&lt;br&gt;&#xD;
&lt;br&gt;&#xD;
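The bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis; the basket layout, the function names, and the toy data are all assumptions made for the example.&lt;br&gt;
&lt;br&gt;

```python
from fractions import Fraction


def expected_wins(basket, nominee_counts):
    """Prior expected wins for one movie: sum of 1/n over its nominated categories."""
    return sum(Fraction(1, nominee_counts[c]) for c in basket["nominations"])


def win_boost(baskets, nominee_counts):
    """Return (EW, OW, OW/EW) for a set S of movie baskets.

    Assumes S has at least one nomination, so EW is nonzero.
    """
    ew = sum(expected_wins(b, nominee_counts) for b in baskets)
    ow = sum(len(b["wins"]) for b in baskets)
    return ew, ow, Fraction(ow) / ew


# Hypothetical toy data, not taken from the Oscar database.
nominee_counts = {"Picture": 5, "Director": 5, "Editing": 5, "Music": 5}
set_s = [
    {"nominations": ["Picture", "Director", "Editing"], "wins": ["Picture"]},
    {"nominations": ["Music"], "wins": []},
]

ew, ow, boost = win_boost(set_s, nominee_counts)
print(float(ew), ow, float(boost))
```

&lt;br&gt;
In this made-up set S, the two movies hold 4 nominations in five-nominee categories, so EW is 4/5; they won once, so OW is 1 and the win boost is 1.25.&lt;br&gt;
&lt;br&gt;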
When we do the analysis, the set of "poor" movies, with 4 or fewer nominations, had a total of &lt;/span&gt;&lt;span&gt;5382 nominations, with 1143 "expected wins" but only 840 "observed wins"; a win boost of 0.73. The "rich" movies, by contrast, &lt;/span&gt;&lt;span&gt;with 3234 nominations, were expected to win 657 Oscars but actually won 958, a win boost of 1.46. &lt;strong&gt;In&#xD;
other words: the rich movies, which represent only 37.5% of all&#xD;
nominations, won more than half of all the Oscars awarded!&#xD;
&lt;/strong&gt;Matthew!&lt;br&gt;&#xD;
&lt;br&gt;&#xD;
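As a quick arithmetic check, the figures quoted above can be plugged straight into the win boost definition. The numbers are the ones reported in this post; the variable and function names are only illustrative.&lt;br&gt;
&lt;br&gt;

```python
# Figures as reported in the post.
poor = {"nominations": 5382, "expected_wins": 1143, "observed_wins": 840}
rich = {"nominations": 3234, "expected_wins": 657, "observed_wins": 958}


def boost(group):
    """Win boost = observed wins / expected wins."""
    return group["observed_wins"] / group["expected_wins"]


total_nominations = poor["nominations"] + rich["nominations"]
total_wins = poor["observed_wins"] + rich["observed_wins"]

rich_nomination_share = rich["nominations"] / total_nominations
rich_win_share = rich["observed_wins"] / total_wins

print(round(boost(poor), 2), round(boost(rich), 2))
print(round(100 * rich_nomination_share, 1), round(100 * rich_win_share, 1))
```

&lt;br&gt;
The rich movies account for 3234 of 8616 nominations (37.5%) but 958 of 1798 wins (about 53%).&lt;br&gt;
&lt;br&gt;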
Once again, we can break up the results by category, and look at the&#xD;
win boosts for specific categories of awards. For most major award&#xD;
categories, the win boosts for the rich and poor categories are in line&#xD;
with the overall average boosts. As in the case of nominations, the&#xD;
effect is very significant in the Best Picture and Best Director&#xD;
categories: in these categories, the "poor" movies have a win boost of&#xD;
just 0.30! We noted that the Music category seemed resilient to Matthew&#xD;
in the case of nominations; but in the case of wins, this category has&#xD;
a win boost of 1.7 for the rich movies, in line with the overall&#xD;
average. The surprising and significant outlier in this case is the&#xD;
Best Supporting Actor category, with win boosts very close to 1.0 for&#xD;
both the rich and the poor movies. It appears that the Best Supporting&#xD;
Actor award shows no evidence of Matthew; the other acting categories,&#xD;
however, are in line with the overall averages.&lt;br&gt;&#xD;
&lt;br&gt;&#xD;
I don't have a deep enough understanding of the movie industry and the&#xD;
Academy Awards process to speculate on the reasons for these effects.&#xD;
Perhaps great talent attracts other great talent, and the Awards&#xD;
reflect that reality. And perhaps the difference between the behavior&#xD;
of wins and of nominations has to do with the fact that the former uses&#xD;
simple plurality voting while the latter uses a preferential voting&#xD;
scheme. In any case, I'm happy on two counts. The statistics on the&#xD;
Music category say that the Matthew effect likely did not help Mr&#xD;
Rahman in securing his nominations; but now that he has been nominated,&#xD;
his chances of winning are greatly boosted because he is associated&#xD;
with Slumdog's 10 nominations. &lt;a href="http://www.kosmix.com/topic/jai_ho"&gt;Jai Ho!&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; A big night for Slumdog, winning 8 awards, including both the music and song awards for A. R. Rahman. While 8 awards is not the best Oscar performance ever, it is the largest number of awards won by a movie with 10 nominations (the ones that won more awards had more nominations). Matthew must be pleased. &lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2009/02/oscar-halo-academy-awards-and-the-matthew-effect.html</feedburner:origLink></entry>
    <entry>
        <title>Kosmix Adds Rocketfuel to Power Voyage of Exploration</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/j4y8gYC-_oU/kosmix-adds-rocketfuel-to-power-voyage-of-exploration.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/12/kosmix-adds-rocketfuel-to-power-voyage-of-exploration.html" thr:count="3" thr:updated="2008-12-29T11:08:19-08:00" />
        <id>tag:typepad.com,2003:post-59712390</id>
        <published>2008-12-08T17:34:01-08:00</published>
        <updated>2008-12-08T17:34:01-08:00</updated>
        <summary>Today I'm delighted to share some fantastic news. My company Kosmix has raised $20 million in new financing to power our growth. Even more than the amount of financing, I'm especially proud that the lead investor in this round is...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Mining" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="kosmix" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Search" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;span style="text-decoration: underline;"&gt;&lt;a href="http://anand.typepad.com/.a/6a00d83471bc3153ef01053650ea27970c-pi" style="display: inline;"&gt;&lt;img alt="Kosmix_logo_betaish" class="at-xid-6a00d83471bc3153ef01053650ea27970c " src="http://anand.typepad.com/.a/6a00d83471bc3153ef01053650ea27970c-120wi"&gt;&lt;/img&gt;&lt;/a&gt;&#xD;
 &lt;/span&gt; &lt;/p&gt;&lt;p&gt;Today I'm delighted to share some fantastic news. My company &lt;a href="http://www.kosmix.com" id="wntq" title="Kosmix"&gt;Kosmix&lt;/a&gt;&#xD;
has raised $20 million in new financing to power our growth. Even more&#xD;
than the amount of financing, I'm especially proud that the lead&#xD;
investor in this round is Time Warner, the world's largest media&#xD;
company. Our existing investors Lightspeed, Accel, and DAG participated&#xD;
in the round as well. The Kosmix team is also greatly strengthened by&#xD;
the addition of &lt;a href="http://www.iocom.com/io/advisory_zander.html" id="qljm" title="Ed Zander"&gt;Ed Zander&lt;/a&gt;&#xD;
as investor and strategic advisor. In an amazing career that spans Sun&#xD;
Microsystems and Motorola, Ed has repeatedly demonstrated leadership&#xD;
that grew good ideas into great products and businesses. His counsel&#xD;
will be invaluable as we take Kosmix to the next level as a business.&lt;/p&gt;&lt;p&gt;&#xD;
In these perilous economic times, the funding is a big vote of&#xD;
confidence in Kosmix's product and business. Kosmix web sites attract&#xD;
11 million visits every month, and we have a proven revenue model with&#xD;
significant revenues and robust growth. &lt;a href="http://www.righthealth.com" id="xgs3" title="RightHealth"&gt;RightHealth&lt;/a&gt;,&#xD;
the proof-of-concept we launched in 2007, grew with astonishing&#xD;
rapidity to become the #2 health web site in the US. These factors&#xD;
played a big role in helping us close this round of funding with a&#xD;
healthy uptick in valuation from our prior round. Together with the&#xD;
money already in the bank from our prior rounds, we now have more than&#xD;
enough runway to take the company to profitability and beyond.&lt;/p&gt;&lt;p&gt;&#xD;
A few months ago, &lt;a href="http://anand.typepad.com/datawocky/2008/06/searching-for-a-needle-or-exploring-the-haystack.html" id="s3tc" title="we put out an alpha version"&gt;we put out an alpha version&lt;/a&gt;&#xD;
of Kosmix.com. Many people used it and gave us valuable feedback; thank&#xD;
you! We listened, and made changes. Lots of changes. The result is the&#xD;
beta version of Kosmix.com, which we launched today. What's changed?&#xD;
More information sources (many thousands), huge improvements in our&#xD;
relevance algorithms, a much-improved user interface, and a completely&#xD;
new homepage. Give it a whirl and let us know what you think.&lt;/p&gt;&lt;p&gt;&#xD;
To those of you new to Kosmix, the easiest way to explain what Kosmix&#xD;
does is by analogy. Google and Yahoo are search engines; Kosmix is an &lt;em&gt;explore engine&lt;/em&gt;.&#xD;
Search engines work really well if your goal is to find a specific&#xD;
piece of information: a single fact, a definitive web page such as a&#xD;
train schedule or a company website, or the answer to a specific&#xD;
question. In other words, they are great at finding needles in the&#xD;
haystack. Where the needle-in-haystack model breaks down is when the&#xD;
objective is to learn about, explore, or understand a broad topic. For&#xD;
example:&lt;/p&gt;&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&#xD;
 Looking to bake a &lt;a href="http://www.kosmix.com/topic/chocolate_cake" id="fjii" title="chocolate cake"&gt;chocolate cake&lt;/a&gt;?&#xD;
We have recipes, nutrition information, a dessert burn rate calculator,&#xD;
blog posts from chow.com, even a how-to video from Martha Stewart! &lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
 Loved one diagnosed with &lt;a href="http://www.kosmix.com/topic/diabetes" id="ecgf" title="diabetes"&gt;diabetes&lt;/a&gt;?&#xD;
Doctor-reviewed guide, blood sugar and insulin pump slide shows,&#xD;
calculators and risk checkers, quizzes, alternative medications,&#xD;
community. &lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
 Traveling to &lt;a href="http://www.kosmix.com/topic/san_francisco" id="obf5" title="San Francisco"&gt;San Francisco&lt;/a&gt;? Maps, hotels, events, sports teams, attractions, travel blogs, trip plans, guidebooks, videos.&#xD;
 &lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
 Writing an article on &lt;a href="http://www.kosmix.com/topic/Hillary_Clinton" id="j5tm" title="Hillary Clinton"&gt;Hillary Clinton&lt;/a&gt;? Bio, news, CNN videos, personal financial assets and lawmaker stats, Wonkette posts, even satire from The Onion.&#xD;
 &lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
 Into &lt;a href="http://www.kosmix.com/topic/radiohead" id="vnrb" title="Radiohead"&gt;Radiohead&lt;/a&gt;? Bio, band members, albums, tracks, music player, concert schedule, videos, similar artists, news and gossip from TMZ.&#xD;
 &lt;/li&gt;&#xD;
&lt;li&gt;&#xD;
 Follow the &lt;a href="http://www.kosmix.com/topic/san_francisco_49ers" id="x6hf" title="San Francisco 49ers"&gt;San Francisco 49ers&lt;/a&gt;?&#xD;
Players, news from Yahoo Sports and other sources, official NFL videos&#xD;
and team profiles, tickets, and the official NFL standings widget.&lt;br&gt;&#xD;
 &lt;/li&gt;&#xD;
&lt;/ul&gt;&#xD;
&lt;p&gt;&lt;br&gt;&#xD;
In the examples above, I'm especially pleased about the way Kosmix&#xD;
picks great niche sources for topics. For example, I hadn't heard about&#xD;
chow.com or known that Martha Stewart has how-to videos on her website.&#xD;
Other "gems" of this kind include Jambase, TMZ, The Onion, DailyPlate,&#xD;
MamaHerb, and Wonkette. Part of the goal of Kosmix is to bring you such&#xD;
gems: information sources or sites you have either not heard of, or&#xD;
just not thought about in the current context. &lt;/p&gt;&lt;p&gt;&#xD;
In other words: &lt;strong&gt;Google = Search + Find.&lt;/strong&gt; &lt;strong&gt;Kosmix = Explore + Browse.  &lt;/strong&gt;Browsing&#xD;
sometimes uncovers surprising connections that you might not even have&#xD;
thought about. The power of the model was brought home to me last week&#xD;
as I was traveling around in England. I'd heard a lot about Stonehenge&#xD;
and wanted to visit; so of course I went to the &lt;a href="http://www.kosmix.com/topic/Stonehenge" id="xnbn" title="Kosmix topic page on Stonehenge"&gt;Kosmix topic page on Stonehenge&lt;/a&gt;.&#xD;
In addition to the usual comprehensive overview of Stonehenge, the&#xD;
topic page showed me places to stay in Bath, Somerset (which happens to&#xD;
be the best place to stay when you're visiting Stonehenge). It also&#xD;
showed me other ancient monuments in the same area I could visit while&#xD;
I was there. Score one for serendipity.  &lt;/p&gt;&lt;p&gt;&#xD;
Some of us remember the early days of the World Wide Web: the thrill of&#xD;
just browsing around, following links, and discovering new sites that&#xD;
surprise, entertain, and sometimes even inform. We have lost some of&#xD;
that joy now with our workmanlike use of search engines for&#xD;
precision-guided information finding. We built the &lt;a href="http://www.kosmix.com" id="e9n8" title="new Kosmix homepage"&gt;new Kosmix homepage&lt;/a&gt;&#xD;
to capture some of the pleasure of aimless browsing -- exploring for&#xD;
pure pleasure. The homepage shows you the hot news, topics, videos,&#xD;
slide shows, and gossip of the moment. If you find something&#xD;
interesting you can dive right in and start browsing around that topic.&#xD;
We compile this page in the same manner as our topic pages: by&#xD;
aggregating information from many other sources and then applying a&#xD;
healthy dose of algorithms. Dig in; who knows what surprises await? &lt;/p&gt;&lt;p&gt;&#xD;
How does Kosmix work its magic? &lt;a href="http://anand.typepad.com/datawocky/2008/06/searching-for-a-needle-or-exploring-the-haystack.html" id="w9tr" title="As I wrote"&gt;As I wrote&lt;/a&gt;&#xD;
when we put out the alpha, the problem we're solving is fundamentally&#xD;
different from search, and we've taken a fundamentally different&#xD;
approach. The &lt;a href="http://anand.typepad.com/datawocky/2008/05/why-yahoo-glue.html"&gt;web has evolved&lt;/a&gt;&#xD;
from a collection of documents that neatly fit in a search engine index&#xD;
to a collection of rich interactive applications such as&#xD;
Facebook, MySpace, YouTube, and Yelp. Instead of serving results from&#xD;
an index, Kosmix builds topic pages by querying these applications and&#xD;
assembling the results on-the-fly into a 2-dimensional grid. We have&#xD;
partnered with many of the services that appear in the results pages,&#xD;
and use publicly available APIs in other cases. The secret sauce is our&#xD;
algorithmic categorization technology. Given a topic, categorization&#xD;
tells us where the topic fits in a really big taxonomy, what the&#xD;
related topics are, and so on. In turn, other algorithms use this&#xD;
information to figure out the right set of information sources for a&#xD;
topic from among the thousands we know about. And then other algorithms&#xD;
figure out how to lay the information on the page in a 2-dimensional&#xD;
grid.&lt;/p&gt;&lt;p&gt;&#xD;
While we are proud of what we have built, we know there is still a long&#xD;
way to go. And we cannot do it without your feedback. So join the USS&#xD;
Kosmix on our maiden voyage. Our mission: to explore strange new topics; to discover surprising new connections; to boldly go where no search engine has gone before!&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Vijay Chittoor has posted more details on the &lt;a href="http://blog.kosmix.com/uncategorized/kosmix-goes-beta-ish/"&gt;new product features&lt;/a&gt; on the Kosmix blog. Coverage on &lt;a href="http://www.techcrunch.com/2008/12/08/kosmix-raises-20-million-more-for-its-universal-search-engine/"&gt;TechCrunch&lt;/a&gt;, &lt;a href="http://gigaom.com/2008/12/08/kosmix-gets-20m/"&gt;GigaOM&lt;/a&gt;, &lt;a href="http://venturebeat.com/2008/12/08/search-site-kosmix-raises-20m-from-time-warner/"&gt;VentureBeat&lt;/a&gt;. I'm particularly pleased that Om Malik thinks &lt;a href="http://www.kosmix.com/topic/om_malik"&gt;his page on Kosmix&lt;/a&gt; is better than the bio on his site!&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/12/kosmix-adds-rocketfuel-to-power-voyage-of-exploration.html</feedburner:origLink></entry>
    <entry>
        <title>For Startups, Survival is not a Strategy</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/wodZqGSeAA0/for-startups-survival-is-not-a-strategy.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/11/for-startups-survival-is-not-a-strategy.html" thr:count="2" thr:updated="2009-02-05T02:08:26-08:00" />
        <id>tag:typepad.com,2003:post-58751000</id>
        <published>2008-11-21T18:28:45-08:00</published>
        <updated>2008-11-21T18:28:45-08:00</updated>
        <summary>Note: As I was working on this post, I ran into Om Malik and showed him a draft. He liked it and asked to post it simultaneously on GigaOM. If you've read it on GigaOM, you can skip reading it...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Venture Capital" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt; As I was working on this post, I ran into &lt;a href="http://www.kosmix.com/topic/om_malik"&gt;Om Malik&lt;/a&gt; and showed him a draft. He liked it and asked to post it simultaneously on GigaOM. If you&amp;#39;ve &lt;a href="http://gigaom.com/2008/11/21/survival-is-not-a-strategy/"&gt;read it on GigaOM&lt;/a&gt;, you can skip reading it here.&lt;/em&gt;&lt;/p&gt;

&lt;p class="MsoBodyText"&gt;&lt;span&gt;In these
perilous economic times, the layoff memos often follow a familiar refrain: We
have cut costs by 20%. That gives us an additional year&amp;#39;s runway. Or two. Yes,
startups can cut costs and thereby survive for longer. But just because they
can, does not mean they should.&lt;br /&gt;
&lt;br /&gt;
Let me state at the very outset that this article applies only to
venture-backed startups, which are a small minority of businesses in the
economy. The sole purpose of most businesses is to create a steady income
stream for their owners and operators. Venture-backed startups, on the other
hand, are created with the sole purpose of leading to a meaningful exit for
founders, investors, and employees. Such an exit might be either an IPO or an
acquisition. &lt;br /&gt;
&lt;br /&gt;
The raison d&amp;#39;être for such startups is therefore a successful exit, not mere
survival. And the lifeblood of any startup is growth. Growth along some
dimension: customers, usage, revenues, or profits. Under most economic
conditions, an IPO is impossible without revenue and profit growth -- and
we are unlikely to see a return soon of the times when it was. From an
acquisition point of view, stagnant companies are valued at low multiples of
revenue -- say 1x to 2x. The comparables are utilities.&lt;br /&gt;
&lt;/span&gt;&lt;/p&gt;&lt;p class="MsoBodyText"&gt;&lt;span&gt;A popular meme suggests that &amp;quot;&lt;a href="http://www.newsweek.com/id/146407"&gt;flat is the new up&lt;/a&gt;.&amp;quot; Given the downturn in the
economy, the argument goes, even keeping revenues flat is sufficient. This
argument, however, does not apply to startups. By definition, startups are
supposed to be attacking nascent market opportunities and unsaturated markets,
and so should be able to grow even through a downturn. If a startup cannot find
growth in this environment, it&amp;#39;s a clear message that the market opportunity
might be better served by an established company. Of course, growth in profits
or revenues is way better than growth just in usage; but even growth in usage
is better than stagnation on all three fronts. There is at least the
possibility that a company with strong usage growth might one day be attractive
to an acquirer with a good monetization engine.&lt;br /&gt;
&lt;br /&gt;
From a subjective point of view, it&amp;#39;s no fun to work at a startup that is not
growing along some dimension. Growth is necessary for everyone to enjoy the
experience, and feel they are accomplishing something. Stagnation leads to low
morale, and people sit around waiting for the axe to fall. It&amp;#39;s a slow,
agonizing way to die. Rather than let the company become a zombie, management
would be doing their investors and employees a favor by advocating in such
cases that they pull the plug on the company and return the remaining capital
to investors. &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;



&lt;p class="MsoBodyText"&gt;&lt;strong&gt;&lt;span&gt;Why VCs don’t put the zombies out of their
misery&lt;br /&gt;
&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;



&lt;p class="MsoBodyText"&gt;&lt;span&gt;Founders and
executives have a lot of emotional capital invested in their companies, and so
it is understandable that they shy away from making the ultimate decision. However,
the surprising thing is that VCs often allow the zombies to survive for far too
long. The reason for this is a subtle misalignment of interests between VCs and
their investors. As long as a startup is still alive, VCs can carry the company
on their books at the valuation set by the last round of financing. Once they
pull the plug, the fund will receive pennies on the dollar, a loss that has to be
recorded on the books and doesn&amp;#39;t look good when the firm goes to raise its
next fund. That’s why every VC portfolio has its fair share of zombies. &lt;br /&gt;
&lt;br /&gt;
Another contributing factor is excessive preference overhangs. Investors
receive preferred stock with the right to get back their invested capital ahead
of common shareholders in an exit; in some cases they have the right to receive a
multiple of their invested capital ahead of common shareholders. The total
amount that investors need to receive before common shareholders can
participate in an exit is called the &amp;quot;preference overhang.&amp;quot;&amp;#0160; &lt;o:p&gt;&lt;br /&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p class="MsoBodyText"&gt;&lt;span&gt;If a company
has raised so much capital that any realistic acquisition will be below the
overhang, then common shareholders stand to receive nothing from the sale; and
so company management has no incentive to look for such an exit. In such cases,
it&amp;#39;s important for the VCs and management to agree to restructure the
preference overhangs to make such exits attractive to management. Otherwise the
company is destined to become a zombie.&lt;br /&gt;
&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;
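
&lt;p class="MsoBodyText"&gt;&lt;span&gt;The effect of the overhang on management incentives can be made concrete with a small sketch. It assumes a deliberately simplified payout order (preferred holders are paid up to the overhang, common holders get what is left) and ignores conversion rights, participation, and debt; the function name and the dollar figures are hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
def split_exit_proceeds(exit_value, preference_overhang):
    """Simplified payout: preferred stock is paid first, up to the overhang;
    common shareholders receive whatever remains (possibly nothing)."""
    to_preferred = min(exit_value, preference_overhang)
    to_common = exit_value - to_preferred
    return to_preferred, to_common


# Hypothetical example, in millions of dollars: 40 raised with a 1x preference.
overhang = 40
for exit_value in (25, 40, 60):
    to_preferred, to_common = split_exit_proceeds(exit_value, overhang)
    print(exit_value, to_preferred, to_common)
```

&lt;p class="MsoBodyText"&gt;&lt;span&gt;Under these assumptions, an acquisition at 25 or 40 leaves nothing for common shareholders; only the exit at 60 pays them anything (20). Lowering the overhang is what moves the smaller exits into positive territory for management.&lt;/span&gt;&lt;/p&gt;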

&lt;p class="MsoBodyText"&gt;&lt;span&gt;Every startup
founder and employee has to consider three possible outcomes: success, failure,
and zombiehood. Success is much better than failure, but quick failure beats
wasting years of your life on a zombie. If you are a company founder, and you
are considering layoffs to extend the runway (perhaps on the advice of your
venture investor), you should look at yourself in the mirror and ask whether
you are cutting away your growth opportunity and just choosing a lingering
death over a quick one.&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;
&lt;div class="feedflare"&gt;
&lt;/div&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/11/for-startups-survival-is-not-a-strategy.html</feedburner:origLink></entry>
    <entry>
        <title>Google Chrome: A Masterstroke or a Blunder?</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/Fp686Y88p-w/google-chrome-a-masterstroke-or-a-blunder.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/09/google-chrome-a-masterstroke-or-a-blunder.html" thr:count="18" thr:updated="2008-11-24T15:50:27-08:00" />
        <id>tag:typepad.com,2003:post-55272814</id>
        <published>2008-09-07T17:53:02-07:00</published>
        <updated>2008-09-07T17:53:02-07:00</updated>
        <summary>The internet world has been agog over Google's entry into the browser wars with Chrome. When we look back to this event several years from now with the benefit of hindsight, we might see it either as a master stroke,...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Advertising" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Search" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">&lt;p&gt;The internet world has been agog over Google's entry into the browser wars with &lt;a href="http://googleblog.blogspot.com/2008/09/fresh-take-on-browser.html"&gt;Chrome&lt;/a&gt;. When we look back to this event  several years from now with the benefit of hindsight, we might see it either as a master stroke, or as Google's biggest strategic misstep.&lt;/p&gt;&lt;p&gt;The potential advantages to the internet community as a whole are considerable. The web has evolved beyond its roots as a collection of HTML documents and dumb frontends to database applications. We now expect everything from a web application that we do from a desktop application, and then some more: the added bonus of connectivity to vast computing resources in the cloud. In this context, browsers need to  evolve from HTML renderers to runtime containers, much as web servers evolved from simple servers of static files  and cgi scripts to modern application servers with an array of plugins that provide a variety of services. Chrome is the first browser to explicitly acknowledge this transition and make it the centerpiece of their efforts, and will force other browsers to follow suit. We will all benefit.&lt;/p&gt;&lt;p&gt;The potential advantages to Google also are considerable. If the stars and planets align, they can challenge Microsoft's dominance on the desktop by making the desktop irrelevant. Even otherwise, they can hope to use their dominance in search to promote Chrome, gaining significant browser marketshare and ensuring that Microsoft cannot challenge Google's search dominance by building features into Internet Explorer and Windows that integrate MSN's search and other services.&lt;/p&gt;&lt;p&gt;Therein, however, lies the first and perhaps the biggest risk to Google. Until now, Microsoft has been unable to really use IE and Windows to funnel traffic to MSN services and choke off Google. 
Given their antitrust woes, they have been treading carefully on this matter. Any overt attempt by them will evoke cries of foul from many market participants. Google has been in a great position to lead the outcry, because it has been purely a service accessible from the browser, without any toehold in the browser market itself.&lt;/p&gt;&lt;p&gt;Chrome, however, eases some of the pressure on Microsoft. If Microsoft integrates MSN search or other services tightly into IE, it will be harder for Google to cry foul -- Microsoft could point to Chrome, and any steps taken by Google to integrate their services into Chrome, as counter-arguments. In addition, any outcry from Google can now be characterized as sour grapes from a loser -- Microsoft can say: we both have browsers out there, ours is just better, so let consumers decide for themselves.&lt;/p&gt;&lt;p&gt;In some sense, regardless of the actual market penetration of Chrome, Google has lost the moral high ground in future arguments with Microsoft. I wonder whether Google might have achieved all their aims better not by releasing a Google-branded browser, but by working with Mozilla to improve Firefox from within. &lt;/p&gt;&lt;p&gt;Second, while Google has shown impressive technological wizardry in search and advertising, the desktop application game is very different from the internet service game. While users are very forgiving about beta tags that stay for years on services such as Gmail, user expectations on matters such as &lt;a href="http://www.informationweek.com/news/internet/google/showArticle.jhtml?articleID=210500034"&gt;compatibility and security bugs&lt;/a&gt; are very high for desktop applications. 
It remains to be seen whether Google has the culture to succeed in this game, going beyond providing whiz-bang features that thrill developers -- such as a blazingly fast JavaScript engine -- to deliver a mainstream browser that competes on stability, security, and features.&lt;/p&gt;&lt;p&gt;The third problem is one of data contagion. Google has the largest "&lt;a href="http://battellemedia.com/archives/000063.php"&gt;database of intentions&lt;/a&gt;" in the world today: our search histories, which form the basis of Google's ad targeting. The thing that keeps me from freaking out that Google knows so much about me is that I access Google using a third-party browser. If Google has access to my desktop, and can tie my search history to that, the company can learn much about me that I keep isolated from my search behavior. &lt;em&gt;The cornerstone of privacy on the web today is that we can use products from different companies to create isolation&lt;/em&gt;: desktop from Microsoft, browser from Mozilla, search from Google. These companies have no incentive to share information. This is one instance where information silos serve us well as consumers. Any kind of vertical integration has the potential to erode privacy.&lt;/p&gt;&lt;p&gt;I'm not suggesting that Google would do anything evil with this data, or indeed that the thought even crossed their minds; thus far Google has behaved with admirable restraint in their usage of the database of intentions, staying away, for example, from behavioral targeting. But we should all be cognizant of the fact that companies are in business purely to benefit their shareholders. 
At some point, someone at Google might realize that the contents of my desktop can be used to target advertising, and it might prove tempting in a period of slow revenue growth under a different management team.&lt;/p&gt;&lt;p&gt;Two striking historical parallels come to mind, one a masterstroke and the other a blunder, in both cases setting into motion events that could not be undone. In 49 BC, Julius Caesar &lt;a href="http://en.wikipedia.org/wiki/Caesar%27s_civil_war"&gt;crossed the Rubicon&lt;/a&gt; with his army, triggering a civil war in which he triumphed over the forces of Pompey and became the master of Rome. And in 1812, Napoleon Bonaparte had Europe at his feet when he made the fateful decision to &lt;a href="http://en.wikipedia.org/wiki/French_invasion_of_Russia"&gt;invade Russia&lt;/a&gt;, greatly weakening his power and leading ultimately to his defeat at Waterloo. It will be interesting to see whether Chrome ends up being Google's Rubicon or its Moscow. &lt;a href="http://en.wikipedia.org/wiki/Alea_iacta_est"&gt;Alea iacta est&lt;/a&gt;.&lt;/p&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=Fp686Y88p-w:WgyaJMWGOZo:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=Fp686Y88p-w:WgyaJMWGOZo:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=Fp686Y88p-w:WgyaJMWGOZo:cGdyc7Q-1BI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?d=cGdyc7Q-1BI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=Fp686Y88p-w:WgyaJMWGOZo:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=Fp686Y88p-w:WgyaJMWGOZo:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=Fp686Y88p-w:WgyaJMWGOZo:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=Fp686Y88p-w:WgyaJMWGOZo:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Datawocky/~4/Fp686Y88p-w" height="1" width="1"/&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/09/google-chrome-a-masterstroke-or-a-blunder.html</feedburner:origLink></entry>
    <entry>
        <title>Bridging the Gap between Relational Databases and MapReduce: Three New Approaches</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/f_hMj0Af_F8/bridging-the-gap-between-relational-databases-and-mapreduce-three-new-approaches.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/09/bridging-the-gap-between-relational-databases-and-mapreduce-three-new-approaches.html" thr:count="14" thr:updated="2009-01-09T18:40:18-08:00" />
        <id>tag:typepad.com,2003:post-55206458</id>
        <published>2008-09-05T19:04:33-07:00</published>
        <updated>2008-09-05T19:04:33-07:00</updated>
        <summary>Popularized by Google, the MapReduce paradigm has proven to be a powerful way to analyze large datasets by harnessing the power of commodity clusters. While it provides a straightforward computational model, the approach suffers from certain key limitations, as discussed...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Data Mining" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Popularized by Google, the &lt;a href="http://labs.google.com/papers/mapreduce.html"&gt;MapReduce paradigm&lt;/a&gt; has proven to be a powerful way to analyze large datasets by harnessing the power of commodity clusters. While it provides a straightforward computational model, the approach suffers from certain key limitations, as discussed in a &lt;a href="http://anand.typepad.com/datawocky/2008/05/why-the-world-needs-a-new-database-system.html"&gt;prior post&lt;/a&gt;:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;The restriction to a rigid data flow model (Map followed by Reduce). Sometimes you need other flows, e.g., map-reduce-map, union-map-reduce, join-reduce.&lt;/li&gt;
&lt;li&gt;Common data analysis operations, which are provided by database systems as primitives, need to be recoded by hand each time in Java or C/C++: e.g., join, filter, common aggregates, group by, union, distinct.&lt;/li&gt;
&lt;li&gt;The programmer has to hand-optimize the execution plan, for example by deciding how many map and reduce nodes are needed. For complex chained flows, this can become a nightmare. Databases provide query optimizers for this purpose -- the precise sequence of operations is decided by the optimizer rather than by a programmer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Three approaches have emerged to bridge the gap between relational databases and Map Reduce. Let's examine each approach in turn and then discuss their pros and cons.&lt;/p&gt;&lt;p&gt;The first approach is to create a new higher-level scripting language that uses Map and Reduce as primitive operations. Using such a scripting language, one can express operations that require multiple map reduce steps, together with joins and other set-oriented data processing operations. This approach is exemplified by &lt;a href="http://research.yahoo.com/node/90"&gt;Pig Latin&lt;/a&gt;, being developed by a team at Yahoo. PigLatin provides primitive operations that are commonly found in database systems, such as Group By, Join, Filter, Union, ForEach, and Distinct. Each PigLatin operator can take a User Defined Function (UDF) as a parameter.&lt;/p&gt;&lt;p&gt;The programmer creates a script that chains these operators to achieve the desired effect. In effect, the programmer codes by hand the query execution plan that might have been generated by a SQL engine. The effect of a single Map Reduce can be simulated by a Filter step followed by a Group By step. In many common cases, we don't even need to use UDFs, if the filtering and grouping criteria are straightforward ones that are supported in PigLatin. The PigLatin engine translates each script into a sequence of jobs on a Hadoop cluster. The PigLatin team reports that 25% of Hadoop jobs on Yahoo today originate as PigLatin scripts. That's impressive adoption.&lt;/p&gt;&lt;p&gt;Another interesting solution in this category is &lt;a href="http://research.google.com/archive/sawzall.html"&gt;Sawzall&lt;/a&gt;, a new scripting language developed at Google. Sawzall allows map reduce operations to be coded using a language that is reminiscent of awk. If your computation fits the Sawzall model, the code is much shorter and more elegant than C/C++/Java Map and Reduce functions. 
Sawzall, however, suffers from two drawbacks: it limits the programmer to a predefined set of aggregations in the Reduce phase (although it supplies a big library of these); and it offers no support for data analysis that goes beyond a single Map Reduce step, as PigLatin does. Most important, Sawzall is not available outside of Google, while PigLatin has been open-sourced by Yahoo.&lt;/p&gt;&lt;p&gt;The second approach is to integrate Map Reduce with a SQL database. Two database companies have recently announced support for MapReduce: &lt;a href="http://www.greenplum.com/"&gt;Greenplum&lt;/a&gt; and &lt;a href="http://www.asterdata.com/"&gt;Aster Data&lt;/a&gt;. Interestingly, they have taken two very different approaches. I will call Greenplum's approach &lt;em&gt;"loose coupling"&lt;/em&gt; and Aster Data's approach &lt;em&gt;"tight coupling"&lt;/em&gt;. Let's examine each in turn.&lt;/p&gt;&lt;p&gt;Greenplum's &lt;a href="http://www.greenplum.com/resources/MapReduce/"&gt;loose-coupling approach&lt;/a&gt; ties together Greenplum's database with Hadoop's implementation of Map Reduce. A Hadoop Map Reduce operation is visible as a database view within Greenplum's SQL interpreter. Conversely, Hadoop map and reduce functions can access data in the database by iterating over the results of database queries. Issuing a SQL query that uses a map-reduce view will launch the corresponding map-reduce operation, whose results can then be processed by the rest of the SQL query.&lt;/p&gt;&lt;p&gt;Aster Data's &lt;a href="http://www.asterdata.com/blog/index.php/2008/08/27/how-asters-in-database-mapreduce-takes-udfs-to-the-next-level/"&gt;tight-coupling approach&lt;/a&gt; is more interesting: the database natively supports map reduce (with no need for Hadoop). Map and reduce functions can be written in a variety of programming languages (C/C++, Java, Python). 
Aster has extended the SQL language itself to support how these functions get invoked, creating a new SQL dialect called SQL/MR. One of the cool features is that map and reduce functions are automatically polymorphic, just like native SQL functions such as SUM, COUNT and so on: the programmer can write them once and the database engine can invoke them with rows with different numbers of columns and columns of different types. This is a huge convenience over the Hadoop approach.&lt;/p&gt;&lt;p&gt;What are the pros and cons of these three different approaches? The advantage of the Pig Latin approach is that it works directly at the file level, and therefore it can express MapReduce computations that don't fit the relational data model. An example of such an operation is building an inverted index on a collection of text documents. Databases in general are bad at handling large text and image data, which are treated as "blobs." &lt;/p&gt;&lt;p&gt;The biggest disadvantage of the PigLatin approach is the need to learn an entirely new programming language. There is a large group of developers and DBAs familiar with SQL, and PigLatin does not have this support base. The second disadvantage is that the developer has to code declarative query plans by hand, while a SQL programmer can rely on two decades of work on SQL query optimizers, which can automatically decide the order of operations, the degree of parallelism, and when to use indexes.&lt;/p&gt;&lt;p&gt;The advantages and disadvantages of the SQL integration approach in general mirror those of the Pig Latin approach. The loose coupling approach of Greenplum allows the use of files as well as relations, and therefore in principle supports file-based computations. 
The burden is on the application programmer, however, to decide on the scheduling and optimization of the Hadoop portion of the computation, without much help from the database.&lt;/p&gt;&lt;p&gt;Aster's tight-coupling approach, on the other hand, allows a much greater degree of automatic query optimization. The database system is intimately involved in the way map and reduce operations are scheduled across the cluster, and can decide on the degree of parallelism, as well as use strategies such as pipelining across map reduce and relational operators. In addition, since the database system is solely in charge of overall resource allocation and usage, it also ensures sandboxing of user-defined code, preventing it from consuming too many resources and slowing down other tasks. For computations that use only data in the relational database, Aster by far has the most elegant solution; &lt;span style="text-decoration: line-through;"&gt;the weakness, of course, is that data stored outside the database is off-limits. &lt;/span&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: Tassos Argyros from Aster Data points out that Aster's implementation does in fact allow access to data stored outside the database. The developer needs to write a UDF that exposes the data to the database engine.&lt;/p&gt;&lt;p&gt;All three approaches thus have their strengths and weaknesses. It's exciting to see the emergence of fresh thinking on data analytics, going beyond the initial file-oriented Map Reduce model. Over time, these approaches will evolve, borrowing learnings from one another. In time one or more will become the dominant paradigm for data analytics; I will be watching this space with great interest.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Disclosure:&lt;/strong&gt; I'm an investor in Aster Data and sit on their Board of Directors. 
&lt;/p&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=f_hMj0Af_F8:Vm5ROErm4hU:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=f_hMj0Af_F8:Vm5ROErm4hU:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=f_hMj0Af_F8:Vm5ROErm4hU:cGdyc7Q-1BI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?d=cGdyc7Q-1BI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=f_hMj0Af_F8:Vm5ROErm4hU:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=f_hMj0Af_F8:Vm5ROErm4hU:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=f_hMj0Af_F8:Vm5ROErm4hU:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=f_hMj0Af_F8:Vm5ROErm4hU:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Datawocky/~4/f_hMj0Af_F8" height="1" width="1"/&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/09/bridging-the-gap-between-relational-databases-and-mapreduce-three-new-approaches.html</feedburner:origLink></entry>
    <entry>
        <title>Stop Email Overload and Break Silos Using Wikis, Blogs, and IM</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/8x5T1vkeLHc/stop-email-overload-and-break-silos-using-wikis-blogs-and-im.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/07/stop-email-overload-and-break-silos-using-wikis-blogs-and-im.html" thr:count="16" thr:updated="2008-12-03T11:02:43-08:00" />
        <id>tag:typepad.com,2003:post-53037496</id>
        <published>2008-07-21T15:43:08-07:00</published>
        <updated>2008-07-21T15:43:08-07:00</updated>
        <summary>Email is the central nervous system of most modern organizations, from startups to large corporations. Every communication, from the most important (planning for the big client meeting tomorrow) to the most trivial (fresh donuts in the kitchen) takes place through...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Email is the central nervous system of most modern organizations, from
startups to large corporations. Every communication, from the most
important (planning for the big client meeting tomorrow) to the most
trivial (fresh donuts in the kitchen) takes place through the corporate
email system. The results: email overload and lowered productivity for
the entire organization. Employees are tethered to their email via
Blackberries even over the weekend, leading to communications burnout.&lt;/p&gt;&lt;p&gt;The
biggest single reason for this is the inherent nature of email itself:
it is a point-to-point communication medium. The sender has to decide
both the content of the message and who the recipients are. If
the recipient list is too large, it contributes to email overload. If
it is too small, that could lead to communication gaps and
"informational silos" in the organization, where one group in the
company doesn't really know what the other group is doing. Another
problem is that each email message is a single unit, making it hard to
track conversations among multiple parties. Many email readers thread
conversations, but that is done at a syntactic rather than semantic
level. Finally, putting everything in email makes it difficult to build
institutional memory. &lt;/p&gt;&lt;p&gt;We hit the email
wall at my company &lt;a href="http://www.kosmix.com"&gt;Kosmix&lt;/a&gt; recently. When we were less than 30 people,
managing by email worked reasonably well. The team was small enough
that everyone knew what everyone else was doing. Frequent hallway
conversations reinforced relationships. However, once we crossed the
30-person mark, we noticed problems creeping in. We started hearing
complaints of email overload and too many meetings. And despite the
email overload and too many meetings, people still felt that there was
a communication problem and a lack of visibility across teams and
projects. We were straining the limits of email as the sole
communications mechanism.&lt;/p&gt;&lt;p&gt;We knew something
had to be done. But what? Sri Subramaniam, our head of engineering,
proposed a bold restructuring of our internal communications. He led an
effort that resulted in us relying less on email and more on wikis,
blogs, and instant messaging. Here's how we use these technologies
every day in running our business.&lt;/p&gt;&lt;h2&gt;Blogs for Status Reports&lt;/h2&gt;&lt;p&gt;Each
employee and each project has a dedicated blog. People can post as
often as they wish to their personal or project blog, but they are
required to post at least one weekly status update. All blogs are
visible to everyone in the company. Anyone can subscribe to the feed
for any particular team or individual blog. So for example, Josh in
engineering can follow the blog of Mike in sales, if he's curious what
Mike is up to. This results in complete 360-degree visibility
throughout the organization. People can also post comments on these
blogs. Someone might post a problem they are facing, and others can
post comments providing suggestions. This results in automatic grouping
of conversations based on topics of interest.&lt;/p&gt;&lt;p&gt;The
biggest advantage of the blog approach is that it is a
publish-subscribe mechanism. I don't need to decide who to direct my
communication to; I just post on my blog. Anyone in the company who is
interested in what I'm doing can subscribe to my blog to be notified of
updates. And if someone just has a passing interest, they can always
read my blog periodically without subscribing to it. This approach also
breaks silos, for example, between engineering and marketing, or
between marketing and sales. Sometimes the best product ideas come from
sales people. And sometimes the best sales ideas come from engineers. &lt;/p&gt;&lt;p&gt;No one is required to read any particular blog, with two exceptions:&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;Managers are expected to read the status updates of their team members and post feedback.&lt;/li&gt;
&lt;li&gt;People working on a project are expected to read each other's blogs.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The
blog approach has reduced email overload at Kosmix and even reduced the
number of time-consuming "status update" meetings.&amp;nbsp; Most important, the
blog serves as an institutional memory -- an electronic record of our
business. Conversations do not get lost in the ether but are recorded
and can be searched at any time in the future by new people on a project
or new company employees.&lt;/p&gt;&lt;h2&gt;The Wiki for Persistent Information&lt;/h2&gt;&lt;p&gt;While
blogs are great for status updates and discussions around ideas, they
are not the best place to put items that serve as reference material:
for example, documentation, specs, reports, and so on. The problem is
that blogs are in reverse chronological order, and each blog can have
just one author, preventing collaborative editing. For these
situations, we use a wiki. The internal corporate wiki has sections
corresponding to each project and each functional group in the company.
Documentation, specs, and reports go into the wiki. &lt;/p&gt;&lt;p&gt;The
other critical section on the wiki is the Team section. Every employee
has a homepage on the wiki, with a recent photo, describing their
responsibilities at work and interests outside of work. As the team
grows, and you see a new face at the office, this is a quick way of
finding out who that person is.&lt;/p&gt;&lt;h2&gt;Instant Messaging for Spontaneous Discussions&lt;/h2&gt;&lt;p&gt;As
Kosmix has grown, we now have people working from more than one
physical location. In addition, we promote a culture of people working
from home whenever it is compatible with their job responsibilities.
Thus, we need a substitute for the face-to-face hallway conversations
that cannot happen because someone is working from home or from another
location. Email is not the best option because it is asynchronous and
thus loses the spontaneity of a hallway chat. &lt;/p&gt;&lt;p&gt;Instant
messaging fills this need very well indeed. The entire Kosmix team is
on IM. Each team member is required to set the "status" message on
their IM client during normal sane working hours to indicate where they
are working from. They can also post a "Do not disturb" message to
indicate that they don't welcome interruptions at the moment. Instant
messaging leads to quick resolution of many issues without spawning
interminable email threads. &lt;/p&gt;&lt;p&gt;If I needed to have everyone in the company on my IM buddy list, that would be a very long buddy list indeed. To avoid this problem, every team member's IM handle and status are displayed on their wiki homepage. You can initiate an IM session with anyone from their wiki homepage.&lt;/p&gt;&lt;h2&gt;Convention over Configuration&lt;/h2&gt;&lt;p&gt;The effects of the communication restructuring have been
immediate and very visible. They include a lot less email and almost
none on weekends; better communication among people; and 360-degree
visibility for every member of the Kosmix team. Since we instituted
these changes, everyone on the team feels more productive and more
knowledgeable about the company, and has more spare time to spend on things
outside of work. &lt;/p&gt;&lt;p&gt;
Kosmix is certainly not the first company to use internal blogs, wikis,
and IM for corporate communication. Google has been using blogs for
status reports for a while now. The big difference is the conventions
we have established about how we use these tools. For example, one of the
common complaints I've heard about Google's use of internal blogs is
that most employees feel no one reads their blog. I've heard of one
case where an employee for several weeks posted a status report that
read: "Is anyone reading this?" By establishing conventions around
expected read-write patterns, we have avoided this problem so far at
Kosmix. &lt;/p&gt;&lt;p&gt;
No doubt as Kosmix grows further, even this model will break down at
some point and we will have to look for new communication models. I'll
post an update when that happens! In the meantime, please do share your
experiences of innovative corporate communication practices.&lt;/p&gt;&lt;h2&gt;Implementation Notes&lt;/h2&gt;&lt;p&gt;We
use &lt;a href="http://www.twiki.org"&gt;twiki&lt;/a&gt; for our wiki and blog software at
Kosmix. The wiki functionality in twiki is great, but it took quite a
bit of customization work from our indefatigable Sri Subramaniam to make it work
well as a blogging platform too. We are planning to release our twiki
tweaks as open source in the next couple of months, once we have a
chance to package them neatly.&lt;/p&gt;&lt;p&gt;Another
great option for blogs is &lt;a href="http://www.wordpress.org"&gt;WordPress&lt;/a&gt;, which
allows you to host blogs internal to your company. We went with twiki
because of the integrated wiki/blogging solution.&lt;/p&gt;&lt;p&gt;We
have standardized on &lt;a href="http://messenger.yahoo.com"&gt;Yahoo! Instant Messenger&lt;/a&gt; for instant messaging. However, the other
IM products such as &lt;a href="http://webmessenger.msn.com"&gt;MSN Instant Messenger&lt;/a&gt; and &lt;a href="http://www.google.com/talk"&gt;Google Talk&lt;/a&gt; have comparable
functionality. I would suggest you pick the one most people in your
company already use for personal communication. &lt;/p&gt;&lt;/div&gt;
&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=8x5T1vkeLHc:SYCy9SMXKjg:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=8x5T1vkeLHc:SYCy9SMXKjg:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=8x5T1vkeLHc:SYCy9SMXKjg:cGdyc7Q-1BI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?d=cGdyc7Q-1BI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=8x5T1vkeLHc:SYCy9SMXKjg:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=8x5T1vkeLHc:SYCy9SMXKjg:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=8x5T1vkeLHc:SYCy9SMXKjg:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=8x5T1vkeLHc:SYCy9SMXKjg:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Datawocky/~4/8x5T1vkeLHc" height="1" width="1"/&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/07/stop-email-overload-and-break-silos-using-wikis-blogs-and-im.html</feedburner:origLink></entry>
    <entry>
        <title>Why Google Doesn't Provide Earnings Forecasts</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/89aRmHMWaHI/why-google-doesnt-provide-earnings-forecasts.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/07/why-google-doesnt-provide-earnings-forecasts.html" thr:count="1" thr:updated="2008-11-03T03:26:47-08:00" />
        <id>tag:typepad.com,2003:post-52840980</id>
        <published>2008-07-17T12:27:17-07:00</published>
        <updated>2008-07-17T12:27:17-07:00</updated>
        <summary>Most public companies provide forecasts of revenue and earnings in the upcoming quarters. These forecasts (sometimes called "guidance") form the basis of the work most stock analysts do to make buy and sell recommendations. Much to the consternation of these...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Advertising" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Search" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;Most public companies provide forecasts of revenue and earnings in the
upcoming quarters. These forecasts (sometimes called "guidance") form
the basis of the work most stock analysts do to make buy and sell
recommendations. Much to the consternation of these analysts, Google is
among the few companies that have refused to follow this practice. As a
result, estimates of Google's revenue by analysts using publicly
available data, like comScore numbers, have often been &lt;a href="http://anand.typepad.com/datawocky/2008/04/more-data-beats.html"&gt;spectacularly
wrong&lt;/a&gt;. Today's earnings call may be no different. &lt;/p&gt;&lt;p&gt;A
Google executive once explained to me why Google doesn't provide
forecasts. To understand it, you have to think about the engineers at
Google who work on optimizing AdWords. How do they know they're doing a
good job? We know that Google is constantly bucket-testing tweaks to
their AdWords algorithms. An ad optimization project is considered
successful if it has one of two results:&lt;/p&gt;&lt;ul&gt;
&lt;li&gt;Increase revenue per search (RPS), while not using additional ad real estate on the search results page (SERP).&lt;/li&gt;
&lt;li&gt;Reduce the ad real estate on each SERP, while not reducing RPS.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The
tricky cases are the ones that increase RPS, while also using more ad
real estate. It then becomes a judgment call on whether they should be
rolled out across the site. If Google were to make earnings forecasts,
the thinking went, there would be huge temptation to roll out tweaks in
the gray area to make the numbers. As the quarters roll by, the area of
the page devoted to ads would keep steadily increasing, leading to
longer term problems with customer retention.&lt;/p&gt;&lt;p&gt;Of course, this doesn't mean there is no earnings pressure. In reality, whether they issue guidance or not, Google's stock price does depend on whether they continue to deliver robust revenue and earnings growth. So implicitly, there is always pressure to beat the estimates. And for the first time, as Google's stock has taken a hammering in recent months, I've heard about hiring slowdowns at Google. So there is definitely pressure to cut costs as well. It will be interesting to observe the battle between idealism and expediency play itself out, with its progress reflected in the ad real estate on Google's search results. It's easy to be idealistic with the wind behind your back; the true test is whether you retain the idealism in the face of headwinds. Time will tell.&lt;/p&gt;&lt;p&gt;This
brings us to today's earnings call. In my experience, the best
predictor of Google earnings has been &lt;a href="http://www.efrontier.com"&gt;Efficient Frontier's&lt;/a&gt; excellent
Search Engine Performance Report. EF is the largest ad agency for SEM
advertisers, and manages the campaigns of several large advertisers on
Google, Yahoo, and Microsoft. As I had noted earlier, in Q1 &lt;a href="http://anand.typepad.com/datawocky/2008/04/more-data-beats.html"&gt;an estimate
based on their report handily beat other forecasts&lt;/a&gt;, most of which use
comScore data. (&lt;strong&gt;Disclosure&lt;/strong&gt;: My fund Cambrian Ventures is an investor in
EF.)&lt;/p&gt;&lt;p&gt;EF's &lt;a href="http://blog.efrontier.com/insights/2008/07/q2-search-engin.html"&gt;report for Q2&lt;/a&gt;, released this
morning, indicates a strong quarter for Google. Google gained more than
its fair share of advertising dollars in Q2 2008. For every new dollar
spent on search advertising, $1.10 was spent on Google, at the expense
of Yahoo and Microsoft. In addition, Google's average cost-per-click
(CPC) increased by 13.8% in Q2 2008 versus Q2 2007, while click volume
and CTR increased as well. There was also strong growth overseas,
which should help earnings given the weak dollar.&lt;/p&gt;&lt;p&gt;I
don't have the time right now to do the math and figure out whether the
robust performance was sufficient to beat the &lt;a href="http://www.alleyinsider.com/2008/7/google-goog-q2-earnings-preview-downside-risk-to-revenue-"&gt;Street's estimates&lt;/a&gt;. You
should read the report for yourself and make that call.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Google's results, although robust, were below expectations. The biggest moment in the earnings call for me was this quote from Sergey (via &lt;a href="http://www.alleyinsider.com/2008/7/google-goog-q2-earnings-live-analysis"&gt;Silicon Alley Insider&lt;/a&gt;):&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;blockquote&gt;&lt;p&gt;Sergey said the company may have overdone its quality control
efforts in the quarter (reducing the number of ads), and the reversal
of this could provide a modest accelerator to Q3&lt;/p&gt;&lt;/blockquote&gt;&lt;/em&gt;Quality efforts "overdone"? &lt;strong&gt;Apparently those pressures are telling after all, and Google is going to abandon their principles a wee bit to venture into the gray area.&lt;/strong&gt; Is this the start of a slippery slope?&lt;/p&gt;&lt;/div&gt;
&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=89aRmHMWaHI:8uWMRan0QnM:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=89aRmHMWaHI:8uWMRan0QnM:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=89aRmHMWaHI:8uWMRan0QnM:cGdyc7Q-1BI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?d=cGdyc7Q-1BI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=89aRmHMWaHI:8uWMRan0QnM:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=89aRmHMWaHI:8uWMRan0QnM:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=89aRmHMWaHI:8uWMRan0QnM:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=89aRmHMWaHI:8uWMRan0QnM:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Datawocky/~4/89aRmHMWaHI" height="1" width="1"/&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/07/why-google-doesnt-provide-earnings-forecasts.html</feedburner:origLink></entry>
    <entry>
        <title>The Real Long Tail: Why both Chris Anderson and Anita Elberse are Wrong</title>
        <link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/Datawocky/~3/ZtqvRaNE-eE/the-real-long-tail-why-both-chris-anderson-and-anita-elberse-are-wrong.html" />
        <link rel="replies" type="text/html" href="http://anand.typepad.com/datawocky/2008/07/the-real-long-tail-why-both-chris-anderson-and-anita-elberse-are-wrong.html" thr:count="8" thr:updated="2008-07-23T03:25:14-07:00" />
        <id>tag:typepad.com,2003:post-52484866</id>
        <published>2008-07-09T22:18:50-07:00</published>
        <updated>2008-07-09T22:18:50-07:00</updated>
        <summary>A new study by Anita Elberse, published in the Harvard Business Review, raises questions about the validity of Chris Anderson's Long Tail theory. If you're related to Rip Van Winkle, the Long Tail theory suggests that the dramatically lower distribution...</summary>
        <author>
            <name>anand_rajaraman</name>
        </author>
        <category scheme="http://www.sixapart.com/ns/types#category" term="Social Media" />
        <category scheme="http://www.sixapart.com/ns/types#category" term="Venture Capital" />
        
        
<content type="html" xml:lang="en-US" xml:base="http://anand.typepad.com/datawocky/">
&lt;div xmlns="http://www.w3.org/1999/xhtml"&gt;&lt;p&gt;A &lt;a href="http://harvardbusinessonline.hbsp.harvard.edu/hbsp/hbr/articles/article.jsp?ml_action=get-article&amp;amp;articleID=R0807H&amp;amp;ml_issueid=BR0807&amp;amp;ml_subscriber=true&amp;amp;pageNumber=1&amp;amp;_requestid=79542"&gt;new study&lt;/a&gt; by Anita Elberse, published in the Harvard Business Review, raises questions about the validity
of Chris Anderson's &lt;a href="http://www.wired.com/wired/archive/12.10/tail.html"&gt;Long Tail&lt;/a&gt; theory. If you're related to &lt;a href="http://en.wikipedia.org/wiki/Rip_Van_Winkle"&gt;Rip Van
Winkle&lt;/a&gt;, the Long Tail theory suggests that the dramatically lower
distribution costs for media (such as music and movies) enabled by the
internet have the potential to reshape the demand curve for media.
Traditionally, these businesses have been hits-driven, with the
majority of revenue and profits being attributable to a small number of
items (the hits). Anderson argues that the internet's ability to serve
niches cost-effectively increases the demand for items further down the
"tail" of the demand curve, making the aggregate demand for the tail
comparable to that for the head.&lt;/p&gt;&lt;p&gt;Anderson's
insight resonated instantly with the
digerati. It is said that Helen of Troy's face launched a thousand
ships; the Long Tail theory certainly launched more than a thousand
startups, all with an obligatory Long Tail slide in their investor
pitches. Recently, however, there has been a creeping suspicion that
the data don't support the theory; the backlash has been spearheaded,
among others, by &lt;a href="http://online.wsj.com/public/article/SB115387606762117314-Inp5lUxHwVDwS_SJv5zaQShPXlE_20070726.html"&gt;Lee Gomes&lt;/a&gt; of the Wall Street Journal. In her piece, Anita Elberse does a deep dive into the data and concludes
that the Long Tail theory is flawed.&lt;/p&gt;&lt;p&gt;Anderson
has posted a &lt;a href="http://www.longtail.com/the_long_tail/2008/06/excellent-hbr-p.html"&gt;rebuttal&lt;/a&gt; on his blog, pointing out a problem with
Elberse's analysis: defining the head and tail in percentage terms.
There is some truth to Anderson's rebuttal. But the heart of Elberse's
criticism lies not in the definition of the head and the tail. It's in
using &lt;a href="http://harvardbusinessonline.hbsp.harvard.edu/hbsp/hbr/articles/article.jsp?pageNumber=1&amp;amp;ml_subscriber=true&amp;amp;articleID=R0807H&amp;amp;_requestid=79542&amp;amp;referer=/hbsp/hbr/articles/article.jsp&amp;amp;reason=freeContent&amp;amp;productId=R0807H&amp;amp;OPERATION_TYPE=CHECK_COOKIE&amp;amp;FALSE=FALSE&amp;amp;TRUE=TRUE&amp;amp;ml_action=get-sidebar&amp;amp;ml_context=sidebar&amp;amp;ml_issueid=BR0807&amp;amp;ml_id=R0807H&amp;amp;ml_sidebar_id=2"&gt;McPhee's theory of exposure&lt;/a&gt; to conclude that positive feedback
effects reinforce the popularity of hits, while dooming items in the
tail to perpetual obscurity. She presents data from Quickflix, an
Australian movie rental service, showing that movies in the
tail are rated on average lower than movies in the head. Thus, movies
in the tail are destined to remain in the tail. Elberse exhorts media
executives to concentrate their resources on backing a small set of
potential blockbusters, rather than fritter them away on niches.&lt;/p&gt;&lt;p&gt;The
big problem with this argument is that it conflates cause and effect.
Before the internet, distribution was expensive, and there was no way
for consumers to provide instant feedback on products. Back then, consumers
had little say in which items were readily available
and which were hard to find. Thus, the hits were picked by a few
studio executives, publishers, or record producers who "greenlighted"
projects they thought had hit potential. But when distribution is
cheap, and consumer feedback loops are in place, the items that a lot
of consumers like become popular and move into the head. It's not that
items in the tail are inherently rated lower; items are in the tail
precisely &lt;strong id="v2cd"&gt;because&lt;/strong&gt; they are rated lower.&lt;/p&gt;&lt;p&gt;It's
as if we're comparing two systems of government, a hereditary
aristocracy and a democracy, by comparing the sizes of the ruling elite
in the two cases. That misses the point entirely. What matters is not
the size of the ruling elite, it's how they got there. So, the big
change wrought by the internet is not so much to change the shape of
the demand curve for media products, as Anderson claims; nor has there
been no change whatsoever, as Elberse posits. The big change is not in
what fraction of the demand is in the head, it's in how the items that
are in the head got there in the first place. Any change in the shape
of the curve itself is incidental.&lt;/p&gt;&lt;p&gt;There's
another market where we are seeing this phenomenon play out: the market
for Facebook (and MySpace) apps. In earlier years, it took a lot of
capital to get a company off the ground. The companies that got funded
were the ones with good business plans that could convince VCs to take
the plunge based on the people, the plan, and potentially some
intellectual property. But it doesn't take much capital to write a
Facebook app, leading to a proliferation of them. This paves the way
for the expected inversion. Facebook users don't use the apps that VCs
fund. Instead, Facebook users decide which apps they like, and VCs fund
the ones, such as Slide and RockYou, that gain popularity. &lt;/p&gt;&lt;p&gt;
It is instructive to look at the &lt;a style="float: left;" href="http://anand.typepad.com/.a/6a00d83471bc3153ef00e553ae3c248834-popup" onclick="window.open( this.href, '_blank', 'width=640,height=480,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0' ); return false"&gt;&lt;img  class="at-xid-6a00d83471bc3153ef00e553ae3c248834 " alt="Facebook" src="http://anand.typepad.com/.a/6a00d83471bc3153ef00e553ae3c248834-320wi" style="margin: 0px 5px 5px 0px;"&gt;&lt;/a&gt;
&lt;a href="http://en.oreilly.com/gspeast2008/public/schedule/detail/4482"&gt;Facebook app trends study&lt;/a&gt; published by
Roger Magoulas and Ben Lorica at O'Reilly Research. The study shows
that at last count, there were close to 30,000 Facebook apps. Usage,
however, is highly concentrated among the top few apps, a classic
example of a hits-driven industry (see graph) -- no long tail.
However, these hits have been produced by the collective action of
millions of Facebook users, rather than by a small set of savvy media
executives. And there's a lot of churn: new applications join the
winners and old winners die and are buried in the tail.&lt;/p&gt;&lt;p&gt;
The real Long Tail created by the internet is not the long tail of
consumption, but the long tail of influence. Earlier, the power to
decide who the winners and losers were rested with
a few media executives. Now every social network user has some
potential influence, however small, on the result. The long tail of
influence, combined with instant feedback loops, leads to a short tail
of consumption. The Facebook app market is a leading indicator of the
path the entire media industry will take in years to come.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; Chris Anderson has posted a rebuttal in the comments. Thanks, Chris! Please do read his comment and my response. Chris points out that Facebook apps still follow a power law distribution. But it doesn't matter how long the tail is; what matters is how heavy it is. The area under the long tail is a function of both length and depth, and depends crucially on the power law exponent. For the mathematically minded, the &lt;a href="http://anand.typepad.com/datawocky/not-all-powerlaws-have-lo.html"&gt;details are here&lt;/a&gt;.&lt;br&gt; &lt;/p&gt;&lt;/div&gt;
&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=ZtqvRaNE-eE:bQTiGvVSC_g:V_sGLiPBpWU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=ZtqvRaNE-eE:bQTiGvVSC_g:V_sGLiPBpWU" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=ZtqvRaNE-eE:bQTiGvVSC_g:cGdyc7Q-1BI"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?d=cGdyc7Q-1BI" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=ZtqvRaNE-eE:bQTiGvVSC_g:gIN9vFwOqvQ"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=ZtqvRaNE-eE:bQTiGvVSC_g:gIN9vFwOqvQ" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/Datawocky?a=ZtqvRaNE-eE:bQTiGvVSC_g:F7zBnMyn0Lo"&gt;&lt;img src="http://feeds.feedburner.com/~ff/Datawocky?i=ZtqvRaNE-eE:bQTiGvVSC_g:F7zBnMyn0Lo" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/Datawocky/~4/ZtqvRaNE-eE" height="1" width="1"/&gt;</content>


    <feedburner:origLink>http://anand.typepad.com/datawocky/2008/07/the-real-long-tail-why-both-chris-anderson-and-anita-elberse-are-wrong.html</feedburner:origLink></entry>
 
</feed>
