<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:gd="http://schemas.google.com/g/2005" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" gd:etag="W/&quot;CUEGQ3o5cSp7ImA9WxNUF04.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631</id><updated>2009-11-08T21:27:02.429-05:00</updated><title>A Computer Scientist in a Business School</title><subtitle type="html">Thoughts and stories of a computer scientist in a business school.</subtitle><link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/posts/default" /><link rel="alternate" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/" /><link rel="hub" href="http://pubsubhubbub.appspot.com/" /><link rel="next" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default?start-index=26&amp;max-results=25&amp;redirect=false&amp;v=2" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email></author><generator version="7.00" uri="http://www.blogger.com">Blogger</generator><openSearch:totalResults>116</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><geo:lat>40.72596</geo:lat><geo:long>-73.998345</geo:long><link rel="license" type="text/html" href="http://creativecommons.org/licenses/by/3.0/" /><logo>http://creativecommons.org/images/public/somerights20.gif</logo><link rel="self" href="http://feeds.feedburner.com/AComputerScientistInABusinessSchool" type="application/atom+xml" 
/><feedburner:emailServiceId>AComputerScientistInABusinessSchool</feedburner:emailServiceId><feedburner:feedburnerHostname>http://feedburner.google.com</feedburner:feedburnerHostname><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com" /><entry gd:etag="W/&quot;D0AHRHw8fyp7ImA9WxNUFk0.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-500392882300102496</id><published>2009-11-06T23:06:00.006-05:00</published><updated>2009-11-07T09:55:35.277-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-11-07T09:55:35.277-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="economics" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Utility of Money and the St. Petersburg Paradox</title><content type="html">Consider the following game:&lt;br /&gt;
&lt;br /&gt;
We will flip a fair coin until a tail appears for the first time.&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;If the tail appears on the first throw, you win $2^1=2$ dollars.&lt;/li&gt;
&lt;li&gt;If the tail appears on the second throw, you win $2^2=4$ dollars.&lt;/li&gt;
&lt;li&gt;If the tail appears on the third throw, you win $2^3=8$ dollars.&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;li&gt;If the tail appears on the $n$-th throw, you win $2^n$ dollars.&lt;/li&gt;
&lt;/ul&gt;&lt;div&gt;How much money should someone be willing to pay to enter this game? (&lt;i&gt;This question works best when posed to a person who claims never to play the lottery, roulette, or any other gambling game because the expected return is lower than the bet.&lt;/i&gt;)&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Computing the expected return of this game, we have:&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div style="text-align: center;"&gt;$E=\frac{1}{2}\cdot 2+\frac{1}{4}\cdot 4 + \frac{1}{8}\cdot 8 + \cdots = 1+1+1+&amp;nbsp;\cdots&amp;nbsp;=\infty$&lt;br /&gt;
&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;div style="text-align: auto;"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;/div&gt;&lt;div&gt;In other words, the expected return is infinite. A rational player who maximizes expected return should therefore be willing to pay an arbitrarily large amount of money to enter this game.&lt;br /&gt;
&lt;br /&gt;
Can you find anyone willing to bet $1,000 to play this game? Or $10,000? Or even $100? No? I did not think so.&lt;br /&gt;
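&lt;br /&gt;
Why does the sum diverge? Each possible outcome contributes exactly one dollar to the expectation. Counting only the outcomes in which the tail appears within the first $N$ throws therefore gives an expected value of $N$ dollars. The short Python sketch below is an illustration added here; the function names are made up. It computes these partial sums exactly and also simulates the game. In such simulations, the average payoff over many games tends to grow only slowly, roughly like the base-2 logarithm of the number of games played. That hints at why nobody feels the pull of the infinite expectation.&lt;br /&gt;
&lt;pre&gt;
```python
import random
from fractions import Fraction


def truncated_expected_value(max_throws):
    """Partial sum of the expectation: only outcomes where the first tail
    appears within `max_throws` throws are counted; each one contributes
    (1/2**n) * 2**n = 1 dollar."""
    return sum(Fraction(1, 2 ** n) * 2 ** n for n in range(1, max_throws + 1))


def play_once(rng):
    """Flip a fair coin until the first tail; pay 2**n dollars,
    where n is the throw on which the tail appeared."""
    n = 1
    while rng.random() >= 0.5:  # this half of the unit interval counts as "heads"
        n += 1
    return 2 ** n


def average_payoff(num_games, seed=0):
    """Empirical mean payoff over `num_games` independent plays."""
    rng = random.Random(seed)
    return sum(play_once(rng) for _ in range(num_games)) / num_games


if __name__ == "__main__":
    print(truncated_expected_value(10))  # each of the 10 terms adds 1
    for games in (100, 10_000, 100_000):
        print(games, average_payoff(games))
```
&lt;/pre&gt;
The simulated averages fluctuate a lot from seed to seed. Rare long runs of heads produce huge payoffs, which is exactly the heavy tail that makes the expectation infinite.&lt;br /&gt;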
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;This paradox is known as the &lt;a href="http://en.wikipedia.org/wiki/St._Petersburg_paradox"&gt;St. Petersburg Paradox&lt;/a&gt;. It was posed in 1713 by Nicholas Bernoulli, and Daniel Bernoulli proposed a resolution in 1738. Since then, a number of other explanations have been put forward.&lt;br /&gt;
&lt;br /&gt;
The most common approach is to use expected utility theory. Instead of maximizing the expected amount of money, the player maximizes the expected value of a utility function $U(x)$, which describes the "satisfaction" that someone gets from having an amount $x$ of money.&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;b&gt;Utility of Money:&lt;/b&gt; The basic idea is that people do not bet based on the absolute amount of the return but rather on the &lt;i&gt;utility&lt;/i&gt; of the reward. An additional $100 is worth much more to me when I have $100 in the bank than when I have $1,000,000 in the bank. Each extra dollar adds less satisfaction than the previous one, which means that the "utility of money" is a concave function of the available funds.&lt;br /&gt;
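&lt;br /&gt;
Daniel Bernoulli's own resolution assumed a logarithmic utility, $U(x)=\ln x$. Under that assumption, the expected utility of the game is finite, because $\sum_{n\ge 1}2^{-n}\ln(w+2^n)$ converges. The Python sketch below uses hypothetical function names. For a player with initial wealth $w$, it computes the sure amount that yields the same expected log utility as receiving the game for free.

For $w=0$ this amount works out to 4 dollars: $\sum_{n\ge 1} n\,2^{-n}=2$, so $\mathbb{E}[\ln X]=2\ln 2=\ln 4$. This is a simplified variant of the textbook calculation, since it ignores the entry fee the player would pay out of $w$.&lt;br /&gt;
&lt;pre&gt;
```python
import math


def expected_log_utility(wealth, terms=200):
    """E[ln(wealth + X)], where X = 2**n dollars with probability 2**-n.

    The series is truncated after `terms` terms; the omitted tail is
    negligible because ln(wealth + 2**n) grows only linearly in n while
    the probabilities 2**-n shrink geometrically.
    """
    return sum(2.0 ** -n * math.log(wealth + 2 ** n) for n in range(1, terms + 1))


def certainty_equivalent(wealth, terms=200):
    """Sure amount that gives the same expected log utility as getting the
    St. Petersburg game for free, starting from `wealth`."""
    return math.exp(expected_log_utility(wealth, terms)) - wealth


if __name__ == "__main__":
    for w in (0, 100, 1_000, 1_000_000):
        print(w, round(certainty_equivalent(w), 2))
```
&lt;/pre&gt;
The certainty equivalent grows with wealth, but only slowly. A player with more money values the game more, yet still nowhere near the infinite expected return.&lt;br /&gt;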
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;For illustration, below is such a concave utility-of-money function, which we computed as part of a research project:&lt;br /&gt;
&lt;/div&gt;&lt;div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_skxW9kDLt_c/SvTn9Lj2ucI/AAAAAAAABM0/82cJk98RPjw/s1600-h/UoM.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_skxW9kDLt_c/SvTn9Lj2ucI/AAAAAAAABM0/82cJk98RPjw/s400/UoM.PNG" /&gt;&lt;/a&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div class="separator" style="clear: both; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;This concavity also partially explains the "risk aversion" that most people exhibit: they prefer certainty to uncertainty, so they may reject even a reasonable bet with a positive expected return. Why? With a concave utility function, the utility gained by winning an amount is smaller than the utility lost by losing the same amount. A bet with only a small positive expected return can therefore still lower expected utility. The more concave the function, the stronger the risk aversion.&lt;br /&gt;
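&lt;br /&gt;
Here is a small numerical check of this argument, written as a Python sketch. The utility functions ($\ln x$ and $\sqrt{x}$) and the bet are arbitrary choices for illustration. The bet is a fair coin flip that wins 110 dollars or loses 100 dollars, so its expected return is +5 dollars.

- A player with 200 dollars and either concave utility rejects the bet.
- A risk-neutral player (linear utility) accepts it.
- A log-utility player with 1,000,000 dollars also accepts it, because the same stakes are tiny relative to that wealth.&lt;br /&gt;
&lt;pre&gt;
```python
import math


def expected_utility(utility, wealth, win, loss, p_win=0.5):
    """Expected utility after a bet that adds `win` with probability `p_win`
    and subtracts `loss` otherwise."""
    return p_win * utility(wealth + win) + (1 - p_win) * utility(wealth - loss)


def accepts_bet(utility, wealth, win, loss, p_win=0.5):
    """An expected-utility maximizer takes the bet only if it beats the
    utility of keeping `wealth` for sure."""
    return expected_utility(utility, wealth, win, loss, p_win) > utility(wealth)


if __name__ == "__main__":
    def linear(x):  # risk-neutral: utility equals money
        return x

    for name, u in (("linear", linear), ("sqrt", math.sqrt), ("log", math.log)):
        for wealth in (200, 1_000_000):
            print(name, wealth, accepts_bet(u, wealth, win=110, loss=100))
```
&lt;/pre&gt;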
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;If you want to read more about the utility of money and its applications to portfolio management, insurance, and other settings, &lt;a href="http://www.uv.es/bernardo/Lindley1975.pdf"&gt;take a look at this book chapter&lt;/a&gt;.&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;So, the next time someone claims never to engage in a bet with a negative expected return, describe Bernoulli's game, whose expected return is not just positive but infinite, and observe the reaction...&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-500392882300102496?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=x7kUEtGcdK8:-QxofV2IDas:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=x7kUEtGcdK8:-QxofV2IDas:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=x7kUEtGcdK8:-QxofV2IDas:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/x7kUEtGcdK8" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/500392882300102496/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/11/utility-of-money-and-st-petersburg.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/500392882300102496?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/500392882300102496?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/x7kUEtGcdK8/utility-of-money-and-st-petersburg.html" title="Utility of Money and the St. Petersburg Paradox" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://1.bp.blogspot.com/_skxW9kDLt_c/SvTn9Lj2ucI/AAAAAAAABM0/82cJk98RPjw/s72-c/UoM.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/11/utility-of-money-and-st-petersburg.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DEUGSXY9eSp7ImA9WxNVFkk.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-3784543192220262604</id><published>2009-10-25T23:34:00.001-04:00</published><updated>2009-10-27T08:23:48.861-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-27T08:23:48.861-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="publishers" 
/><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="open access" /><category scheme="http://www.blogger.com/atom/ns#" term="acm" /><title>What is the (Real) Cost of Open Access?</title><content type="html">After the &lt;a href="http://geomblog.blogspot.com/2008/01/cacm-is-dead-long-live-cacm.html"&gt;transformation of Communications of ACM&lt;/a&gt;, I find myself increasingly interested in the articles published in CACM. Naturally, I often show this interest by sharing the URL of a paper on Twitter, on Facebook, or on the blog, or by sending the link to friends and colleagues. Unfortunately, CACM has a closed-access policy, which effectively prevents anyone without an ACM membership or a university account from actually reading the papers. The same is true for papers published in ACM conferences and journals, but for those I can typically find a copy on the author's home page. For CACM articles, this is often not the case.&lt;br /&gt;
&lt;br /&gt;
Needless to say, I hate closed-access policies. While I can understand the shortsightedness of for-profit publishers, I fail to see why ACM has not adopted at least a "semi" Open Access model. It could, for example, make the current issue of Communications of ACM available to the public, or give public access to papers published 10 or 20 years ago in its journals and conferences.&lt;br /&gt;
&lt;br /&gt;
The stated goal of the association is to promote the field. By restricting access, ACM simply does not work towards this goal!&lt;br /&gt;
&lt;br /&gt;
The main argument that I hear is that &lt;a href="http://cacm.acm.org/opinion/articles/32075-open-closed-or-clopen-access/fulltext"&gt;publishing has some costs&lt;/a&gt;. But I am really trying to understand what these costs are. How large are they? And who is being paid? Much like in the health-care debate, we are told that something is expensive, but we have no idea who ends up getting the money.&lt;br /&gt;
&lt;br /&gt;
Let's examine the potential cost factors:&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Printing&lt;/b&gt;: I understand that printing &lt;i&gt;on paper&lt;/i&gt; has costs. But covering the cost of printing seems easy: amortize it across the print subscribers. (Or even abolish the print version.)&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Servers for distribution&lt;/b&gt;: What is the cost of distributing papers electronically? The cost of running a server should not be a concern. In the worst case, NSF should provide funds for that. I find it hard to believe that NSF would turn down a request for funding a server that provides open access to scientific journals!&lt;br /&gt;
&lt;br /&gt;
&lt;b&gt;Submission handling&lt;/b&gt;: The cost of the submission website? I doubt that it is above $5K per year, per journal. Ask for a nominal submission fee (say $50 per paper) to cover this. The cost of the copyeditors? We can do &lt;a href="http://behind-the-enemy-lines.blogspot.com/2007/11/what-is-wrong-with-acm-typesetting.html"&gt;much better&lt;/a&gt; &lt;a href="http://www.stat.columbia.edu/%7Ecook/movabletype/archives/2005/01/the_pinchhitter.html"&gt;without them&lt;/a&gt;, thank you. (Seriously, why do we still have copyeditors?)&lt;br /&gt;
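&lt;br /&gt;
Take the rough figures above at face value: 5,000 dollars per year for the submission website and a 50-dollar fee per paper. On those numbers, the fee covers the website once a journal receives 100 submissions a year. The trivial Python helper below makes the arithmetic explicit. It is hypothetical and intended only for back-of-the-envelope checks with other numbers.&lt;br /&gt;
&lt;pre&gt;
```python
def break_even_submissions(annual_cost, fee_per_paper):
    """Smallest number of paid submissions per year whose fees cover `annual_cost`.

    Uses integer ceiling division, so whole-dollar inputs give an exact answer.
    """
    return -(-annual_cost // fee_per_paper)


print(break_even_submissions(5_000, 50))  # 100
```
&lt;/pre&gt;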
&lt;br /&gt;
&lt;b&gt;Admin cost&lt;/b&gt;: The only remaining cost that I can think of is the cost of the administrative staff. But how much is it? I honestly have no idea! Is it so high that ACM membership fees cannot cover it? I have been trying to find ACM's budget, but I cannot find anything public.&lt;br /&gt;
&lt;br /&gt;
Are there other hidden costs?&lt;br /&gt;
&lt;br /&gt;
If anyone has pointers or extra information, please let me know. I am really trying to understand the real costs of high-quality electronic publishing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-3784543192220262604?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=ck9PdklMEuU:q6beluQXo10:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=ck9PdklMEuU:q6beluQXo10:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=ck9PdklMEuU:q6beluQXo10:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/ck9PdklMEuU" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/3784543192220262604/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/10/what-is-real-cost-of-open-access.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3784543192220262604?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3784543192220262604?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/ck9PdklMEuU/what-is-real-cost-of-open-access.html" title="What is the (Real) Cost of Open Access?" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">14</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/10/what-is-real-cost-of-open-access.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0cHQXcyeyp7ImA9WxNWE0o.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-2610528791816907032</id><published>2009-10-11T22:20:00.001-04:00</published><updated>2009-10-12T15:17:10.993-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-10-12T15:17:10.993-04:00</app:edited><title>When Noise is Your Friend: Smoothed Analysis</title><content type="html">Have you ever encountered the phrase "the algorithm has exponential running time, in the worst-case scenario, but in practice we observed it to be pretty efficient"? 
It is a phrase that divides theoreticians and practitioners. Many theoretical computer scientists focus on worst-case complexity analysis, often producing results that contradict practice.&lt;br /&gt;
&lt;br /&gt;
For example, the simplex algorithm for linear programming is well known to be very efficient in practice. In theory, however, the worst-case complexity of simplex is exponential, which classifies it as an "inefficient" algorithm. Yet simplex exhibits exponential running time only on very special inputs. Most practitioners would even argue that you will never encounter such strange cases in practice; only an adversary would design such inputs.&lt;br /&gt;
&lt;br /&gt;
Similarly, the Traveling Salesman Problem is a hallmark example of an NP-complete problem, i.e., one unlikely to have an efficient algorithm anytime soon. However, there are many TSP implementations that find near-optimal solutions for fairly large inputs.&lt;br /&gt;
&lt;br /&gt;
K-means is another such algorithm. Its worst-case behavior is horrible, but ask the millions of people who use it for clustering. In practice, it is one of the most efficient clustering algorithms, despite its exponential worst-case running time.&lt;br /&gt;
&lt;br /&gt;
So, how can we reconcile theory and practice?&lt;br /&gt;
&lt;br /&gt;
A very nice approach towards this reconciliation is &lt;i&gt;smoothed analysis&lt;/i&gt;. I first learned about this approach to analyzing algorithms by attending the (fascinating) job talk of &lt;a href="http://math.mit.edu/~kelner/"&gt;Jon Kelner&lt;/a&gt;. Jon showed that &lt;b&gt;if you slightly perturb the input before feeding it to the simplex algorithm, then it is extremely unlikely that the perturbed input leads to an exponential running time&lt;/b&gt;. In other words, by adding a little noise to the data, we are almost guaranteed to avoid the "tricky" parts of the input space.&lt;br /&gt;
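&lt;br /&gt;
For readers who want a more formal version, here is one common way to state the three notions. It follows the general form used by Spielman and Teng, but the notation is illustrative, and the exact normalization of the noise varies across papers. Take an algorithm with running time $T(x)$ on input $x$. The maximum ranges over inputs of size $n$ whose entries are scaled to lie in $[-1,1]$, $\mathcal{D}$ is an input distribution, and $g$ is a standard Gaussian vector scaled by the noise level $\sigma$:&lt;br /&gt;
&lt;div style="text-align: center;"&gt;$\mathrm{Worst}(n)=\max_{x}T(x),\qquad \mathrm{Avg}_{\mathcal{D}}(n)=\mathbb{E}_{x\sim\mathcal{D}}\left[T(x)\right],\qquad \mathrm{Smoothed}(n,\sigma)=\max_{x}\,\mathbb{E}_{g}\left[T(x+\sigma g)\right]$&lt;/div&gt;
An algorithm has polynomial smoothed complexity if the last quantity is bounded by a polynomial in $n$ and $1/\sigma$. Spielman and Teng proved such a bound for the shadow-vertex variant of the simplex method. As $\sigma\to 0$, the smoothed measure approaches the worst case. For large $\sigma$ the noise dominates, and the measure behaves more like an average-case one.&lt;br /&gt;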
&lt;br /&gt;
What is the beauty of this approach? It explains why "inefficient" algorithms often work well in practice: most real data contain noise, and this noise can actually be beneficial! The other big lesson is that an algorithm sometimes has horrible worst-case performance only because of a small number of nearly adversarial inputs. Adding noise may take care of these strange cases.&lt;br /&gt;
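&lt;br /&gt;
The perturbation step itself is simple to express in code. The Python sketch below is only an illustration under stated assumptions, and its function names are hypothetical:

- It uses k-means (Lloyd's algorithm) rather than simplex, because k-means is much shorter to write.
- It adds independent Gaussian noise with standard deviation sigma to every coordinate.
- It counts how many assignment passes Lloyd's algorithm needs before the assignments stop changing.

On benign data like the demo's, both runs typically converge in a handful of passes. The exponential worst cases are carefully constructed instances that this toy does not reproduce; the sketch only shows where the noise enters.&lt;br /&gt;
&lt;pre&gt;
```python
import random


def perturb(points, sigma, rng):
    """Copy of `points` with independent Gaussian noise N(0, sigma**2)
    added to every coordinate (the smoothed-analysis perturbation)."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, rng, max_iter=1000):
    """Plain Lloyd's algorithm with random initial centers.

    Returns (centers, passes), where `passes` counts assignment passes up to
    and including the first one that leaves every assignment unchanged.
    """
    centers = rng.sample(points, k)
    assignment = None
    for passes in range(1, max_iter + 1):
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            return centers, passes
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centers, max_iter


if __name__ == "__main__":
    rng = random.Random(42)
    # Two well-separated blobs in the plane (illustrative data only).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    data += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(200)]
    _, raw_passes = lloyd_kmeans(data, k=2, rng=random.Random(1))
    noisy = perturb(data, sigma=0.1, rng=random.Random(2))
    _, noisy_passes = lloyd_kmeans(noisy, k=2, rng=random.Random(1))
    print("passes on original input:", raw_passes)
    print("passes on perturbed input:", noisy_passes)
```
&lt;/pre&gt;
The same `perturb` step could, in principle, sit in front of any algorithm whose input is a vector of real numbers, including a linear-programming solver.&lt;br /&gt;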
&lt;br /&gt;
The latest issue of Communications of ACM has a &lt;a href="http://doi.acm.org/10.1145/1562764.1562785"&gt;great review article by Spielman and Teng on Smoothed Analysis&lt;/a&gt;. It explains the difference between worst-case, average-case, and smoothed analysis, and points to a wide variety of problems that have been analyzed using this technique. Highly recommended!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-2610528791816907032?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=A1DBUYO2vUw:d5bDaE1Gam4:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=A1DBUYO2vUw:d5bDaE1Gam4:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=A1DBUYO2vUw:d5bDaE1Gam4:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/A1DBUYO2vUw" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/2610528791816907032/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/10/when-noise-is-your-friend-smoothed.html#comment-form" title="5 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2610528791816907032?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2610528791816907032?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/A1DBUYO2vUw/when-noise-is-your-friend-smoothed.html" title="When Noise is Your Friend: Smoothed Analysis" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">5</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/10/when-noise-is-your-friend-smoothed.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkAAQHwzeCp7ImA9WxNXEUo.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-2083958669420029132</id><published>2009-09-28T18:58:00.001-04:00</published><updated>2009-09-28T18:59:01.280-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-28T18:59:01.280-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="prediction markets" /><title>Rationality, P=NP, Prediction 
Markets, and a Paradox</title><content type="html">I got the idea for this post after reading Dick Lipton's post &lt;a href="http://rjlipton.wordpress.com/2009/09/11/would-you-bet-your-life/"&gt;on betting on the P=NP problem&lt;/a&gt;. The discussion in the comments was extensive, mainly touching on risk aversion, the inability of humans to properly estimate small probabilities, and so on. (&lt;a href="http://bpp.wharton.upenn.edu/jwolfers/Papers/Favorite_Longshot_Bias.pdf"&gt;Wolfers and Snowberg argue that such betting behavior is due to the inability of humans to understand very small probabilities.&lt;/a&gt;) The &lt;a href="http://www.overcomingbias.com/2009/09/words-vs-bets.html"&gt;discussion continued on the Overcoming Bias blog&lt;/a&gt;, where &lt;a href="http://www.overcomingbias.com/2009/09/words-vs-bets.html#comment-432693"&gt;one tongue-in-cheek comment&lt;/a&gt; caught my eye:&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;probabilities must agree with logic on certainly-true and certainly-false statements, which means that the probability of logical truths has to be 1, and logical falsehoods have to be 0.&lt;br /&gt;
&lt;/blockquote&gt;&lt;br /&gt;
So, if P=NP is a decidable problem, it is either true or false. A &lt;i&gt;fully rational&lt;/i&gt; agent participating in the market should therefore &lt;i&gt;know&lt;/i&gt; whether P=NP. It is not a matter of probabilities! All the information needed to make the decision is available. So, if the market has one or more rational players, the price should converge immediately to 0 or 1, depending on the answer to the problem. Right?&lt;br /&gt;
&lt;br /&gt;
So, which of the following is true?&lt;br /&gt;
&lt;br /&gt;
&lt;ul&gt;&lt;li&gt;There are no rational agents. So, all analyses of prediction markets that assume the rationality of traders are incomplete.&lt;/li&gt;
&lt;li&gt;There are rational agents. The market does not converge to 0 or 1 because the P=?NP problem is undecidable.&lt;/li&gt;
&lt;li&gt;There are rational agents, but the return from investing at the risk-free rate until the settlement date exceeds the return from the market. So, the market gives information on how long it will take for the problem to be officially solved.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://mat.tepper.cmu.edu/blog/?p=748"&gt;If your laptop cannot find the solution, neither can the market.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div&gt;OK, back to more serious work.&lt;br /&gt;
&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-2083958669420029132?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=9b8DZbr7fYc:oQUmMaGOcXs:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=9b8DZbr7fYc:oQUmMaGOcXs:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=9b8DZbr7fYc:oQUmMaGOcXs:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/9b8DZbr7fYc" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/2083958669420029132/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/09/rationality-pnp-prediction-markets-and.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2083958669420029132?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2083958669420029132?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/9b8DZbr7fYc/rationality-pnp-prediction-markets-and.html" title="Rationality, P=NP, Prediction Markets, and a Paradox" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/09/rationality-pnp-prediction-markets-and.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DE4NQHY4eSp7ImA9WxNRGUg.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-3178414262900130733</id><published>2009-09-14T15:20:00.003-04:00</published><updated>2009-09-14T15:36:31.831-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-14T15:36:31.831-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Citation Tracker: Monitoring Citations to your Publications</title><content type="html">One 
of the common pastimes of academics is checking services such as Google Scholar to see how many papers cite our work. Quite often, statistics from Google Scholar, or from other services such as Web of Science, are used to create a citation report for promotion and tenure purposes.&lt;br /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;While Google Scholar is extremely valuable for finding papers that cite a particular piece of work, it has some shortcomings, especially when creating a citation report for promotion. First, Google Scholar does not differentiate between peer-reviewed publications (journal, conference, or workshop papers) and other publications (such as tech reports or term papers). So, when preparing a citation report, I have to go over the list of papers, keeping the "legitimate" citations and removing the citations that are not admissible. Second, Google Scholar is sometimes noisy: it lists the same paper twice or splits the citations of a paper across two different entries. At other times, it misses papers that can be found through a web search.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Another feature that I would really like to see is the ability to find the "new" citations for a given paper and to receive alerts about them. A simple RSS feed would work wonders, but there is none.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Of course, Google Scholar also does not monitor the web for other types of documents that may mention a particular paper. PhD seminars and even blog posts are things that I would like to keep track of when monitoring who cites my work. For such volatile pages in particular, I want to keep a copy so that I can retrieve them a few years later, when compiling my promotion packet.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;For this reason, over the summer, I created a tool that augments Google Scholar. It monitors Google Scholar (and other services such as Libra, CiteSeerX, and SSRN), as well as the Web (Google, Bing, Ask), for mentions of a paper.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;You can access a pre-alpha version at&amp;nbsp;&lt;a href="http://www.citation-tracker.com/"&gt;http://www.citation-tracker.com&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Some of the features:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Import publications from Google Scholar, DBLP, or BibTeX, or enter them manually.&lt;/li&gt;
&lt;li&gt;Review the citations for each paper, and decide which ones to keep, which to discard, and which ones to examine later.&lt;/li&gt;
&lt;li&gt;Monitor citation services (Google Scholar, Libra, CiteSeerX, SSRN) and see notifications when new citations to your papers appear.&lt;/li&gt;
&lt;li&gt;Automatically generate a citation report, listing the papers that cite your work.&lt;/li&gt;
&lt;/ul&gt;&lt;div&gt;I have been using the service for the last few weeks, and it seems reasonably stable. I import my papers from Google Scholar, "accept" the existing citations, and then wait for the new citations that pop up every now and then. I find it pretty useful for discovering new papers that cite my work.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
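&lt;div&gt;The "new citations" workflow described above boils down to a set difference between the latest fetch and everything already reviewed. Below is a minimal Python sketch with a hypothetical data model: plain string IDs for citing papers, and a keep, discard, or examine-later decision per ID. It illustrates the idea and is not a description of Citation Tracker's actual code.&lt;/div&gt;
&lt;pre&gt;
```python
from enum import Enum


class Decision(Enum):
    """Review outcome for a citing paper (keep / discard / examine later)."""
    KEEP = "keep"
    DISCARD = "discard"
    LATER = "later"


def new_citations(fetched_ids, decisions):
    """IDs returned by the latest fetch that have never been reviewed."""
    return sorted(set(fetched_ids) - set(decisions))


def citation_report(decisions):
    """IDs of the citations accepted for the report, in a stable order."""
    return sorted(cid for cid, d in decisions.items() if d is Decision.KEEP)


if __name__ == "__main__":
    reviewed = {"paper-a": Decision.KEEP, "paper-b": Decision.DISCARD}
    latest_fetch = ["paper-a", "paper-b", "paper-c"]
    print(new_citations(latest_fetch, reviewed))
    print(citation_report(reviewed))
```
&lt;/pre&gt;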
&lt;/div&gt;&lt;div&gt;Over the last few days, I have even started importing papers by other researchers that I consider relevant to my work, to see which new papers cite them.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Feel free to log in and play with the system. Needless to say, it is an early release, so I expect bugs here and there. If you find a bug, or if you would like to see a new feature, please leave a note using the "feedback" tab on the side of the screen.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;
&lt;/div&gt;&lt;div&gt;Enjoy!&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-3178414262900130733?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=v5mbg3OgFNQ:tV9-hvsZU58:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=v5mbg3OgFNQ:tV9-hvsZU58:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=v5mbg3OgFNQ:tV9-hvsZU58:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/v5mbg3OgFNQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/3178414262900130733/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/09/citation-tracker-monitoring-citations.html#comment-form" title="9 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3178414262900130733?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3178414262900130733?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/v5mbg3OgFNQ/citation-tracker-monitoring-citations.html" title="Citation Tracker: Monitoring Citations to your Publications" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">9</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/09/citation-tracker-monitoring-citations.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CE8AQH04eCp7ImA9WxNRE0U.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-374484658547264049</id><published>2009-09-06T12:25:00.021-04:00</published><updated>2009-09-08T00:07:21.330-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-09-08T00:07:21.330-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="economics" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="data mining" /><category 
scheme="http://www.blogger.com/atom/ns#" term="structural models." /><category scheme="http://www.blogger.com/atom/ns#" term="reduced models" /><category scheme="http://www.blogger.com/atom/ns#" term="computer science" /><category scheme="http://www.blogger.com/atom/ns#" term="machine learning" /><title>The different attitudes of computer scientists and economists</title><content type="html">&lt;div&gt;I was reading Noam Nisan's blog post about the &lt;a href="http://agtb.wordpress.com/2009/09/06/cs-and-economics-different-attitudes/"&gt;different attitudes of computer scientists and economists&lt;/a&gt;. Noam hypothesizes that economists emphasize research on “what is” while computer scientists emphasize “what can be”, and offers the view of an algorithmic game theorist.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I have my own interpretation of this topic, mainly from the data mining point of view.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Economists are interested in suggesting policies&lt;/b&gt; (i.e., telling people "what to do"). Therefore, it is important for them to build models that assign &lt;b&gt;causality&lt;/b&gt;. Computer scientists are rarely interested in the issue of causality. Computer scientists control the system (the computer), and algorithms can be directed to perform one way or another. In contrast, economists cannot really control the system that they study. They do not even know how the system behaves.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When a computer scientist proposes an algorithm, the main focus is to examine the performance of the algorithm under different settings of incoming data. The behavior of the (computer) system itself is under control. When an economist suggests a policy, it is highly unclear how the underlying (rational?) agents will behave. 
Therefore, it is important to figure out what exactly "causes" the behavior of the agents, and which policies can change this behavior.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One area of computer science that comes closer to economics in this respect is data mining and machine learning: get the data, and learn how the underlying system behaves. For example, get data about credit card transactions and learn which of them are fraudulent. However, there is a significant difference in focus: computer scientists are mainly focused on &lt;b&gt;predictive modelling&lt;/b&gt;. As long as the system can "predict" the outcome on unseen data, things are fine. A black box with perfect predictive performance is great. Explanatory models are rarely the focus. At best, someone may want to understand the internals of the predictive model, but even if the model can be understood (e.g., by reading the rules or the decision tree), these rules are rarely causal in nature.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let me give you an example: Suppose that you are trying to predict the price per square foot of houses. As one independent variable (feature), you add the average size of the houses in the area. What will the predictive model find? That places with smaller houses also have a higher price per square foot. Unexpected? Not really. Houses in urban areas are typically smaller and more expensive than their suburban and rural counterparts. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For a predictive model, this information is absolutely sufficient; the average house size is a valuable feature for predictive purposes. Think, however, about what would happen if someone were devising policy based on this feature. A house builder would try to build smaller houses in rural areas, hoping that the resulting prices would be higher. 
Or a politician in Manhattan would encourage the construction of bigger apartments, since the model "shows" that when the average house size increases, prices drop. Absurd? Yes. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Even funnier things can come up if someone uses nationwide data to predict the demand for apartments from apartment prices. The result will suggest that increasing prices actually increases demand, even though we would expect the opposite. (This is simply the effect of prices rising in places where demand is higher.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Predictive modeling can survive (or even thrive) by exploiting such strange correlations. A model that captures correlations and presents them as causes, however, can wreak havoc.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, an economist will try to build a model that captures causal relationships. In the case above, a model based on supply and demand is more likely to capture the true "causes" of increased apartment prices. A house builder can see these effects and make a more informed decision about what to build. The same holds for a politician who is trying to encourage the construction of more affordable housing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Causal models are often called "structural" in economics [not sure where the term comes from; I have seen a few different interpretations]. They typically start by modelling the micro-behavior of agents, and then proceed to explain the behavior of a large system composed of the interactions of such agents. A benefit of such models is that their assumptions are easier to check, test, and challenge. 
&lt;a href="http://mat.tepper.cmu.edu/blog/?p=679"&gt;In contrasts to "statistical" models, such models tend to generate relationships that are easier to consider "causal"&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;An advantage of causal models over predictive models is that causal models are valid even if the underlying data distribution changes. Causal models are supposed to be robust, as long as the behavior of the agents remains the same. A predictive model works under the assumption that the "unseen" data follow the same distribution as the "training" data. Change the distribution of the unseen data, and any performance guarantee for the predictive models disappears.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Update 1&lt;/b&gt;: This is not an attempt to downgrade the importance of predictive models. Most of the results presented by Google after a query are generated using predictive modeling algorithms. You get recommendations from Amazon and Netflix as the outcome of predictive algorithms. Your inbox remains spam-free due to the existence of the spam filter, again a system built using predictive modeling techniques. It is too hard, if not impossible, to build "causal" models for these applications.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Update 2:&lt;/b&gt; An interesting example of a company &lt;a href="http://mat.tepper.cmu.edu/blog/?p=536"&gt;deriving policy based on their predictive model is American Express&lt;/a&gt;. They realized that the feature "customer buys in a 99c store" is correlated with higher delinquency rates. So, AmEx decided to decrease the credit limit for such customers. Of course, the result will be that potentially affected customers will stop visiting such stores, decreasing the value of this policy for AmEx. 
Furthermore, this action may cause even more economic stress to these customers that are now "forced" to buy from more expensive stores, and this may result in a much higher default rate for AmEx. This "unexpected" outcome is the effect of devising policy based on non-causal variables. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If AmEx had a variable "customer in economic distress", which arguably has a causal effect on default rates, then it would be possible to perform this action, without the ability of customers to game the system. However, since AmEx relied on a variable "customer buys in a 99c store" that is the &lt;i&gt;outcome &lt;/i&gt;of the variable "customer in economic distress" it &lt;i&gt;is &lt;/i&gt;possible for consumers to simply change their behavior in the face of economic distress.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-374484658547264049?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
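The house-size example in the post can be reproduced with a small simulation. This is a minimal sketch with made-up numbers (the group means, spreads, and sample size are all assumptions, not data): location determines both the average house size and the price per square foot, while house size has no effect of its own. A pooled regression, the purely predictive view, finds a strongly negative slope. Comparing areas within the same location type shows that the slope is essentially zero, which is why a builder or politician acting on the pooled coefficient would be misled.

```python
import random


def ols_slope(xs, ys):
    """Slope b of an ordinary least-squares fit y = a + b*x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_areas=2000, seed=0):
    """Hypothetical areas where location drives both house size and price.

    Within each location type, price per square foot is drawn independently
    of the average house size, so the true causal effect of size is zero.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_areas):
        urban = rng.random() > 0.5
        avg_size = rng.gauss(900 if urban else 2200, 150)  # sq ft (made up)
        price = rng.gauss(700 if urban else 200, 60)       # $ per sq ft (made up)
        rows.append((urban, avg_size, price))
    return rows


rows = simulate()

# Predictive view: regress price on average size, ignoring location
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])

# Closer to the causal view: compare areas within the same location type
within = {}
for is_urban in (True, False):
    group = [r for r in rows if r[0] == is_urban]
    within[is_urban] = ols_slope([r[1] for r in group], [r[2] for r in group])

print(f"pooled slope: {pooled:.3f} ($/sq ft per extra sq ft)")
print(f"within urban: {within[True]:.3f}, within rural: {within[False]:.3f}")
```

Controlling for location here is only a stand-in for the supply-and-demand reasoning the post describes; it works because the simulation has a single, known confounder, and it is not a general recipe for recovering causal effects.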
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=uYTKU3Y8gm0:JXGcf_5FCa4:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=uYTKU3Y8gm0:JXGcf_5FCa4:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=uYTKU3Y8gm0:JXGcf_5FCa4:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/uYTKU3Y8gm0" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/374484658547264049/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/09/different-attitudes-of-computer.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/374484658547264049?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/374484658547264049?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/uYTKU3Y8gm0/different-attitudes-of-computer.html" title="The different attitudes of computer scientists and economists" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">14</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/09/different-attitudes-of-computer.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEEMQXs4cSp7ImA9WxNSFEw.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-8535764502669653909</id><published>2009-08-27T16:26:00.003-04:00</published><updated>2009-08-27T18:38:00.539-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-27T18:38:00.539-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Workshop on Information in Networks (WIN)</title><content type="html">&lt;div&gt;For those of you interested in the study of networked data, I would like to bring your 
attention to the "&lt;a href="http://winworkshop.net/"&gt;Workshop on Information in Networks (WIN)&lt;/a&gt;", a workshop organized by my colleagues Sinan Aral, Foster Provost, and Arun Sundararajan. It will take place on September 25-26, 2009. From the description:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;The purpose of WIN is to bring together leading researchers studying ‘information in networks’ – its distribution, its diffusion, its value, and its influence on social and economic outcomes – in order to lay the foundation for ongoing relationships and to build a lasting multidisciplinary research community.&lt;/blockquote&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I should emphasize that the phrase "&lt;i&gt;bring together leading researchers&lt;/i&gt;" is not a the standard template used in many call for papers. The lineup of speakers is truly outstanding! I would be very hard pressed to find &lt;i&gt;any &lt;/i&gt;conference that would have &lt;a href="http://winworkshop.net/speakers.php"&gt;such a lineup of invited speakers&lt;/a&gt;:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Lada Adamic, University of Michigan &lt;/li&gt;&lt;li&gt;Albert-Laszlo Barabasi, University of Notre Dame, Northeastern University&lt;/li&gt;&lt;li&gt;Ronald Burt, University of Chicago&lt;/li&gt;&lt;li&gt;Damon Centola, MIT&lt;/li&gt;&lt;li&gt;Pedro Domingos , University of Washington&lt;/li&gt;&lt;li&gt;Christos Faloutsos, Carnegie Mellon&lt;/li&gt;&lt;li&gt;James Fowler, University of California, San Diego&lt;/li&gt;&lt;li&gt;Sanjeev Goyal, University of Cambridge&lt;/li&gt;&lt;li&gt;Bernardo Huberman, HP Labs&lt;/li&gt;&lt;li&gt;Matthew Jackson, Stanford University &lt;/li&gt;&lt;li&gt;Michael Kearns, University of Pennsylvania&lt;/li&gt;&lt;li&gt;Jon Kleinberg, Cornell University&lt;/li&gt;&lt;li&gt;Rachel Kranton, Duke University&lt;/li&gt;&lt;li&gt;David Lazer, Harvard University&lt;/li&gt;&lt;li&gt;Jure Leskovec, 
Stanford &lt;/li&gt;&lt;li&gt;Michael Macy, Cornell University&lt;/li&gt;&lt;li&gt;Alex (Sandy) Pentland, MIT&lt;/li&gt;&lt;li&gt;Duncan Watts, Yahoo! Research&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;It is really as good as it gets. If you are interested in networked data and can be in New York on September 25-26, then this is an event that you must attend!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-8535764502669653909?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=SmHgOab5LPI:ydowah-1Jv4:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=SmHgOab5LPI:ydowah-1Jv4:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=SmHgOab5LPI:ydowah-1Jv4:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/SmHgOab5LPI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/8535764502669653909/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/08/workshop-on-information-in-networks-win.html#comment-form" title="7 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/8535764502669653909?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/8535764502669653909?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/SmHgOab5LPI/workshop-on-information-in-networks-win.html" title="Workshop on Information in Networks (WIN)" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/08/workshop-on-information-in-networks-win.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak8EQ3s5cSp7ImA9WxNTEEk.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-7957021653789745072</id><published>2009-08-11T01:16:00.005-04:00</published><updated>2009-08-11T23:46:42.529-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-11T23:46:42.529-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Get a Consent Form (for 
IRB) on MTurk using Qualification Tests</title><content type="html">I was browsing through the various qualification tests on Mechanical Turk, checking what requesters ask and how they structure the tests. The &lt;a href="https://www.mturk.com/mturk/requestqualification?qualificationId=M9QZWSYS5WGCYVC6XYKZ"&gt;one test that caught my eye&lt;/a&gt; was designed by Daniel Velleman and David Beaver from the Linguistics department of The University of Texas at Austin.&lt;br /&gt;&lt;br /&gt;Here is the test:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style="  line-height: 16px; font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;div class="overview-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; border-bottom-width: 1px; border-bottom-color: rgb(211, 211, 211); border-bottom-style: solid; "&gt;&lt;p class="overview title" style="font-size: 17px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; font-weight: bold; "&gt;&lt;/p&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style="  line-height: 16px; font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;div class="overview-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; border-bottom-width: 1px; border-bottom-color: rgb(211, 211, 211); border-bottom-style: solid; "&gt;&lt;p class="overview title" style="font-size: 17px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; font-weight: bold; "&gt;"Which sentence do you prefer?" 
eligibility form&lt;/p&gt;&lt;p class="overview text" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;This qualification will allow you to participate in our English language research HIT, "Which sentence do you prefer?"&lt;/p&gt;&lt;/div&gt;&lt;div class="question-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; border-bottom-width: 1px; border-bottom-color: rgb(211, 211, 211); border-bottom-style: solid; "&gt;&lt;div class="question-content-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;p class=" text" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;Is English your first language?&lt;/p&gt;&lt;/div&gt;&lt;div name="HITAnswer-wrapper" class="HITAnswer-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;p class="question-selection-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;&lt;/p&gt;&lt;fieldset class="fieldset"  style="border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: none; border-width: initial; border-color: initial;"&gt;&lt;p class="radiobutton-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 0px; padding-right: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-bottom: 0px; vertical-align: middle; "&gt;&lt;table cellspacing="4" cellpadding="0" border="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign="center"   style=" line-height: 14px;  font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;input type="radio" class="question selection" name="Answer_1" id="Answer_1" value="Selection_eWVz" style="font-size: 
11px; line-height: 14px; font-family: verdana, arial, sans-serif; color: black; "&gt;&lt;/td&gt;&lt;td valign="center" style=" line-height: 14px;  font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;span class="answer text"&gt;Yes&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/p&gt;&lt;p class="radiobutton-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 0px; padding-right: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-bottom: 0px; vertical-align: middle; "&gt;&lt;table cellspacing="4" cellpadding="0" border="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign="center" style=" line-height: 14px;  font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;input type="radio" class="question selection" name="Answer_1" id="Answer_1" value="Selection_bm8-" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; color: black; "&gt;&lt;/td&gt;&lt;td valign="center"   style=" line-height: 14px;  font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;span class="answer text"&gt;No&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/p&gt;&lt;/fieldset&gt;&lt;p style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="question-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; border-bottom-width: 1px; border-bottom-color: rgb(211, 211, 211); border-bottom-style: solid; "&gt;&lt;div class="question-content-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;p class=" text" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;Do you (or did you) have at least one parent or caregiver&lt;br /&gt;whose first 
language was English?&lt;/p&gt;&lt;/div&gt;&lt;div name="HITAnswer-wrapper" class="HITAnswer-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;p class="question-selection-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;&lt;/p&gt;&lt;fieldset class="fieldset" style="border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: none; border-width: initial; border-color: initial;"&gt;&lt;p class="radiobutton-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 0px; padding-right: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-bottom: 0px; vertical-align: middle; "&gt;&lt;table cellspacing="4" cellpadding="0" border="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign="center" style=" line-height: 14px; font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;input type="radio" class="question selection" name="Answer_2" id="Answer_2" value="Selection_eWVz" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; color: black; "&gt;&lt;/td&gt;&lt;td valign="center" style=" line-height: 14px; font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;span class="answer text"&gt;Yes&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/p&gt;&lt;p class="radiobutton-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 0px; padding-right: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-bottom: 0px; vertical-align: middle; "&gt;&lt;table cellspacing="4" cellpadding="0" border="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign="center" style=" line-height: 14px; font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;input type="radio" class="question selection" 
name="Answer_2" id="Answer_2" value="Selection_bm8-" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; color: black; "&gt;&lt;/td&gt;&lt;td valign="center" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;span class="answer text"&gt;No&lt;/span&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/p&gt;&lt;/fieldset&gt;&lt;/div&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span"   style="  line-height: 14px; font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;div class="question-content-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;p class=" text" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;Please read this information&lt;br /&gt;&lt;br /&gt;You are invited to participate in a survey, entitled "Which sentence do you prefer?" The study is being conducted by Daniel Velleman and David Beaver in the Linguistics department of The University of Texas at Austin.&lt;br /&gt;&lt;br /&gt;Calhoun 501&lt;br /&gt;1 University Station B5100&lt;br /&gt;Austin, TX 78712-0198&lt;br /&gt;(512) 471-1701&lt;br /&gt;&lt;br /&gt;The purpose of this study is to examine English speakers' preferences about the order in which written information is presented. Your participation in the survey will contribute to a better understanding of the English language. We estimate that it will take about a minute of your time to complete each  question. You are free to contact the investigator at the above address and phone number to discuss the survey.&lt;br /&gt;&lt;br /&gt;Risks to participants are considered minimal. There will be no costs for participating. You will be paid for each HIT you complete, but will not otherwise benefit from participating. 
Your Amazon account information will be kept while we collect data for tracking purposes only. A limited number of research team members will have access to the data during data collection. This information will be stripped from the final dataset.&lt;br /&gt;&lt;br /&gt;Your participation in this survey is voluntary. You may decline to answer any question and you have the right to withdraw from participation at any time without penalty. If you wish to withdraw from the study or have any questions, contact the investigator listed above.&lt;br /&gt;&lt;br /&gt;If you have any questions, please email Daniel Velleman at ut.linguistics.mturk@gmail.com. You may also request a hard copy of the survey from the contact information above.&lt;br /&gt;&lt;br /&gt;This study has been reviewed and approved by The University of Texas at Austin Institutional Review Board. If you have questions about your rights as a study participant, or are dissatisfied at any time with any aspect of this study, you may contact - anonymously, if you wish - the Institutional Review Board by phone at (512) 471-8871 or email at orsc@uts.cc.utexas.edu.&lt;br /&gt;&lt;br /&gt;IRB Approval Number: 2009-03-0123&lt;br /&gt;&lt;/p&gt;&lt;/div&gt;&lt;div name="HITAnswer-wrapper" class="HITAnswer-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; "&gt;&lt;p class="question-selection-wrapper" style="font-size: 11px; line-height: 14px; font-family: verdana, arial, sans-serif; padding-left: 5px; padding-right: 5px; "&gt;&lt;/p&gt;&lt;fieldset class="fieldset" style="border-top-style: none; border-right-style: none; border-bottom-style: none; border-left-style: none; border-width: initial; border-color: initial;"&gt;&lt;p class="checkbox-wrapper"   style=" line-height: 14px;  padding-left: 0px; padding-right: 0px; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; padding-top: 0px; padding-bottom: 0px; vertical-align: middle; 
font-family:verdana, arial, sans-serif;font-size:11px;"&gt;&lt;input type="checkbox" class="question selection" name="Answer_3" id="Answer_3" value="Selection_eWVz"    style=" line-height: 14px;   font-family:verdana, arial, sans-serif;font-size:11px;color:black;"&gt;&lt;span class="answer text"&gt;I understand want to participate in this study.&lt;/span&gt;&lt;/p&gt;&lt;/fieldset&gt;&lt;/div&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/blockquote&gt;&lt;div&gt;It is indeed a very clever idea to leverage a qualification test, to get workers to fill-in a consent form, and satisfy at the same time the requirement of the Institutional Review Board.&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;Perhaps the trick will be useful to other researchers that want to run human studies on Mechanical Turk. (I still believe that for this study an IRB is not required, but this is not the point of this post.)&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-7957021653789745072?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=0mJ90hFgL3c:9hMEMQTVWJY:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=0mJ90hFgL3c:9hMEMQTVWJY:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=0mJ90hFgL3c:9hMEMQTVWJY:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/0mJ90hFgL3c" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/7957021653789745072/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/7957021653789745072?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/7957021653789745072?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/0mJ90hFgL3c/get-consent-form-for-irb-on-mturk-using.html" title="Get a Consent Form (for IRB) on MTurk using Qualification Tests" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/08/get-consent-form-for-irb-on-mturk-using.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkIBRnY-fyp7ImA9WxJaFU8.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-848605812896880915</id><published>2009-08-05T19:15:00.006-04:00</published><updated>2009-08-05T21:02:37.857-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-05T21:02:37.857-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><title>Top Requesters on Mechanical Turk</title><content type="html">&lt;div style="text-align: left;"&gt;Today I had a chat with Dahn Tamir about 
all things MTurk. He was particularly interested in the archive of all requesters that I have collected over the last 7 months. So, I queried the database, computed some basic statistics, and sent him the results.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then I thought: why not export the live results as well? A few lines of PHP later, the leaderboard of the top Mechanical Turk requesters was born; it is now available at &lt;a href="http://hyperion.stern.nyu.edu/mturk/requesters.php"&gt;http://hyperion.stern.nyu.edu/mturk/requesters.php&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://hyperion.stern.nyu.edu/mturk/requesters.php"&gt;&lt;img src="http://2.bp.blogspot.com/_skxW9kDLt_c/SnoTfkWfykI/AAAAAAAAAik/erAI8dyYodw/s400/Mturk-Top-Requesters.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5366623339020274242" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 313px; " /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;For each requester, you can see the total number of projects they have posted on Mechanical Turk since January 2009, the total number of HITs, and the total value of the posted HITs. If you want to know whether a requester is still active, you can also see when they last posted a HIT.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Clicking on a requester's name shows the archive of the last 100 tasks they have posted, and clicking on the requester ID takes you to Amazon, where you can see their currently available tasks.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Enjoy!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-848605812896880915?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
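The leaderboard itself came from a few lines of PHP that the post does not show. As a rough illustration of the per-requester aggregation it describes (number of projects, number of HITs, total value, and time of the last posting), here is a minimal Python sketch. It assumes a hypothetical crawl record, `Posting`, with exactly one deduplicated row per HIT group; the field names, and the choice to rank by total value, are illustrative assumptions rather than the actual schema or code behind requesters.php.

```python
from dataclasses import dataclass, field
from datetime import datetime
from decimal import Decimal


@dataclass
class Posting:
    """One crawled HIT group (hypothetical record layout, already deduplicated)."""
    requester_id: str
    requester_name: str
    group_id: str
    num_hits: int
    reward: Decimal  # reward per HIT, in USD
    posted_at: datetime


@dataclass
class RequesterStats:
    """Running totals for one requester, as listed on a leaderboard."""
    name: str
    projects: set = field(default_factory=set)  # distinct HIT group ids
    hits: int = 0
    value: Decimal = Decimal("0")
    last_posted: datetime = datetime.min


def leaderboard(postings):
    """Aggregate postings per requester and rank requesters by total posted value."""
    stats = {}
    for p in postings:
        s = stats.setdefault(p.requester_id, RequesterStats(p.requester_name))
        s.projects.add(p.group_id)
        s.hits += p.num_hits
        s.value += p.num_hits * p.reward  # value of this HIT group
        s.last_posted = max(s.last_posted, p.posted_at)
    return sorted(stats.items(), key=lambda kv: kv[1].value, reverse=True)
```

Ranking by total value is an arbitrary choice in this sketch; sorting by `hits` or by `len(projects)` would be a one-line change.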
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/QIDwrYM-iRQ" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/848605812896880915/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/08/top-requesters-on-mechanical-turk.html#comment-form" title="17 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/848605812896880915?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/848605812896880915?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/QIDwrYM-iRQ/top-requesters-on-mechanical-turk.html" title="Top Requesters on Mechanical Turk" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_skxW9kDLt_c/SnoTfkWfykI/AAAAAAAAAik/erAI8dyYodw/s72-c/Mturk-Top-Requesters.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">17</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/08/top-requesters-on-mechanical-turk.html</feedburner:origLink></entry><entry gd:etag="W/&quot;D0QMRnk-eip7ImA9WxJaFEw.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-5894902566878160063</id><published>2009-08-04T12:00:00.011-04:00</published><updated>2009-08-04T15:49:47.752-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-04T15:49:47.752-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" 
/><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>When to Post Tasks on Mechanical Turk?</title><content type="html">&lt;div style="text-align: left;"&gt;People who have experience with Mechanical Turk know that getting long tasks done there is tricky. While it is relatively easy to get small tasks done quickly, it is much more difficult to estimate how long a big task will take. The "estimated time" given by the Mechanical Turk interface is really crappy and provides pretty much no guidance if you expect your task to last longer than a day.&lt;/div&gt;&lt;div style="text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;Naturally, questions like these arise: When is the best time to post a task? How can I minimize my waiting time?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To better understand how tasks get completed on Mechanical Turk, I started crawling it every few minutes, collecting data about the HITs, the requesters, how long each HIT remains available, and so on.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;Queue&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The first outcome of this effort was the &lt;a href="http://hyperion.stern.nyu.edu/mturk/"&gt;Mechanical Turk Monitor&lt;/a&gt;, a visualization tool that shows how many projects and HITs are available at any given time, along with the available rewards (&lt;a href="http://behind-the-enemy-lines.blogspot.com/2009/02/monitoring-dynamics-of-mechanical-turk.html"&gt;see the old post&lt;/a&gt;).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_skxW9kDLt_c/SnheRdeX5hI/AAAAAAAAAh8/Ayny6mcdpGw/s1600-h/Mturk-Monitor.PNG"&gt;&lt;img 
src="http://3.bp.blogspot.com/_skxW9kDLt_c/SnheRdeX5hI/AAAAAAAAAh8/Ayny6mcdpGw/s400/Mturk-Monitor.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5366142610074756626" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 285px; " /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;Arrival process and Serving process&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This tool effectively showed the size of the "queue". However, it revealed neither how many tasks arrive on MTurk per day nor how much work gets done on MTurk every day. So, last week I decided to display this information as well, showing the daily activity of the requesters and the corresponding daily activity of the Turkers:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://hyperion.stern.nyu.edu/mturk/arrivals.php"&gt;Activity of requesters&lt;/a&gt;: How many tasks are posted per day&lt;/li&gt;&lt;li&gt;&lt;a href="http://hyperion.stern.nyu.edu/mturk/completed.php"&gt;Activity of Turkers&lt;/a&gt;: How many tasks are completed per day&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;Posting Activity&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now we can start scratching the surface of how things get done on Mechanical Turk. 
A first pass is to look at statistics on what is being posted over time:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_skxW9kDLt_c/Snhj0EhjPfI/AAAAAAAAAiE/Gtx7Hl4bcSI/s1600-h/Value-of-HITs-over-time.PNG"&gt;&lt;img src="http://2.bp.blogspot.com/_skxW9kDLt_c/Snhj0EhjPfI/AAAAAAAAAiE/Gtx7Hl4bcSI/s400/Value-of-HITs-over-time.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5366148702230756850" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 212px; " /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;The x-axis depicts time, and the y-axis is the value of the HITs posted each day. The blue line depicts the total value of the HITs posted. One immediate observation is that there is significant periodicity: taking the 7-day moving average (red line) smooths the curve considerably, which indicates a strong weekly cycle. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's take a look at the distribution of &lt;b&gt;posting activity&lt;/b&gt; over the days of the week:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_skxW9kDLt_c/SnhmAcZ55nI/AAAAAAAAAiU/v-cZ-I_tdP8/s1600-h/MTurk-arrivals-perday.PNG"&gt;&lt;img src="http://1.bp.blogspot.com/_skxW9kDLt_c/SnhmAcZ55nI/AAAAAAAAAiU/v-cZ-I_tdP8/s400/MTurk-arrivals-perday.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5366151113822824050" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 289px; " /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;The plot shows the distribution of activity for each day of the week. 
By activity, we mean the total value of the HITs posted on each day. As we can see, weekends tend to be significantly quieter than weekdays. In fact, even Mondays tend to be relatively quiet, perhaps because requesters prepare their HITs on Monday and then post them on Tuesday :-)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Well, the plot is not very surprising: lots of activity during workdays, less activity over the weekends. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;Worker Activity&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The interesting result, though, comes when we look at the activity of the Turkers:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_skxW9kDLt_c/SnhoSdEcOUI/AAAAAAAAAic/fKTjZ-Yz4-w/s1600-h/MTurk-serving-perday.PNG"&gt;&lt;img src="http://4.bp.blogspot.com/_skxW9kDLt_c/SnhoSdEcOUI/AAAAAAAAAic/fKTjZ-Yz4-w/s400/MTurk-serving-perday.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5366153622262135106" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 400px; height: 280px; " /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;It seems that Turkers are not in sync with the requesters. In fact, the activity on Saturdays is comparable to the activity during weekdays. Surprisingly, Mondays tend to see significantly less activity (perhaps due to the small number of tasks posted over the weekend?).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;i&gt;Conclusion&lt;/i&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What is clear is that there is a lag between the activity of requesters and that of workers. 
Although it is hard to infer causality from these figures, it seems that Fridays and Saturdays are good days to post tasks on Mechanical Turk: there is relatively little competition for the attention of workers, and worker activity remains high on Saturday.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, now you know: &lt;i&gt;&lt;b&gt;Post your HITs on Friday and go away...&lt;/b&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-5894902566878160063?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
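The two summaries this post relies on, a 7-day moving average of the daily posted value and a breakdown by day of the week, can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the MTurk Monitor: it assumes the crawled data has already been reduced to a hypothetical mapping from calendar date to the total value posted (or completed) on that day, and it reports the mean per weekday rather than the full distribution shown in the plots.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def trailing_average(daily: Dict[date, float], window: int = 7) -> Dict[date, float]:
    """Trailing moving average over `window` consecutive calendar days.

    Days missing from `daily` count as zero. The first `window - 1` days are
    skipped because they do not have a full window of history behind them.
    """
    if not daily:
        return {}
    first = min(daily)
    smoothed = {}
    for day in sorted(daily):
        if day - first >= timedelta(days=window - 1):
            span = (daily.get(day - timedelta(days=k), 0.0) for k in range(window))
            smoothed[day] = sum(span) / window
    return smoothed


def mean_by_weekday(daily: Dict[date, float]) -> Dict[str, float]:
    """Average value per day of the week, in Monday-to-Sunday order."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for day, value in daily.items():
        name = WEEKDAYS[day.weekday()]  # date.weekday(): Monday == 0
        totals[name] += value
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in WEEKDAYS if counts[name]}
```

Because every 7-day window contains each weekday exactly once, a purely weekly cycle averages out completely in `trailing_average`, which is why the red line in the first plot is so much smoother than the blue one.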
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/3OuIykfBDJI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/5894902566878160063/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/08/when-to-post-tasks-on-mechanical-turk.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/5894902566878160063?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/5894902566878160063?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/3OuIykfBDJI/when-to-post-tasks-on-mechanical-turk.html" title="When to Post Tasks on Mechanical Turk?" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://3.bp.blogspot.com/_skxW9kDLt_c/SnheRdeX5hI/AAAAAAAAAh8/Ayny6mcdpGw/s72-c/Mturk-Monitor.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/08/when-to-post-tasks-on-mechanical-turk.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A08CRX08fyp7ImA9WxJaE0w.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-3634478963216870443</id><published>2009-07-31T11:00:00.001-04:00</published><updated>2009-08-03T13:17:44.377-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-03T13:17:44.377-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Workshops: Official or Unofficial Proceedings?</title><content type="html">In the process of organizing &lt;a href="http://www.cs.uwaterloo.ca/conferences/dbrank/2010/"&gt;DBRank 2010&lt;/a&gt;, we had to answer the following question: Should the proceedings for the workshop be "official" or "unofficial"? &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Official workshop proceedings go through the same process as conference papers: a specific camera-ready format and submission to the proceedings chair by a given date; the papers are then officially hosted in the publisher's digital library, with all the metadata, digital object identifiers (DOIs), and so on. (For DBRank 2010, that would be &lt;a href="http://ieeexplore.ieee.org/"&gt;IEEE Xplore&lt;/a&gt;.) For bureaucratic purposes, these papers are considered "publications." &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unofficial proceedings are, well, unofficial. Typically the workshop chair posts the papers on the workshop website, and possibly brings printed copies for distribution at the workshop. There is no official publisher and no DOI assigned to the papers, and in principle such a paper is no more of a publication than any paper posted on a website.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, should workshops have official or unofficial proceedings? &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are some arguments against official proceedings:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Increasingly, there is a significant conflict between workshop and conference publications. With some workshops allowing 8- or even 10-page papers, it becomes hard for the authors of these papers to publish the same work at a conference, as there is typically significant overlap. 
Most database conferences consider any past paper that is 3 pages or longer to be a prior publication, and the conference version should have significant new content in order to be considered a "new" paper.&lt;/li&gt;&lt;li&gt;As conferences become increasingly competitive, many authors submit to workshops papers that could not "make it" into a conference. A workshop is typically easier to get into, and in the end "you get a paper" out of it. Needless to say, this pretty much violates the spirit of workshops, which are supposed to be venues for new, relatively immature research, not archival publications.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;On the other hand, there are advantages to having official proceedings:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;It makes the workshop more attractive in the eyes of many authors. Authors get an official timestamp for their work and can point to a paper that has at least been lightly refereed, instead of pointing to a technical report or working paper.&lt;/li&gt;&lt;li&gt;It makes it easier for someone to locate the papers that were presented at the workshop. Workshop websites are not always hosted on "stable" servers, and they disappear for various reasons. (For example, the websites for WebDB'99, WebDB 2000, WebDB 2001, and WebDB 2003 are no longer available, because the organizers have moved to different institutions.)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;So, what to do? Official or unofficial?&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-3634478963216870443?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/NKufRJbmA58" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/3634478963216870443/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/07/workshops-official-or-unofficial.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3634478963216870443?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3634478963216870443?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/NKufRJbmA58/workshops-official-or-unofficial.html" title="Workshops: Official or Unofficial Proceedings?" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/07/workshops-official-or-unofficial.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUEFRHY5eCp7ImA9WxJbGUU.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-8499782213745593097</id><published>2009-07-30T12:57:00.014-04:00</published><updated>2009-07-30T17:00:15.820-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-30T17:00:15.820-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><title>Is Amazon Mechanical Turk a black market?</title><content type="html">&lt;div&gt;According to &lt;a 
href="http://en.wikipedia.org/wiki/Underground_economy"&gt;Wikipedia&lt;/a&gt;, a black market is: &lt;i&gt;"&lt;b&gt;a market where all commerce is conducted without regard to taxation, law or regulations of trade&lt;/b&gt;".&lt;/i&gt; How is this related to Mechanical Turk?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Today I received an email asking about tax and employment issues related to Amazon Mechanical Turk. What are the rules about posting tasks on Mechanical Turk? How should these tasks be handled by accounting and human resources departments?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Unfortunately, Amazon did not design Mechanical Turk in a requester-friendly way. In an effort to relieve its accounting and HR departments of a big overhead, Amazon transferred to the requesters the risk of violating the US Tax Code and engaging in &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_0"&gt;illegal employment activities. &lt;/span&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_0"&gt;&lt;/span&gt;How can this happen? The key issue is whether there is an employer-employee relationship between the requesters and the workers on Mechanical Turk. The crucial question is:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;b&gt;When you submit funds to your Mechanical Turk account, who are you paying? Amazon.com or the worker?&lt;/b&gt; &lt;/blockquote&gt;&lt;/div&gt;&lt;div&gt;If it is Amazon, then you are simply letting Amazon deal with all the tax and employment issues associated with the worker: Amazon needs to verify that the worker is eligible for employment, take care of tax issues, and so on. 
In this case, hiring someone for a micro-task on Amazon is the same as hiring an agency to provide cleaning services for your home: you do not need to check whether the person coming to clean your place is eligible for employment, whether taxes are properly withheld from the paycheck, and so on. It is the agency's job to take care of that. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, Amazon does not follow this route. According to the &lt;a href="https://www.mturk.com/mturk/conditionsofuse"&gt;terms and conditions, paragraph 6.a&lt;/a&gt;:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;In addition to the disclosures described in our Privacy Notice, we will disclose to Requesters [....] Provider Tax Information. "Provider Tax Information" means tax identification information of Providers, such as a Social Security Number or Employer Identification Number. Requesters use Provider Tax Information to fill out an IRS Form 1099 and send it to Providers. If you are a Requester and want Provider Tax Information from us to complete IRS Form 1099s for Providers you have paid, you must provide us with your company name and employer identification number ("Requester Tax Information"). You hereby consent to disclosure of Provider Tax Information, Requester Tax Information, and other data as described in this Section 6 and our Privacy Notice.&lt;/span&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;div&gt;This provision exists because a requester who pays a worker 600USD or more in a year is required to send that worker a 1099-MISC tax form. In other words, this tiny provision means that the &lt;b&gt;employer-employee relationship is not between Amazon and the worker but between the requester and the worker&lt;/b&gt;. 
This is in contrast to other marketplaces (e.g., Rent-A-Coder), where the requester pays the marketplace provider, and the marketplace provider then contracts the workers individually, taking care of tax issues, employment authorization, and so on.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What are the implications of this policy?&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Requesters may be exposed to the risk of violating employment laws. It is possible that a requester is illegally employing US-based workers who do not have the right to work in the US.&lt;/li&gt;&lt;li&gt;Requesters may be exposed to the risk of violating the US tax code. The requester needs to keep track of how much they paid each individual worker (out of potentially thousands of workers), and send 1099-MISC tax forms to the workers who completed 600USD or more worth of HITs for the requester during the year.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;OK, these are the risks. What are the potential counterarguments, and how can someone avoid these issues?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;The employment-eligibility issue:&lt;/b&gt; Amazon pays cash only to people who have US bank accounts. This suggests that the person, if US-based, is legally in the US. I do not know if Amazon checks for employment eligibility (they should). If the person is not US-based, then Amazon pays through gift cards: From what I know, gift cards are not considered compensation, as we regularly give gift cards as awards to students, without worrying about their eligibility to work, and our accounting department never worried about this practice. So, the issue of illegal employment seems to be rather contained, but it would be nice if Amazon explicitly took care of it. 
Yes, it is a big headache for Amazon's HR department to handle thousands of micro-contractors, but this is the price to pay for running this service.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;The tax issue: &lt;/b&gt;At the very least, Amazon should have an automatic service to take care of this issue rather than leaving requesters to scramble to track all the micro-payments and send the paperwork. It is trivial: If a given requester-worker pair generated 600USD or more worth of HITs over the year, request tax information and send the 1099-MISC forms on the requester's behalf. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;A better solution&lt;/b&gt;: Request tax and employment-eligibility information from workers BEFORE they can work on the MTurk marketplace. Also, request tax information from all the requesters BEFORE they can post any tasks on MTurk. Then submit the tax forms automatically at the end of the year.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;An even better solution&lt;/b&gt;: Adopt the Rent-A-Coder model, and treat the MTurk workers as Amazon contractors. Then requesters buy services from Amazon, in the same way they buy computing power on EC2, storage on S3, and so on. In this case, it is very simple to add the MTurk expense under the "software services" line in the accounting report.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-8499782213745593097?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
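The bookkeeping that the post calls trivial is indeed a simple group-by. For reference, the IRS threshold for Form 1099-MISC is 600USD or more paid to a payee during the calendar year. Below is a minimal Python sketch, assuming a hypothetical export of payments as (requester ID, worker ID, payment date, amount) tuples; this record layout is not an MTurk API, and the function sums every payment it is given for the year, so deciding which payments belong in the input is left to the caller.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# IRS Form 1099-MISC threshold: 600USD or more paid to a payee in a calendar year.
THRESHOLD = Decimal("600")

# (requester_id, worker_id, paid_on, amount): a hypothetical record layout.
Payment = Tuple[str, str, date, Decimal]


def pairs_needing_1099(payments: Iterable[Payment], year: int) -> Dict[Tuple[str, str], Decimal]:
    """Sum payments per requester-worker pair in `year`; keep pairs at or above THRESHOLD."""
    totals: Dict[Tuple[str, str], Decimal] = defaultdict(Decimal)
    for requester, worker, paid_on, amount in payments:
        if paid_on.year == year:
            totals[(requester, worker)] += amount
    return {pair: total for pair, total in totals.items() if total >= THRESHOLD}
```

Using `Decimal` rather than `float` keeps thousands of micro-payments (a few cents each) from accumulating rounding error right at the threshold.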
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/tmyADmVel1Q" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/8499782213745593097/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/07/is-amazon-mechanical-turk-black-market.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/8499782213745593097?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/8499782213745593097?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/tmyADmVel1Q/is-amazon-mechanical-turk-black-market.html" title="Is Amazon Mechanical Turk a black market?" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/07/is-amazon-mechanical-turk-black-market.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUcAQXk5fyp7ImA9WxJbGUo.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-6880842805441351237</id><published>2009-07-14T14:32:00.018-04:00</published><updated>2009-07-30T12:57:20.727-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-30T12:57:20.727-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="prediction markets" /><title>How Prices Evolve in Prediction Markets?</title><content type="html">&lt;div 
style="text-align: left;"&gt;Last week, &lt;a href="http://pages.stern.nyu.edu/~narchak"&gt;Nikolay &lt;/a&gt;presented our paper "&lt;a href="http://pages.stern.nyu.edu/~panos/publications/ec2009.pdf"&gt;Modeling Volatility in Prediction Markets&lt;/a&gt;" at the EC'09 conference. One of the questions that we answer in this paper is, "&lt;span style="font-style: italic;"&gt;what is the most likely price of a prediction market contract at some point in the future?&lt;/span&gt;"&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let's start with the &lt;b&gt;expected&lt;/b&gt; price. If we assume that the markets are efficient, then the current price of the contract is the best possible estimate of the &lt;span style="font-style: italic; font-weight: bold;"&gt;expected &lt;/span&gt;future price. &lt;b&gt;However, the current price is NOT the most likely price in the future. &lt;/b&gt;In fact, the probability that the contract will have the same price in the future decreases with time. Why? Because as we get closer to expiration, the price of the contract moves closer to 0 or 1, as the uncertainty about the outcome decreases over time. So, while the &lt;span style="font-style: italic;"&gt;expected &lt;/span&gt;price will be equal to the current price, future prices will increasingly concentrate near 0 and 1.&lt;br /&gt;&lt;br /&gt;Below you can see some 3D plots of the "future price density" as a function of the future price P and the time t. 
We assume that "now" is t=0 and the contract expires at t=1.&lt;br /&gt;&lt;br /&gt;If the current price is 0.5, then the future price density, as a function of the future price P and the time t, is:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span"  style="color:#0000EE;"&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span"  style="color:#0000EE;"&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 0, 0); -webkit-text-decorations-in-effect: none; "&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_skxW9kDLt_c/Sl4fXTJqhGI/AAAAAAAAAhk/4R8vHYDzM3s/s1600-h/price-density_current-price-0.5.PNG"&gt;&lt;img src="http://4.bp.blogspot.com/_skxW9kDLt_c/Sl4fXTJqhGI/AAAAAAAAAhk/4R8vHYDzM3s/s400/price-density_current-price-0.5.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5358755091755730018" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 355px; height: 400px; " /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;As you can see, when we are close to t=0, the possible prices are clustered around the current price (in this case 0.5). Then, as we move closer to expiration, the probability density moves towards 0 and 1. 
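&lt;br /&gt;&lt;br /&gt;For readers without Maple, here is a minimal Python sketch of the same density. It is a direct transcription of the Maple function f listed later in this post and uses only the standard library (statistics.NormalDist, available in Python 3.8 and later). The function names are hypothetical, and the parameter lam plays the role of t in the plots (0 is now, 1 is expiration). The helper prob_price_above is an addition not taken from the paper: it is the closed form obtained by integrating the density above a strike price, i.e., the probability that the price ends above that strike, which is what the X contracts discussed below ask about. The numeric integration is only a sanity check: the total mass should come out close to 1 and the mean close to the current price, even though the density spreads out towards 0 and 1 as lam grows.&lt;blockquote&gt;&lt;pre&gt;
```python
# Illustrative sketch (hypothetical names): a Python transcription of the
# Maple density f(pnow, pfuture, lambda) from this post. Needs Python 3.8+.
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # provides pdf, cdf and inv_cdf


def future_price_density(p_now, p_future, lam):
    """Density of the price at time lam (0 = now, 1 = expiration), given p_now.

    lam must lie strictly between 0 and 1; both prices strictly between 0 and 1.
    """
    a = math.sqrt(1.0 / lam) * STD_NORMAL.inv_cdf(p_now)
    b = math.sqrt(1.0 / lam - 1.0)
    z = STD_NORMAL.inv_cdf(p_future)
    # b * pdf(a - b*z) is a normal density over z; dividing by pdf(z), the
    # derivative of P = cdf(z), turns it into a density over prices P.
    return STD_NORMAL.pdf(a - b * z) * b / STD_NORMAL.pdf(z)


def prob_price_above(p_now, strike, lam):
    """Probability that the price at time lam ends above strike.

    Derived here by integrating future_price_density over (strike, 1);
    not a formula quoted from the paper.
    """
    a = math.sqrt(1.0 / lam) * STD_NORMAL.inv_cdf(p_now)
    b = math.sqrt(1.0 / lam - 1.0)
    return STD_NORMAL.cdf(a - b * STD_NORMAL.inv_cdf(strike))


def integrate(func, lo=0.0, hi=1.0, n=20000):
    """Midpoint-rule integral of func over the open interval (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    p_now, lam, strike = 0.25, 0.3, 0.75

    def density(P):
        return future_price_density(p_now, P, lam)

    total = integrate(density)
    mean = integrate(lambda P: P * density(P))
    tail = integrate(density, strike, 1.0)
    # total should be close to 1 and mean close to p_now (a martingale);
    # the numeric tail should agree with the closed form.
    print(f"total mass: {total:.4f}, expected future price: {mean:.4f}")
    print(f"P(price above {strike}): numeric {tail:.5f}, "
          f"closed form {prob_price_above(p_now, strike, lam):.5f}")
```
&lt;/pre&gt;&lt;/blockquote&gt;Working through the inverse normal cdf keeps the sketch a line-by-line match of the Maple formula, which makes it easy to check one against the other.&lt;br /&gt;&lt;br /&gt;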
Because the current price in this first plot is 0.5, the plot is completely symmetric around the axis P=0.5.&lt;br /&gt;&lt;br /&gt;If the current contract price is 0.4, the density becomes skewed towards 0:&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span"  style="color:#0000EE;"&gt;&lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 0, 0); -webkit-text-decorations-in-effect: none; "&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_skxW9kDLt_c/Sl4fXhRMuhI/AAAAAAAAAhs/K2fRCm3MDq4/s1600-h/price-density_current-price-0.4.PNG" style="text-decoration: none;"&gt;&lt;img src="http://3.bp.blogspot.com/_skxW9kDLt_c/Sl4fXhRMuhI/AAAAAAAAAhs/K2fRCm3MDq4/s400/price-density_current-price-0.4.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5358755095545428498" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 366px; height: 400px; " /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;And here is an even more skewed plot, with the current contract price at 0.25:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span"  style="color:#0000EE;"&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span"  style="color:#0000EE;"&gt;&lt;span class="Apple-style-span" style="text-decoration: underline;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 0, 0); -webkit-text-decorations-in-effect: none; "&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_skxW9kDLt_c/Sl4fYJW3RRI/AAAAAAAAAh0/MZquKIXG-ZE/s1600-h/price-density_current-price-0.25.PNG"&gt;&lt;img 
src="http://4.bp.blogspot.com/_skxW9kDLt_c/Sl4fYJW3RRI/AAAAAAAAAh0/MZquKIXG-ZE/s400/price-density_current-price-0.25.PNG" border="0" alt="" id="BLOGGER_PHOTO_ID_5358755106306606354" style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; width: 367px; height: 400px; " /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;Just in case you want to create your own plots, here is the Maple code:&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;with(stats);&lt;br /&gt;&lt;br /&gt;normpdf:=(x,mu,sigma)-&gt;statevalf[pdf,normald[mu,sigma]](x);&lt;br /&gt;&lt;br /&gt;spdf:=x -&gt; normpdf(x,0,1);&lt;br /&gt;&lt;br /&gt;normicdf:=(p,mu,sigma)-&gt;statevalf[icdf,normald[mu,sigma]](p);&lt;br /&gt;&lt;br /&gt;sicdf:=x-&gt;normicdf(x,0,1);&lt;br /&gt;&lt;br /&gt;# density of the price pfuture at time lambda (0 = now, 1 = expiration), given the current price pnow&lt;br /&gt;f:= (pnow,pfuture,lambda) -&gt; spdf ( sqrt(1/lambda) * sicdf(pnow) - sqrt(1/lambda-1) * sicdf(pfuture))*sqrt(1/lambda-1)/spdf(sicdf(pfuture));&lt;br /&gt;&lt;br /&gt;# plot for current price p=0.5; use p=0.4 or p=0.25 for the other two plots&lt;br /&gt;plot3d(eval((f(p,P,t)), {p=0.5}), P=0..1, t=0.1..0.75, axes=boxed, shading=zhue, orientation=[-120, 50]);&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;So, what can we do with these results? One application is pricing the X contracts on Intrade. In the "X" contracts, the question is about the future price movement of a prediction market contract (e.g., "will the contract for Democrats winning the 2012 election be above 0.75 on December 31st, 2010?").&lt;br /&gt;&lt;br /&gt;These X contracts are similar to the existing "call" and "put" options on the stock market, where people try to guess where the price of a stock will be in the future. &lt;span style="font-weight: bold;"&gt;There is a significant difference, though&lt;/span&gt;: when a trader prices and trades a call or put option on a share (e.g., using the vanilla Black-Scholes formula), the trader needs to estimate the future volatility of the share price. 
Through this process, trading reveals to the public valuable information about the future volatility of the share price. For prediction markets, trading an X contract does &lt;span style="font-weight: bold;"&gt;not &lt;/span&gt;reveal the same information. Our work shows the exact form of the future price distribution, without the need for any volatility estimates. (Volatility can largely be determined by the current price and the time to expiration; see the &lt;a href="http://behind-the-enemy-lines.blogspot.com/2008/10/modeling-volatility-in-prediction_20.html"&gt;past blog post&lt;/a&gt; and the &lt;a href="http://pages.stern.nyu.edu/~panos/publications/ec2009.pdf"&gt;EC'09 paper&lt;/a&gt; for details.) So, to find the "correct" price of an X contract, one just needs to plug in the current price, the time to expiration, and the strike price, all of which are already public.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;So, am I saying that the X contracts are completely useless? No. But trading these contracts reveals significantly less information than trading options on stocks.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/47abU7DW70Q" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/6880842805441351237/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/07/how-prices-evolve-in-prediction-markets.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/6880842805441351237?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/6880842805441351237?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/47abU7DW70Q/how-prices-evolve-in-prediction-markets.html" title="How Prices Evolve in Prediction Markets?" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://4.bp.blogspot.com/_skxW9kDLt_c/Sl4fXTJqhGI/AAAAAAAAAhk/4R8vHYDzM3s/s72-c/price-density_current-price-0.5.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/07/how-prices-evolve-in-prediction-markets.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ak4HQn86cCp7ImA9WxJVF00.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-1056087993537320833</id><published>2009-07-04T07:27:00.002-04:00</published><updated>2009-07-04T08:02:13.118-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-07-04T08:02:13.118-04:00</app:edited><category 
scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="peer reviewing" /><category scheme="http://www.blogger.com/atom/ns#" term="open access" /><title>Books, Journals, Conferences, Blogs</title><content type="html">I was reading the &lt;a href="http://www.earlham.edu/~peters/fos/overview.htm"&gt;Open Access Overview&lt;/a&gt; by Peter Suber, and I ran into the following paragraph:&lt;div&gt;&lt;div&gt;&lt;blockquote&gt;Scholarly journals do not pay authors for their articles, and have not done so since the first journals were launched in London and Paris in 1665. &lt;b&gt;Journals took off because they were more timely than books.&lt;/b&gt; For readers, journals were better than books for learning quickly about the recent work of others, and for authors they were better than books for sharing new work quickly with the wider world and, above all, for establishing priority over other scientists working on the same problem. &lt;b&gt;They gave authors the benefit of a fast, public time-stamp on their work&lt;/b&gt;. Because authors were rewarded in these strong, intangible ways, they accepted the fact that journals couldn't afford to pay them. Over time, journal revenue grew but authors continued in the tradition of writing articles for impact, not for money.&lt;/blockquote&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;It was amusing to see that there was a transition from books to journals for pretty much the same reason that, in computer science, we have seen a transition from journals to conferences. 
I wonder whether the senior scholars of the day commented on this transition in the same way that Mike Trick &lt;a href="http://mat.tepper.cmu.edu/blog/?p=736"&gt;commented&lt;/a&gt; on the similar &lt;a href="http://mat.tepper.cmu.edu/blog/?p=720"&gt;tension between journal and conference publications&lt;/a&gt;:&lt;/div&gt;&lt;blockquote&gt;if a subgroup chooses a method of evaluation antithetical to the mores of the rest of academe, don’t be surprised if the group gets little respect outside their narrow group&lt;/blockquote&gt;So maybe a few years from now we will see a similar problem, as people start leaving "traditional" peer reviewing behind and opt for new modes of publication, such as self-publishing. Michael Nielsen has an excellent article on the &lt;a href="http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/"&gt;disruption of scientific publishing&lt;/a&gt;. Michael points to high-quality blog posts by high-quality researchers: &lt;div&gt;&lt;blockquote&gt;Look at Terry Tao’s &lt;a href="http://terrytao.wordpress.com/tag/285g-poincare-conjecture/"&gt;wonderful series of posts&lt;/a&gt; explaining one of the biggest breakthroughs in recent mathematical history, the proof of the &lt;a href="http://en.wikipedia.org/wiki/Poincare_conjecture"&gt;Poincare conjecture&lt;/a&gt;. Or Tim Gowers recent experiment in &lt;a href="http://gowers.wordpress.com/2009/01/27/is-massively-collaborative-mathematics-possible/"&gt;“massively collaborative mathematics”&lt;/a&gt;, using open source principles to successfully attack a significant mathematical problem. Or Richard Lipton’s &lt;a href="http://rjlipton.wordpress.com/tag/factoring/"&gt;excellent series of posts&lt;/a&gt; exploring his ideas for solving a major problem in computer science, namely, finding a fast algorithm for factoring large numbers. &lt;/blockquote&gt;So, does the future of publication rely on self-publishing? 
Daniel Lemire may be right when he says:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;To me, the single most important recent event in academic publishing has been the publication by Perelman of his solution to the Poincarré conjecture on &lt;a href="http://aps.arxiv.org/find/grp_math/1/au:+Perelman_Grisha/0/1/0/all/0/1"&gt;arxiv&lt;/a&gt;. This is truly a historical event.&lt;/blockquote&gt;Will this change fundamentally alter the way academia works? I do not think so. It will simply mean that every scholar will be very careful about the quality of the work that they self-publish. When everyone can speak, people will only listen to those who generate high-quality content, effectively ignoring those who publish for the sake of publishing.&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/q42UuoxgK_w" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/1056087993537320833/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/07/books-journals-conferences-blogs.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/1056087993537320833?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/1056087993537320833?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/q42UuoxgK_w/books-journals-conferences-blogs.html" title="Books, Journals, Conferences, Blogs" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/07/books-journals-conferences-blogs.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkYGSXk5eSp7ImA9WxJVEkU.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-1313885021441465171</id><published>2009-06-29T08:18:00.007-04:00</published><updated>2009-06-29T08:55:28.721-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-06-29T08:55:28.721-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" 
term="prediction markets" /><category scheme="http://www.blogger.com/atom/ns#" term="hcomp" /><title>LiveBlogging from HCOMP 2009</title><content type="html">I am now attending &lt;a href="http://www.hcomp2009.org/Home.html"&gt;HCOMP2009&lt;/a&gt;, the workshop on Human Computation that &lt;a href="http://behind-the-enemy-lines.blogspot.com/2009/03/human-computation-workshop-hcomp-2009.html"&gt;I previously advertised on this blog&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Overall, the workshop was well attended, with more than 40 people in the audience, pretty much filling the room. Below you can find the notes that I kept during the workshop. (See also the &lt;a href="http://twitter.com/#search?q=%23hcomp"&gt;associated tweets&lt;/a&gt;.) I hope that they capture some of the discussion for those of you who could not be there.&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;The workshop started with introductions, and then Luis von Ahn described &lt;a href="http://monolingo.com/"&gt;MonoLingo&lt;/a&gt;, a human-powered translation system that uses people who know ONLY ONE language. MonoLingo relies on the fact that machine translation is pretty good at this point, but not perfect. So MonoLingo starts by translating each word using a dictionary, giving multiple interpretations for each word. The human (a native speaker of the target language) then selects the translation for each word and forms the sentence that makes the most sense.&lt;br /&gt;&lt;br /&gt;To give users an incentive to participate, MonoLingo is presented as a language-learning site: as users learn the language, they make small translations posed as exercises. 
These small translations contribute to a larger translation task.&lt;br /&gt;&lt;br /&gt;The system reminded me a little bit of the crowdsourcing-powered New York startup &lt;a href="http://www.speaklike.com/"&gt;SpeakLike&lt;/a&gt;, which I saw at the New York Tech Meetup.&lt;br /&gt;&lt;br /&gt;I was also wondering how best to integrate MonoLingo with the more advanced machine translation systems, e.g., Google Translate, which generate significantly better translations than the word-by-word translations generated by MonoLingo. Perhaps a more iterative approach (e.g., using the &lt;a href="http://groups.csail.mit.edu/uid/turkit/"&gt;TurkIt&lt;/a&gt; toolkit and ideas), where people take turns improving the overall translation, might actually work wonders in this task.&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;The second talk of the workshop described "Herd It: Designing a social game to tag music" by Luke Barrington, Douglas Turnbull, Damien O’Malley, and Gert Lanckriet.&lt;br /&gt;&lt;br /&gt;HerdIt uses an active learning approach to tag music. Users tag music online, and then a machine learning algorithm is trained to tag a few more songs, and so on. (The approach reminds me of our own KDD08 paper on active learning using multiple noisy labelers.)&lt;br /&gt;&lt;br /&gt;The HerdIt game starts playing the music, and the players see bubbles floating around, each one with a tag (e.g., rock, pop, romantic, ballad, etc.). The players get more points when they hit the bubbles that correspond to the more popular tags. The authors also added quizzes to the game (e.g., a song plays and the question "the singer has long hair?" appears). 
For the quizzes, there is a parimutuel prediction market running in the background, where users bet on different outcomes/answers and the winners split the common pool for the bet.&lt;br /&gt;&lt;br /&gt;The HerdIt game reminds me a little bit of &lt;a href="http://majorminer.org/info/intro"&gt;MajorMiner&lt;/a&gt;, a music game introduced by Michael Mandel, a classmate of mine at Columbia University.&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;The third talk described "KissKissBan: A Competitive Human Computation Game for Image Annotation" by Chien-Ju Ho, Tao-Hsuan Chang, Jong-Chuan Lee, Jane Yung-Jen Hsu, and Kuan-Ta Chen.&lt;br /&gt;&lt;br /&gt;The authors build on and improve the ESP game. They try to solve the "coalition problem" (players adopt the strategy of typing a "common" word, e.g., "the", and proceeding to the next image) and the problem of limited tag diversity when the user base is small.&lt;br /&gt;&lt;br /&gt;The basic idea is to turn the 2-player ESP game into a 3-player game: a new type of player, named "blocker", is introduced. The blocker tries to prevent the other two players from proceeding by typing, as "taboo" words, the obvious words that the other two players may type to describe the depicted image. The more words the blocker catches, the better the blocker's score. This encourages tag diversity, since the two "matcher" players will attempt non-obvious tags to describe the image. The blocker can also observe which words the matchers use across images. So if two players develop a cheating strategy and often type some unrelated word (e.g., "the"), the blocker can catch that behavior and put such words in the "blocked" list. KissKissBan uses a zero-sum approach where the blocker gets the points that the matchers lose. So the blocker has an incentive to enter many words that the matchers then use. 
I suspect that the blocker also loses points by entering words that are NOT used by the matcher players.&lt;br /&gt;&lt;br /&gt;Unfortunately, the approach does not seem to work if the matchers have access to a communication channel outside the game (e.g., Skype, or being in the same room). Also, I would have liked to see some discussion of cheating strategies that can be used in such games.&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;The fourth talk was the paper "Community-based Game Design: Experiments on Social Games for Commonsense Data Collection" by Yen-Ling Kuo, Kai-Yang Chiang, Cheng-Wei Chan, Jong-Chuan Lee, Rex Wang, Edward Yu-Te Shen, and Jane Yung-Jen Hsu.&lt;br /&gt;&lt;br /&gt;The authors describe a game-based approach for building "common-sense" ontologies using "virtual pets" that are "fed knowledge" by the player's friends (e.g., in a Facebook-like setting). The game has quiz-like templates (e.g., "______--likes to--______"), which the player's friends fill in with sensible values (e.g., "a student--likes to--have no homework"). To make it fun, the pets compete online by playing such quizzes, and they become smarter, earning "smart points". The smartest pets, those with the most knowledge and the most sensible answers, appear on the leaderboards. The pets get points when they give the same answers as the pets of other owners. The game is named "Rapport Game".&lt;br /&gt;&lt;br /&gt;It sounded fun, with a very distinct East Asian flavor, reminding me of a social-networking-enabled Tamagotchi...&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;After the break, we attended the demos and posters. 
Unfortunately, there was not enough time to really check out the large number of posters and demos that were available.&lt;br /&gt;&lt;br /&gt;From the demos, I liked TurkIt a lot, not so much for the library and toolkit, but mainly for introducing the idea of "iterative tasks". People who have used Mechanical Turk know that it is very hard to get people to do a good job on tasks with free-form responses. Having a voting round after the submission (e.g., "Is this answer good or not?") discourages fraud but does not generate high-quality results; it simply generates results that are "good enough". TurkIt introduces the idea of iterations, essentially soliciting multiple rounds of submissions, where users vote on whether the "new" submission is better than the "old" one. The handwriting recognition task shows nicely why the approach works well: the first submitter takes a first pass, leaving blanks wherever the transcription is difficult. The second submitter fills in (some of) the blanks, the third improves it even more, and so on. When there is no further improvement, the task ends. Very nice idea; I will be using it in the future.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;I also learned about the internal crowdsourcing efforts at IBM (basic conclusion: giving a prize like an iPod works much better than trying to motivate people through other marketing and management efforts... 
a good lesson for people trying to deploy enterprise prediction markets...)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;In the second part of the workshop, we saw some discussion of the role of game theory in the design of human computation games.&lt;br /&gt;&lt;br /&gt;The position paper by Shaili Jain and David Parkes, "The Role of Game Theory in Human Computation Systems", outlined promising directions for future research in this area.&lt;br /&gt;&lt;br /&gt;The basic idea is that human computation may benefit from using game-theoretic concepts to improve the design of the games. The argument is that game theory has already solved similar problems in settings like P2P networks, where it has been used to avoid the problem of free riding. Shaili gave a brief introduction to the game-theoretic analysis of some games and systems (e.g., PhotoSlap, the ESP game, and Yahoo Answers).&lt;br /&gt;&lt;br /&gt;The talk advocated modeling the user actions (e.g., in the ESP game, players select "easy" or "difficult" words), the corresponding costs and benefits for the users, and how these affect the outcome of the game. The nice outcome is that game theory helps predict the equilibrium of such games, essentially predicting what the stable state of these games will be.&lt;br /&gt;&lt;br /&gt;Some of the intriguing open questions:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;How to quantify "fun".&lt;/li&gt;&lt;li&gt;How to introduce altruism in the models.&lt;/li&gt;&lt;li&gt;How to generate game-theoretic models for the games described in the GWAP taxonomy (games with output agreement, input agreement, and inversion-problem games).&lt;/li&gt;&lt;li&gt;How to use game-theoretic concepts of "population games" to model large numbers of interacting agents. 
&lt;/li&gt;&lt;/ol&gt;The high-level conclusion of the talk: it is good to build a game-theoretic model of each game, so that we can see how robust the game is to perturbations of the design options.&lt;br /&gt;&lt;br /&gt;#####################################################################&lt;br /&gt;&lt;br /&gt;The game-theory discussion continued with the paper "On Formal Models for Social Verification" by Chien-Ju Ho and Kuan-Ta Chen.&lt;br /&gt;&lt;br /&gt;The authors described how to use game theory to show the effect of sequential versus parallel verification. Parallel verification is the process in which two users submit an answer to a question, and if the answers match, they get a reward. Sequential verification is the process in which a user submits an answer that needs to match a previously given "correct" answer (e.g., "Guess what is the original question for this answer", which allows a sanity check of the answers). The paper provides the equilibria that result from these mechanisms.&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;The paper "Efficient Human Computation: the Distributed Labeling Problem" by Ran Gilad-Bachrach, Aharon Bar Hillel, and Liat Ein-dor tackled the following problem: using humans, we can collect labels and tags that describe an object (e.g., an image). When the number of possible labels is large, we start seeing consistency problems, as different labelers are not coordinated. The labelers may provide correct answers, but due to synonymy they end up giving superficially different labels even though they mean the same thing ("truck" and "lorry"). Or, due to homonymy, they give the same label even though they mean different things (e.g., 
"greyhound" the dog, and "greyhound" the bus company).&lt;br /&gt;&lt;br /&gt;The authors describe graph-theoretic algorithms that can be used to resolve such problems, and provide bounds on the optimality of the proposed approaches. Questions arose about how to deal with the fact that the same user may not be self-consistent over time, and about how to deal with users of varying reliability.&lt;br /&gt;&lt;br /&gt;######################################################################&lt;br /&gt;&lt;br /&gt;The last talk of the workshop was "Financial Incentives and the Performance of Crowds" by Winter Mason and Duncan J. Watts. The authors examine how Turkers respond to various levels of payment. Confirming my own observations, Turkers generally do more work for higher pay, but the quality of their work does not improve much. So, paying more gets more work done, but the quality remains statistically the same (in my own experience, HITs paying more than 50 cents also tend to attract spammers, effectively decreasing the overall quality!)&lt;br /&gt;&lt;br /&gt;In a test of anchoring, the authors asked workers whether they felt that they had been fairly compensated. Interestingly enough, the workers felt underpaid, valuing their work 2-3 cents more per HIT than the payment; moreover, the difference was consistent across levels of payment (e.g., a 5 cent HIT is worth 10, a 10 cent HIT is worth 15, etc.). 
So it may make sense to start with a low payment for the HITs and, once the psychological anchor is established, increase the payment to satisfy the workers who felt that the initial payment was too small :-)&lt;br /&gt;&lt;br /&gt;#####################################################################&lt;br /&gt;&lt;br /&gt;The overall themes that emerged from the workshop were rather clear. On the one hand, there is the experimental side of human computation, where researchers are trying to devise new incentives for users to participate, new types of actions, and new modes of interaction. On the other hand, we have the more abstract/theoretical side, where researchers are trying to model these actions and incentives to examine what theory predicts about these designs. There is also work that examines what to do with the noisy results generated by such games and systems: how can we best handle the noise and use the generated data for data mining purposes?&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The final question over dinner was where to organize the next instance of the workshop. There are so many disciplines coming together in this line of work that finding a commonly acceptable venue might be a challenge. Will the CHI people continue coming to KDD? Or will data miners go to CHI to attend this workshop? What about the WWW conference? Would it be a good match? 
I personally favor KDD, but we will have to wait and see the result of the human computation on this issue...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(We also have a wiki with a bibliography of the area, &lt;a href="http://hcomp2009.wikispaces.com/"&gt;http://hcomp2009.wikispaces.com/&lt;/a&gt;, which we will have to curate and organize better to reflect the results of this workshop.)&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/X_55vot7MJE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/1313885021441465171/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/06/liveblogging-from-hcomp-2009.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/1313885021441465171?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/1313885021441465171?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/X_55vot7MJE/liveblogging-from-hcomp-2009.html" title="LiveBlogging from HCOMP 2009" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/06/liveblogging-from-hcomp-2009.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUIFSXY6eSp7ImA9WxJaGUk.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-7281195183480823780</id><published>2009-06-09T19:56:00.014-04:00</published><updated>2009-08-10T19:38:38.811-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-10T19:38:38.811-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="large datasets" /><title>Google Fusion Tables: Databases on the Cloud</title><content 
type="html">From the Google Research Blog: &lt;a href="http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html"&gt;Google Fusion Tables&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Now it is possible to upload tabular data sets to Google, let other people use the data, and create easy-to-use visualizations. There are no complicated joins or other heavy-duty relational operations, but there &lt;i&gt;is &lt;/i&gt;functionality to connect (fuse) tables. There is also built-in functionality for discussing the contents of a data set.&lt;br /&gt;&lt;br /&gt;Here is an early example. I took the data from a survey of Mechanical Turkers and imported it into Google Tables. Here is the resulting intensity map that shows the distribution of workers per country:&lt;br /&gt;&lt;br /&gt;&lt;script src="http://www.gmodules.com/ig/ifr?url=http://www.google.com/ig/modules/geomap.xml&amp;amp;up__table_query_url=http://tables.googlelabs.com/gvizdata?tq=select+col0%252Ccol1+from+6825+&amp;amp;up__table_query_refresh_interval=0&amp;amp;up_region=world&amp;amp;up_dataMode=regions&amp;amp;w=400&amp;amp;h=300&amp;amp;border=%23ffffff%7C3px%2C1px+solid+%23999999&amp;amp;synd=open&amp;amp;output=js"&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;and the "lift" of the distribution of workers per state (comparing each state's share of the US population with its share of Turkers):&lt;br /&gt;&lt;br /&gt;&lt;script src="http://www.gmodules.com/ig/ifr?url=http://www.google.com/ig/modules/geomap.xml&amp;amp;up__table_query_url=http://tables.googlelabs.com/gvizdata?tq=select+col0%252Ccol3+from+6805+&amp;amp;up__table_query_refresh_interval=0&amp;amp;up_region=US&amp;amp;up_dataMode=regions&amp;amp;w=400&amp;amp;h=300&amp;amp;border=%23ffffff%7C3px%2C1px+solid+%23999999&amp;amp;synd=open&amp;amp;output=js"&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;I am truly excited about this feature. 
The very idea that it will be possible to release "live" data sets without having to set up complicated web interfaces or worry about security, SQL injections, and so on makes this absolutely wonderful for me.&lt;br /&gt;&lt;br /&gt;For comparison, see the corresponding visualizations from Many Eyes:&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript" src="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/a2913d46ecf411dd931a000255111976/comments/a2953e8cecf411dd931a000255111976.js?width=400&amp;amp;height=350"&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript" src="http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/b565c050ecfd11ddacf6000255111976/comments/b56bf448ecfd11ddacf6000255111976.js?width=400&amp;amp;height=350"&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;Still, the flexibility of Google Tables for data management compensates for the relative lack of visualization options.&lt;br /&gt;&lt;br /&gt;My only real complaint: the 100MB limit. I was ready to upload my &lt;a href="http://hyperion.stern.nyu.edu/mturk/"&gt;Mechanical Turk archive&lt;/a&gt; (&lt;a href="http://behind-the-enemy-lines.blogspot.com/2009/02/monitoring-dynamics-of-mechanical-turk.html"&gt;see the related blog post&lt;/a&gt;) there and let other people use it. Unfortunately, it is larger than the 100MB limit. If only I could use the extra storage that I bought from Google for my Gmail account...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-7281195183480823780?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/lR2fn5PoJrk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/7281195183480823780/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/06/google-fusion-tables-databases-on-cloud.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/7281195183480823780?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/7281195183480823780?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/lR2fn5PoJrk/google-fusion-tables-databases-on-cloud.html" title="Google Fusion Tables: Databases on the Cloud" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/06/google-fusion-tables-databases-on-cloud.html</feedburner:origLink></entry><entry gd:etag="W/&quot;DUACQX05eyp7ImA9WxJQGU8.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-4942540729628140660</id><published>2009-06-02T03:02:00.005-04:00</published><updated>2009-06-02T03:29:20.323-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-06-02T03:29:20.323-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="peer reviewing" /><title>Acceptance Rate: 
100%</title><content type="html">I have been thinking lately about the concept of binary decisions, mainly in the context of paper reviewing. Most of the time, the decision for a paper is binary: accept the paper or not. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Any binary decision that depends on some explicit or implicit threshold will always be problematic. Whatever scores threshold+epsilon gets in; whatever scores threshold-epsilon is out. An epsilon difference generates significantly different outcomes. To make things worse, the area around the threshold is typically densely populated. No matter where we put the threshold, papers with small differences in quality, even under perfect quality assessment, get very different treatment.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here is an alternative: allow all authors to decide whether to publish their papers or not, with one condition: they must also publish the reviews together with the paper. The paper got 3 strong rejects and the authors still want to publish it? Fine! &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If such a system were in place, most authors would seek to get good reviews instead of trying to pass the threshold and get into publish-land. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Going a step further: keep the earlier reviewed versions of the paper together with their reviews. Did the authors address the comments? They can include a statement describing how. Later reviewers can take a look and see whether this is the case. 
This policy would also encourage authors to submit only high-quality results.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This could also be combined with a requirement that at least some of the reviews come from high-quality reviewers, but I will leave this for another post.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-4942540729628140660?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/YPou2xVtLjI" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/4942540729628140660/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/06/acceptance-rate-100.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/4942540729628140660?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/4942540729628140660?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/YPou2xVtLjI/acceptance-rate-100.html" title="Acceptance Rate: 100%" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">8</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/06/acceptance-rate-100.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CUcGRHg-fCp7ImA9WxJQEks.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-2589844777050690793</id><published>2009-05-25T09:00:00.000-04:00</published><updated>2009-05-25T10:50:25.654-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-25T10:50:25.654-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="evaluation" /><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="peer reviewing" /><title>Evaluation Feedback and Stakhanovist Research Profiles</title><content type="html">Every year, after 
the Spring semester, we receive a report with our annual evaluation, together with feedback and advice for career improvement (some written, some verbal). Part of the feedback that I received this year:&lt;div&gt;&lt;ol&gt;&lt;li&gt;&lt;i&gt;You get too many best paper awards, and you do not have that many journal papers. You may want to write more journal papers instead of spending so much time polishing the conference papers that you send out.&lt;/i&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;You are a member of too many program committees. You may consider reviewing less and write more journal papers instead.&lt;/i&gt;&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;I guess that having a &lt;a href="http://blog.codalism.com/?p=302"&gt;Stakhanovist research profile&lt;/a&gt; (see the corresponding &lt;a href="http://portal.acm.org/citation.cfm?doid=1498765.1498780"&gt;ACM&lt;/a&gt; &lt;a href="http://doi.acm.org/10.1145/1297797.1297815"&gt;articles&lt;/a&gt;) is a virtue after all. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;(*) &lt;/span&gt;&lt;/i&gt;&lt;a href="http://en.wikipedia.org/wiki/Aleksei_Grigorievich_Stakhanov"&gt;&lt;i&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Alexey Stakhanov&lt;/span&gt;&lt;/i&gt;&lt;/a&gt;&lt;i&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt; was a miner in the Soviet Union &lt;/span&gt;&lt;/i&gt;&lt;span class="Apple-style-span" style="line-height: 19px; "&gt;&lt;i&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;who cut 102 tons of coal during a six-hour shift with a pneumatic drill, when the average output was 6-7 tons. 
Stakhanov's record created the &lt;/span&gt;&lt;/i&gt;&lt;a href="http://en.wikipedia.org/wiki/Stakhanovite"&gt;&lt;i&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;Stakhanovite movement&lt;/span&gt;&lt;/i&gt;&lt;/a&gt;&lt;i&gt;&lt;span class="Apple-style-span"  style="font-size:small;"&gt;, in which workers were encouraged to exceed production targets, typically trying to beat previous production records.&lt;/span&gt;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-2589844777050690793?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/gq-AKoFJldA" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/2589844777050690793/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/05/evaluation-feedback-and-stakhanovist.html#comment-form" title="8 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2589844777050690793?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2589844777050690793?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/gq-AKoFJldA/evaluation-feedback-and-stakhanovist.html" title="Evaluation Feedback and Stakhanovist Research Profiles" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">8</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/05/evaluation-feedback-and-stakhanovist.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CkUNQ3k8cSp7ImA9WxJRF0o.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-1195052243125802221</id><published>2009-05-18T19:36:00.003-04:00</published><updated>2009-05-19T17:58:12.779-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-05-19T17:58:12.779-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="academia" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="peer reviewing" /><title>Public Paper 
Reviews</title><content type="html">I am now trying to review a journal paper. Unfortunately, the paper is completely unreadable: it is a long list of definitions, "lemmas," and "theorems" (the quotes are intentional). Even more unfortunate, the author does not give any background but instead cites a recently published paper of his, which contains all the necessary background. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Trying to understand the paper, I attempted to read the earlier published one. Well, no luck! That paper was also horribly written. I started wondering who in his right mind decided to accept it into an &lt;span class="blsp-spelling-error" id="SPELLING_ERROR_0"&gt;ACM&lt;/span&gt; journal, and what the reviewers were saying. I simply cannot believe that anyone would actually read such papers and then write a meaningful review. Most probably, the reviewers gave up after many revisions and agreed to let the paper be published, hoping that they would never have to read it again. Or maybe I am wrong and the paper is indeed a hidden gem?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;So, how can we avoid such cases? Here is my &lt;a href="http://etuttounmagnamagna.blogspot.com/2009/03/whats-wrong-with-scholarly-publication.html"&gt;not &lt;/a&gt;&lt;span class="Apple-style-span" style="font-weight: normal; "&gt;&lt;a href="http://vonahn.blogspot.com/2009/03/reviews-should-be-published.html"&gt;&lt;span class="Apple-style-span" style="font-weight: bold; "&gt;so original&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-weight: bold; "&gt; suggestion: Publish the (anonymous) reviews together with each paper!&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I cannot see any downside to that. It will make everyone happier. 
People who write high-quality reviews would not mind seeing their names published together with the reviews. Other reviewers will see what a high-quality review looks like and will hopefully try to &lt;span class="blsp-spelling-corrected" id="SPELLING_ERROR_1"&gt;imitate&lt;/span&gt; its style. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In fact, the practice of publishing a commentary for each paper is not new. I have seen many statistics papers published together with signed (eponymous) commentary. Often, reading the reviews is more interesting than reading the paper.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Furthermore, the reviews will offer a quick overview of the contributions and shortcomings of the paper. They will also allow the reader to understand what led to the acceptance of the paper. Was it a new idea? An excellent experimental evaluation? Or did the reviewers simply fail to read the paper, give up, and hand out a lukewarm "accept"?&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-1195052243125802221?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/YJiBdRm5ANo" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/1195052243125802221/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/05/public-paper-reviews.html#comment-form" title="14 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/1195052243125802221?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/1195052243125802221?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/YJiBdRm5ANo/public-paper-reviews.html" title="Public Paper Reviews" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">14</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/05/public-paper-reviews.html</feedburner:origLink></entry><entry gd:etag="W/&quot;Ck4MR3w9cSp7ImA9WxJTE0k.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-2755187951253601275</id><published>2009-04-20T23:32:00.006-04:00</published><updated>2009-04-21T15:09:46.269-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-04-21T15:09:46.269-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="incentives" /><category scheme="http://www.blogger.com/atom/ns#" term="amazon" /><category scheme="http://www.blogger.com/atom/ns#" term="google" /><title>Google App Engine and Java: First Impressions</title><content type="html">Over the last few days I have been 
playing with Google App Engine, the infrastructure provided by Google for building applications in the cloud. To give some context, I tried to build a crawler that retrieves and stores historical information from a marketplace. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I had already built this application, and it was running on my local Linux machine, storing information in a SQL database. However, I was getting uncomfortable watching the database grow significantly, and running big queries was interfering with other users who were using the database machine for their own projects. So, I decided to see how easy it is to port such a vanilla project into the cloud.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;My impressions so far:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Ease of programming&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It was pretty easy to follow the provided tutorials and get a basic application up and running quickly. It may be a good way to introduce students to (web) programming. The Eclipse plugin hides a very significant fraction of the complexity and allows the programmer to focus on application development.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Database support&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;No SQL database anymore. While we still save "entities" and there is support for relationships across entities, the Google datastore is based on BigTable, not on a SQL database. 
This means &lt;span class="Apple-style-span" style="font-style: italic;"&gt;no joins&lt;/span&gt;. You can always implement your own version of a join, but this is not how the Google datastore is supposed to be used. Slowly you realize that denormalization is desirable and often absolutely necessary. For someone like me who likes a fully normalized schema, which ensures that there are no inconsistencies anywhere, it felt almost too messy: too much information replicated everywhere, the need to be extra careful to avoid update anomalies, and so on. I can see a significant learning curve for migrating databases into such an environment. Giving up joins is not easy... (Our MBA students, who keep all their data in a single spreadsheet, will feel right at home ;-) But it is not that bad. Personally, it helped me to think of the entities in the Google datastore as materialized views of some underlying relations and to use lazy updating techniques to keep the data consistent.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;30-second limit&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;By far the most annoying aspect of Google App Engine is the 30-second limit on the execution time of any process. Nothing can run for more than 30 seconds. Since I wanted to build a crawler, I had to rethink the infrastructure. It was necessary to break the task into smaller chunks that could be completed within the 30-second limit. &lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To achieve this, I built a "task queue" structure that kept track of the pages that needed to be fetched, and I stored this queue as a persistent structure in the datastore. 
Then, the "crawler" process picked URLs from the queue and fetched as many pages as it could within the 30-second limit, storing the retrieved pages in the Google datastore. Pretty annoying: the 30-second limit also includes the time spent fetching the page. Often, I was timing out just because the remote server was slow to send the requested page. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Finally, to get the crawler running "all the time", I scheduled a cron job that started the "30-second crawler" process every minute. Almost like trying to take a road trip in a car that can run for only 30 seconds at a time and can be restarted only once a minute. Not very elegant, not the most efficient, but it works for lightweight tasks.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;Quota system&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Google App Engine allows applications to run for free, as long as they stay below certain usage quotas. Once the app exceeds its daily free quota, it gets billed, up to a specified maximum. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In other words, you pay for CPU usage. This is in direct contrast to Amazon EC2, which charges for the "wall-clock time" that a virtual machine is running. Since Google App Engine charges only for the resources actually consumed, it encourages code that is as efficient as possible and uses as few resources as possible.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Artists say that the limitations of a medium are a major force for creativity. I have to say that the quota system has the same effect. I found myself thinking and rethinking how I could make the process as efficient as possible. 
Since I can constantly see the exact amount of resources spent by each process, I feel compelled to squeeze every process as much as I can. This is not the case in regular desktop programming. OK, it takes 2 seconds instead of 0.1. So what? I have plenty of resources, and I can afford to be sloppy. When I am being billed for the consumed resources, I have a pretty immediate incentive to write the best code possible. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I may be overreaching here, but I see being billed according to CPU usage as a force that will encourage deeper learning in Computer Science. The effect of optimization is immediate and measurable, and optimization is often necessary just to get your process running.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I remember the stories of the old-timers who tried to super-optimize their code so that the mainframe could execute it overnight and they could get the results back. Well, the mainframe is back!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-2755187951253601275?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/HyWyQqeaDDk" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/2755187951253601275/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/04/google-app-engine-and-java-first.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2755187951253601275?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2755187951253601275?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/HyWyQqeaDDk/google-app-engine-and-java-first.html" title="Google App Engine and Java: First Impressions" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/04/google-app-engine-and-java-first.html</feedburner:origLink></entry><entry gd:etag="W/&quot;A0MNQ3czcSp7ImA9WxVaEk4.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-3491235541838056492</id><published>2009-04-08T20:51:00.004-04:00</published><updated>2009-04-08T21:11:32.989-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-04-08T21:11:32.989-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>LiveOps and Human Computation</title><content type="html">When we talk about human computation, the canonical 
examples are either the Games with a Purpose from Luis von Ahn, or Amazon's Mechanical Turk.&lt;br /&gt;&lt;br /&gt;Recently, though, I learned about LiveOps, a company that enables "micro-outsourcing" of small tasks, such as handling a telephone call or taking a pizza order at a drive-through. Quoting Jonathan Zittrain, from his paper &lt;a href="http://ssrn.com/abstract=1140445"&gt;Ubiquitous Human Computing&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;We are in the initial stages of distributed human computing that can be directed at mental tasks the way that surplus remote server rackspace or Web hosting can be purchased to accommodate sudden spikes in Internet traffic (von Ahn 2005; Hewlett Packard (HP) 2008) or PCs can be arranged in grid computing configurations, each executing code in an identical virtual environment (International Business Machines (IBM) 2006). &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;At some fast food drivethroughs, the microphone and speaker next to the marquee menu are patched through to an order-taker thousands of miles away. That person types up the requested order and dispatches it back to a screen in the food preparation area of the restaurant while the car is idling&lt;/span&gt; (Richtel 2006). Services like LiveOps recruit workers for such mental contracting tasks (LiveOps 2008a). Applicants to LiveOps navigate a fully automated hours-long vetting system that tests their skills and suitability. Out of 2,000 applicants per week, roughly 40 emerge for a second round of interviews by LiveOps managers (LiveOps 2008b).  &lt;/blockquote&gt;&lt;blockquote&gt;Those who succeed and become contractors for firms like LiveOps encounter an unusual combination of freedom and control. They can work whenever they like, wherever they like, for as much or as little time as they like. When they log in to work they choose from a menu of assignments tailored to their skill and reputation levels. 
These might include &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;taking pizza orders, placing sales calls, lobbying for a political candidate, or handling customer service inquiries.&lt;/span&gt; Then there is the control: every call and transaction is measured and recorded. Interactions can be monitored live by fellow LiveOps mentors or official LiveOps managers, or pulled up later as part of a larger assessment of contractors’ work. &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Judgments are developed and recorded about contractors’ performance, such that an incoming pizza order can be routed to the best pizzaorder- taker – who may not be the same as the best political campaigner&lt;/span&gt; (Hornik 2007). Contractors can be de-accredited at any time.&lt;br /&gt;&lt;/blockquote&gt;I find the similarities with Mechanical Turk striking but I can clearly see how LiveOps differentiates itself by handling tasks that are not suitable for the Mechanical Turk platform. I also find it mildly entertaining that I am using LiveOps as an example in class when we talk about VoIP, but I never thought of actually digging deeper to see how they work.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On a tangentially related note, if you want to find papers related to human computation, you can visit a wiki that we created at &lt;a href="http://hcomp2009.wikispaces.com/"&gt;http://hcomp2009.wikispaces.com/&lt;/a&gt;. Feel free to add more papers, add notes to the current papers, or simply send suggestions on how to improve it. 
In the &lt;a href="http://www.hcomp2009.org/"&gt;Human Computation Workshop (HComp 2009)&lt;/a&gt; we are trying to bring together people who are interested in all aspects of human computation, and the wiki is just one part of this effort.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-3491235541838056492?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=mEpPD1OtQ4A:xYQbEhP9j2U:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=mEpPD1OtQ4A:xYQbEhP9j2U:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=mEpPD1OtQ4A:xYQbEhP9j2U:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/mEpPD1OtQ4A" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/3491235541838056492/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/04/liveops-and-human-computation.html#comment-form" title="1 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3491235541838056492?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/3491235541838056492?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/mEpPD1OtQ4A/liveops-and-human-computation.html" title="LiveOps and Human Computation" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/04/liveops-and-human-computation.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CEINQ3c8fip7ImA9WxNSEE4.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-2798817603567815971</id><published>2009-03-19T10:33:00.015-04:00</published><updated>2009-08-23T09:03:12.976-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-08-23T09:03:12.976-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="statistics" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><category scheme="http://www.blogger.com/atom/ns#" term="power law" /><title>Fitting a Power-Law with Censored Data</title><content type="html">A 
few days back, I was describing how to model the distribution of waiting times for Mechanical Turk. There, I described how to use the maximum likelihood estimator to infer the parameter of the power law, using as input the durations of the &lt;span style="font-weight: bold;"&gt;completed &lt;/span&gt;tasks.&lt;br /&gt;&lt;br /&gt;However, this approach introduces a bias: some of the tasks that we observed have &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt; finished yet. Take a look, for example, at the two distribution plots for the completion time of tasks on Mechanical Turk:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_skxW9kDLt_c/SYflJk_O_GI/AAAAAAAAAWg/ac-12cWs7bU/s1600-h/mturk-duration-HITs.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 273px;" src="http://2.bp.blogspot.com/_skxW9kDLt_c/SYflJk_O_GI/AAAAAAAAAWg/ac-12cWs7bU/s400/mturk-duration-HITs.PNG" alt="" id="BLOGGER_PHOTO_ID_5298455439335160930" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_skxW9kDLt_c/SYt1WoxKwMI/AAAAAAAAAWo/1yYmhYozrsI/s1600-h/Mturk-duration-task-CDF.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 230px;" src="http://2.bp.blogspot.com/_skxW9kDLt_c/SYt1WoxKwMI/AAAAAAAAAWo/1yYmhYozrsI/s400/Mturk-duration-task-CDF.PNG" alt="" id="BLOGGER_PHOTO_ID_5299458418292736194" border="0" /&gt;&lt;/a&gt;You can see that there is a "cutoff" in the pdf plot somewhere between 512 and 1024 hours. Similarly, there is a sudden drop in the cdf plot. 
This is simply the effect of keeping only the completed tasks and ignoring the tasks that are still running.&lt;br /&gt;&lt;br /&gt;By using only the completed tasks, we effectively discard the information provided by the incomplete tasks. Knowing that a task has not finished after running for $U$ hours tells us that its duration will be &lt;span style="font-style: italic;"&gt;at least&lt;/span&gt; $U$, and this is valuable information. Such data points are called &lt;span style="font-weight: bold;"&gt;censored&lt;/span&gt;. Let me give some background on censored data for readers not familiar with the concept.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What do you mean by calling a data point censored?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In general, we call a data point censored when we cannot get its exact value, but we can provide a lower bound for it, an upper bound, or both. Depending on the type of bound, censored data points are classified as right censored (lower bound), left censored (upper bound), or interval censored (both lower and upper bound).&lt;br /&gt;&lt;br /&gt;By far the most common type of censored data is right censored data. The durations of the unfinished tasks on Mechanical Turk are right censored. In a salary survey, an income reported as "100K+ per year" is also a right censored data point. So is the lifetime of a patient treated with a drug who is still alive at the end of the study.&lt;br /&gt;&lt;br /&gt;A left censored point is an observation for which we know only its maximum possible value. 
In a salary survey, an income reported as "Less than 10K per year" is a left censored data point.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;How can we modify a maximum likelihood estimator to use censored data points?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Deriving the simple MLE estimator&lt;/span&gt;&lt;span style="font-style: italic;"&gt; (no censored data points)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let's first see how we derive a maximum likelihood estimator. I will use the continuous power-law distribution as an example, but the process is largely the same for other distributions. For the continuous power law with minimum value $x_{min}=1$ (so every observation satisfies $x_i \geq 1$), the probability density function is&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$Pr(x=x_i) = (\alpha-1) \cdot {x_i}^{-\alpha}$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;First, let's say that we observe $n$ data points with values $x_1, \ldots, x_n$. We assume that the data points indeed come from a continuous power law, and we want to estimate the parameter $\alpha$ that most likely generated these points.&lt;br /&gt;&lt;br /&gt;The likelihood of observing these data points is the probability of seeing the value $x_1$ &lt;span style="font-weight: bold; font-style: italic;"&gt;and&lt;/span&gt; the value $x_2$ &lt;span style="font-weight: bold; font-style: italic;"&gt;and &lt;/span&gt;the value $x_3$, ... &lt;span style="font-weight: bold;"&gt;and &lt;/span&gt;the value $x_n$. 
So, assuming independence across the points, the likelihood function is:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$L(\alpha) = \prod_{i=1}^n Pr(x=x_i) $&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;By substituting the power-law density for $Pr(x=x_i)$, we have:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$L(\alpha) = \prod_{i=1}^n (\alpha-1) \cdot {x_i}^{-\alpha}$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;We need to find the $\alpha$ that maximizes $L(\alpha)$. Instead of maximizing $L(\alpha)$ directly, we maximize its logarithm $l(\alpha)$. The logarithm is maximized at the same value and is much easier to work with analytically:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$l(\alpha) = \ln(L(\alpha)) = \sum_{i=1}^n \ln\left( (\alpha-1) \cdot {x_i}^{-\alpha} \right)$&lt;br /&gt;&lt;div style="text-align: left;"&gt;which results in:&lt;br /&gt;&lt;/div&gt;$l(\alpha) = n \ln(\alpha-1) - \alpha \sum_{i=1}^n \ln(x_i)$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;To find the value of $\alpha$ that maximizes $l(\alpha)$, we take the derivative with respect to $\alpha$ and set it equal to 0:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$\frac{d}{d \alpha}l(\alpha) = 0$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;so&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$n \frac{1}{\alpha-1} - \sum_{i=1}^n \ln(x_i) = 0$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;So the MLE estimator is:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-weight: bold;"&gt;$\alpha = 1 + n \left[\sum_{i=1}^n \ln(x_i) \right]^{-1}$&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Let's now move on to the case of censored data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Deriving the MLE estimator with right censored data&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Now, let's say that we observe $n+m$ data points drawn from the distribution: we have $n$ completed tasks, with duration 
values $x_1, \ldots, x_n$, and $m$ non-completed tasks that had durations $u_1, \ldots, u_m$ when we observed them but were still running. Their final durations will therefore be longer than the currently observed ones, which makes them right censored points.&lt;br /&gt;&lt;br /&gt;Again, we assume that the data points indeed come from a continuous power law, and we want to estimate the parameter $\alpha$ that most likely generated these points.&lt;br /&gt;&lt;br /&gt;Now the likelihood of observing these data points is slightly different: we need to account for the fact that the final durations of the censored observations will be longer than the currently observed ones. So we get this version (&lt;a href="http://en.wikipedia.org/wiki/Survival_analysis#Fitting_parameters_to_data"&gt;for more details, see also the related Wikipedia entry&lt;/a&gt;), with the part that deals with the censored data in bold:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$L(\alpha) = \prod_{i=1}^n Pr(x=x_i) \cdot \mathbf{\prod_{i=1}^m Pr(x&gt;u_i)}$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Since $Pr(x&gt;u_i) = {u_i}^{-(\alpha-1)}$, the likelihood becomes:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$L(\alpha) = \prod_{i=1}^n (\alpha-1) {x_i}^{-\alpha} \cdot \prod_{i=1}^m {u_i}^{-\alpha+1}$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Analogously to the case above, we take the log:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$l(\alpha) = \ln(L(\alpha))$&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;which becomes:&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;$l(\alpha) = n \ln(\alpha-1) - \alpha \sum_{i=1}^n \ln(x_i) + \sum_{i=1}^m \ln(u_i) - \alpha \sum_{i=1}^m \ln(u_i)$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;We then take the derivative with respect to $\alpha$ and set it equal to 0:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;$\frac{d}{d \alpha}l(\alpha) = 0$&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;$n \frac{1}{\alpha-1} - \sum_{i=1}^n \ln(x_i) - \sum_{i=1}^m \ln(u_i) = 0$&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;So, here is the MLE estimator that also uses the censored data points:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-weight: bold;"&gt;$\alpha = 1 + n \left[ \sum_{i=1}^n \ln(x_i) + \sum_{i=1}^m \ln(u_i)\right]^{-1}$&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;As you can see, the MLE estimator with censored data also contains the term $\sum_{i=1}^m \ln(u_i)$ inside the brackets. However, the factor in front is still $n$, the number of completed tasks, and not $n+m$.&lt;br /&gt;&lt;br /&gt;So, what is the difference? For power laws, the higher the fraction of censored points, the lower the estimated exponent, i.e., the heavier the tail. This is an expected outcome, as the censored data points tend to represent unobserved tail points. For example, for Mechanical Turk, the exponent estimated without censored data points is 1.35, and with censored data points it is 1.29. Not a big difference in this case, but there are situations where this correction can significantly change the overall analysis. One example is an exponent close to 2, since for $\alpha \leq 2$ the continuous power law has an infinite mean.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-2798817603567815971?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
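&lt;br /&gt;To make the two estimators concrete, here is a minimal Python sketch. It assumes the same continuous power law with $x_{min}=1$ that is used in the derivation. The function names (mle_alpha, mle_alpha_censored, simulate) are hypothetical. The data are also hypothetical: each simulated task is checked once, at a random age between 1 and max_age hours. This is an illustrative assumption, not the Mechanical Turk data behind the plots above, and the simulation only serves to show the bias that appears when the still-running tasks are dropped.
&lt;pre&gt;
```python
import math
import operator
import random


def mle_alpha(xs):
    """MLE for p(x) = (alpha - 1) * x**(-alpha), x at least 1, using only the
    completed durations xs: alpha = 1 + n / sum(ln x_i)."""
    n = len(xs)
    return 1.0 + n / sum(math.log(x) for x in xs)


def mle_alpha_censored(xs, us):
    """Same model, but also using the right censored durations us (tasks still
    running after u_j hours): alpha = 1 + n / (sum(ln x_i) + sum(ln u_j)).
    The factor in front stays n, the number of completed tasks, not n + m."""
    n = len(xs)
    log_sum = sum(math.log(x) for x in xs) + sum(math.log(u) for u in us)
    return 1.0 + n / log_sum


def simulate(alpha, num_tasks, max_age, seed=0):
    """Hypothetical data: each task's duration follows the power law, and we
    look at the task once, after a random age between 1 and max_age hours."""
    rng = random.Random(seed)
    completed, censored = [], []
    for _ in range(num_tasks):
        # Inverse-transform sampling from the survival function
        # S(x) = x**(-(alpha - 1)); 1 - random() is never 0.
        duration = (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        age = rng.uniform(1.0, max_age)
        if operator.le(duration, age):
            completed.append(duration)  # finished before we looked: exact value
        else:
            censored.append(age)  # still running: duration is at least age
    return completed, censored


if __name__ == "__main__":
    done, running = simulate(alpha=1.5, num_tasks=20000, max_age=1000.0)
    print("completed tasks:", len(done), "still running:", len(running))
    print("alpha, completed only: %.3f" % mle_alpha(done))
    print("alpha, with censored:  %.3f" % mle_alpha_censored(done, running))
```
&lt;/pre&gt;
With a finite sample, neither estimate will match the true exponent exactly. The estimate that includes the censored tasks should land close to it, while the completed-only estimate is pushed upward, i.e., towards a lighter tail. The uniform observation age is just one simple way to get non-informative censoring: the time at which we look at a task does not depend on how long the task will take. The likelihood above implicitly assumes this.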
&lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=2YwKst0rm0A:Wq9sAbzNK8E:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?a=2YwKst0rm0A:Wq9sAbzNK8E:4cEx4HpKnUU"&gt;&lt;img src="http://feeds.feedburner.com/~ff/AComputerScientistInABusinessSchool?i=2YwKst0rm0A:Wq9sAbzNK8E:4cEx4HpKnUU" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/2YwKst0rm0A" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/2798817603567815971/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/02/fitting-power-law-with-censored-data.html#comment-form" title="0 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2798817603567815971?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/2798817603567815971?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/2YwKst0rm0A/fitting-power-law-with-censored-data.html" title="Fitting a Power-Law with Censored Data" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_skxW9kDLt_c/SYflJk_O_GI/AAAAAAAAAWg/ac-12cWs7bU/s72-c/mturk-duration-HITs.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/02/fitting-power-law-with-censored-data.html</feedburner:origLink></entry><entry gd:etag="W/&quot;AkEGRXc9fyp7ImA9WxVUEk4.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-637156019528150411</id><published>2009-03-16T16:53:00.007-04:00</published><updated>2009-03-16T17:23:44.967-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-16T17:23:44.967-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" 
term="statistics" /><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Turker Demographics vs Internet Demographics</title><content type="html">My post about the &lt;a href="http://behind-the-enemy-lines.blogspot.com/2008/03/mechanical-turk-demographics.html"&gt;demographics of Mechanical Turk workers&lt;/a&gt; is one of the most popular posts on this blog, and I often see the results being discussed in other forums as well. One of the common questions I see is: "How do these demographics compare to the general Internet population?"&lt;br /&gt;&lt;br /&gt;Apparently, this is not an easy question to answer: it is relatively hard to find publicly available data about the demographics of &lt;span style="font-style: italic;"&gt;Internet &lt;/span&gt;users. Fortunately, I found some data from ComScore, dating from June 2008. I also had data about Mechanical Turk from two separate surveys that I ran in October 2008 and in December 2008 (each surveying 1,000 Turkers). The results of the two MTurk surveys were consistent with each other, which suggests that they are reasonably trustworthy.&lt;br /&gt;&lt;br /&gt;So, how do Turkers compare to the general US Internet population? The short answers:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Turkers are younger&lt;/span&gt;. 54% of Turkers are between 21 and 34 years old, compared to 22% of the general population.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Turkers are mainly female.&lt;/span&gt; 70% of the Turkers are female, compared to 50% of the general population.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Turkers have lower income&lt;/span&gt;. 65% of Turkers have a household income below 60K, compared to 45% of the general population.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Turkers have smaller families&lt;/span&gt;. 
55% of Turkers do not have children, compared to 40% of the general population.&lt;/li&gt;&lt;li&gt;The geographical distribution of Turkers and Internet users is similar.&lt;/li&gt;&lt;li&gt;The racial composition of Turkers and Internet users is similar, although there are slightly more Asians on Mechanical Turk.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Of course, the income and family-size differences may simply be a result of the age difference: younger people have lower incomes, are less likely to have children, and live in smaller households.&lt;br /&gt;&lt;br /&gt;For those of you who would like a more detailed look at the statistics, here is the corresponding table:&lt;br /&gt;&lt;br /&gt;&lt;table border="2" cellpadding="0" cellspacing="0"&gt; &lt;tbody&gt;&lt;tr&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;June 2008&lt;/td&gt;  &lt;td&gt;October 2008&lt;/td&gt;  &lt;td&gt;December 2008&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;Internet&lt;/td&gt;  &lt;td&gt;Turks&lt;/td&gt;  &lt;td&gt;Turks&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;Total Audience&lt;/td&gt;  &lt;td&gt;100&lt;/td&gt;  &lt;td&gt;100&lt;/td&gt;  &lt;td&gt;100&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;Persons - Age&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 15+&lt;/td&gt;  &lt;td&gt;85.9&lt;/td&gt;  &lt;td&gt;100&lt;/td&gt;  &lt;td&gt;100&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 18+&lt;/td&gt;  &lt;td&gt;80.1&lt;/td&gt;  &lt;td&gt;99.6&lt;/td&gt;  &lt;td&gt;99.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 21+&lt;/td&gt;  &lt;td&gt;74.3&lt;/td&gt;  &lt;td&gt;92.9&lt;/td&gt;  &lt;td&gt;91.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 35+&lt;/td&gt;  &lt;td&gt;52.4&lt;/td&gt;  &lt;td&gt;39.3&lt;/td&gt;  &lt;td&gt;37.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 50+&lt;/td&gt;  &lt;td&gt;24.3&lt;/td&gt;  
&lt;td&gt;11.2&lt;/td&gt;  &lt;td&gt;10.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 55+&lt;/td&gt;  &lt;td&gt;16.2&lt;/td&gt;  &lt;td&gt;5.2&lt;/td&gt;  &lt;td&gt;5.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 2-11&lt;/td&gt;  &lt;td&gt;9.5&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 2-17&lt;/td&gt;  &lt;td&gt;19.9&lt;/td&gt;  &lt;td&gt;0.2&lt;/td&gt;  &lt;td&gt;0.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 6-11&lt;/td&gt;  &lt;td&gt;7.4&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 6-14&lt;/td&gt;  &lt;td&gt;12&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 9-14&lt;/td&gt;  &lt;td&gt;8.9&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 12-17&lt;/td&gt;  &lt;td&gt;10.4&lt;/td&gt;  &lt;td&gt;0.2&lt;/td&gt;  &lt;td&gt;0.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 12-24&lt;/td&gt;  &lt;td&gt;22.9&lt;/td&gt;  &lt;td&gt;19&lt;/td&gt;  &lt;td&gt;21.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 12-34&lt;/td&gt;  &lt;td&gt;38&lt;/td&gt;  &lt;td&gt;57.8&lt;/td&gt;  &lt;td&gt;60&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 12-49&lt;/td&gt;  &lt;td&gt;66.2&lt;/td&gt;  &lt;td&gt;87.4&lt;/td&gt;  &lt;td&gt;88.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 18-24&lt;/td&gt;  &lt;td&gt;12.5&lt;/td&gt;  &lt;td&gt;18.7&lt;/td&gt;  &lt;td&gt;21.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 18-34&lt;/td&gt;  &lt;td&gt;27.6&lt;/td&gt;  &lt;td&gt;57.5&lt;/td&gt;  &lt;td&gt;59.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 18-49&lt;/td&gt;  &lt;td&gt;55.8&lt;/td&gt;  &lt;td&gt;87.2&lt;/td&gt;  &lt;td&gt;87.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 21-34&lt;/td&gt;  &lt;td&gt;21.9&lt;/td&gt;  
&lt;td&gt;53.3&lt;/td&gt;  &lt;td&gt;53.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 21-49&lt;/td&gt;  &lt;td&gt;50&lt;/td&gt;  &lt;td&gt;82.9&lt;/td&gt;  &lt;td&gt;82&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 25-34&lt;/td&gt;  &lt;td&gt;15.1&lt;/td&gt;  &lt;td&gt;38.8&lt;/td&gt;  &lt;td&gt;38.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 25-49&lt;/td&gt;  &lt;td&gt;43.2&lt;/td&gt;  &lt;td&gt;68.4&lt;/td&gt;  &lt;td&gt;66.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 25-54&lt;/td&gt;  &lt;td&gt;51.3&lt;/td&gt;  &lt;td&gt;75.2&lt;/td&gt;  &lt;td&gt;72.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 35-44&lt;/td&gt;  &lt;td&gt;18.7&lt;/td&gt;  &lt;td&gt;22.4&lt;/td&gt;  &lt;td&gt;21.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 35-49&lt;/td&gt;  &lt;td&gt;28.2&lt;/td&gt;  &lt;td&gt;29.7&lt;/td&gt;  &lt;td&gt;28.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 35-54&lt;/td&gt;  &lt;td&gt;36.2&lt;/td&gt;  &lt;td&gt;36.4&lt;/td&gt;  &lt;td&gt;33.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 35-64&lt;/td&gt;  &lt;td&gt;46.8&lt;/td&gt;  &lt;td&gt;41.4&lt;/td&gt;  &lt;td&gt;38.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 45-54&lt;/td&gt;  &lt;td&gt;17.6&lt;/td&gt;  &lt;td&gt;14&lt;/td&gt;  &lt;td&gt;12.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 45-64&lt;/td&gt;  &lt;td&gt;28.1&lt;/td&gt;  &lt;td&gt;19&lt;/td&gt;  &lt;td&gt;17.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 55-64&lt;/td&gt;  &lt;td&gt;10.5&lt;/td&gt;  &lt;td&gt;5&lt;/td&gt;  &lt;td&gt;5.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Persons: 65+&lt;/td&gt;  &lt;td&gt;5.7&lt;/td&gt;  &lt;td&gt;0.7&lt;/td&gt;  &lt;td&gt;1.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;Males - Age&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    All Males&lt;/td&gt;  
&lt;td&gt;49.5&lt;/td&gt;  &lt;td&gt;28&lt;/td&gt;  &lt;td&gt;36.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 15+&lt;/td&gt;  &lt;td&gt;42.1&lt;/td&gt;  &lt;td&gt;28&lt;/td&gt;  &lt;td&gt;36.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 18+&lt;/td&gt;  &lt;td&gt;39.1&lt;/td&gt;  &lt;td&gt;27.8&lt;/td&gt;  &lt;td&gt;36.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 21+&lt;/td&gt;  &lt;td&gt;36.1&lt;/td&gt;  &lt;td&gt;24.7&lt;/td&gt;  &lt;td&gt;32.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 35+&lt;/td&gt;  &lt;td&gt;25.7&lt;/td&gt;  &lt;td&gt;9.5&lt;/td&gt;  &lt;td&gt;11.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 50+&lt;/td&gt;  &lt;td&gt;12&lt;/td&gt;  &lt;td&gt;2.8&lt;/td&gt;  &lt;td&gt;2.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 55+&lt;/td&gt;  &lt;td&gt;8.1&lt;/td&gt;  &lt;td&gt;1.4&lt;/td&gt;  &lt;td&gt;1.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 2-11&lt;/td&gt;  &lt;td&gt;4.9&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 2-17&lt;/td&gt;  &lt;td&gt;10.4&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt;  &lt;td&gt;0.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 6-11&lt;/td&gt;  &lt;td&gt;3.9&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 6-14&lt;/td&gt;  &lt;td&gt;6.3&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 9-14&lt;/td&gt;  &lt;td&gt;4.5&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 12-17&lt;/td&gt;  &lt;td&gt;5.5&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt;  &lt;td&gt;0.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 12-24&lt;/td&gt;  &lt;td&gt;11.6&lt;/td&gt;  &lt;td&gt;7.5&lt;/td&gt;  &lt;td&gt;9.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 12-34&lt;/td&gt;  &lt;td&gt;18.9&lt;/td&gt;  &lt;td&gt;17.3&lt;/td&gt;  
&lt;td&gt;24.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 12-49&lt;/td&gt;  &lt;td&gt;32.5&lt;/td&gt;  &lt;td&gt;25&lt;/td&gt;  &lt;td&gt;33.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 18-24&lt;/td&gt;  &lt;td&gt;6.1&lt;/td&gt;  &lt;td&gt;7.4&lt;/td&gt;  &lt;td&gt;8.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 18-34&lt;/td&gt;  &lt;td&gt;13.4&lt;/td&gt;  &lt;td&gt;17.2&lt;/td&gt;  &lt;td&gt;23.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 18-49&lt;/td&gt;  &lt;td&gt;27.1&lt;/td&gt;  &lt;td&gt;24.9&lt;/td&gt;  &lt;td&gt;33.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 21-34&lt;/td&gt;  &lt;td&gt;10.4&lt;/td&gt;  &lt;td&gt;15.2&lt;/td&gt;  &lt;td&gt;21.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 21-49&lt;/td&gt;  &lt;td&gt;24.1&lt;/td&gt;  &lt;td&gt;22.9&lt;/td&gt;  &lt;td&gt;30.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Males: 25-34&lt;/td&gt;  &lt;td&gt;7.3&lt;/td&gt;  &lt;td&gt;9.8&lt;/td&gt;  &lt;td&gt;15&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 25-49&lt;/td&gt;  &lt;td&gt;20.9&lt;/td&gt;  &lt;td&gt;17.6&lt;/td&gt;  &lt;td&gt;24.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 25-54&lt;/td&gt;  &lt;td&gt;24.8&lt;/td&gt;  &lt;td&gt;19&lt;/td&gt;  &lt;td&gt;26.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Males: 35-44&lt;/td&gt;  &lt;td&gt;9.1&lt;/td&gt;  &lt;td&gt;6&lt;/td&gt;  &lt;td&gt;8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 35-49&lt;/td&gt;  &lt;td&gt;13.7&lt;/td&gt;  &lt;td&gt;7.7&lt;/td&gt;  &lt;td&gt;9.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 35-54&lt;/td&gt;  &lt;td&gt;17.5&lt;/td&gt;  &lt;td&gt;9.1&lt;/td&gt;  &lt;td&gt;11.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 35-64&lt;/td&gt;  &lt;td&gt;22.6&lt;/td&gt;  &lt;td&gt;10.6&lt;/td&gt;  &lt;td&gt;12.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 45-54&lt;/td&gt;  &lt;td&gt;8.4&lt;/td&gt;  &lt;td&gt;3.1&lt;/td&gt;  &lt;td&gt;3.2&lt;/td&gt; 
&lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Male: 45-64&lt;/td&gt;  &lt;td&gt;13.5&lt;/td&gt;  &lt;td&gt;4.5&lt;/td&gt;  &lt;td&gt;4.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Males: 55-64&lt;/td&gt;  &lt;td&gt;5.1&lt;/td&gt;  &lt;td&gt;1.4&lt;/td&gt;  &lt;td&gt;1.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Males: 65+&lt;/td&gt;  &lt;td&gt;3&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;Females - Age&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    All Females&lt;/td&gt;  &lt;td&gt;50.5&lt;/td&gt;  &lt;td&gt;72&lt;/td&gt;  &lt;td&gt;63.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 15+&lt;/td&gt;  &lt;td&gt;43.8&lt;/td&gt;  &lt;td&gt;72&lt;/td&gt;  &lt;td&gt;63.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 18+&lt;/td&gt;  &lt;td&gt;41&lt;/td&gt;  &lt;td&gt;71.9&lt;/td&gt;  &lt;td&gt;63.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 21+&lt;/td&gt;  &lt;td&gt;38.2&lt;/td&gt;  &lt;td&gt;68.2&lt;/td&gt;  &lt;td&gt;58.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 35+&lt;/td&gt;  &lt;td&gt;26.8&lt;/td&gt;  &lt;td&gt;29.8&lt;/td&gt;  &lt;td&gt;25.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 50+&lt;/td&gt;  &lt;td&gt;12.3&lt;/td&gt;  &lt;td&gt;8.3&lt;/td&gt;  &lt;td&gt;8.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 55+&lt;/td&gt;  &lt;td&gt;8.1&lt;/td&gt;  &lt;td&gt;3.8&lt;/td&gt;  &lt;td&gt;4.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 2-11&lt;/td&gt;  &lt;td&gt;4.6&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 2-17&lt;/td&gt;  &lt;td&gt;9.5&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 6-11&lt;/td&gt;  &lt;td&gt;3.6&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt; 
   Female: 6-14&lt;/td&gt;  &lt;td&gt;5.7&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 9-14&lt;/td&gt;  &lt;td&gt;4.5&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt;  &lt;td&gt;0&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 12-17&lt;/td&gt;  &lt;td&gt;4.9&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt;  &lt;td&gt;0.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 12-24&lt;/td&gt;  &lt;td&gt;11.3&lt;/td&gt;  &lt;td&gt;11.5&lt;/td&gt;  &lt;td&gt;12.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 12-34&lt;/td&gt;  &lt;td&gt;19.1&lt;/td&gt;  &lt;td&gt;40.5&lt;/td&gt;  &lt;td&gt;35.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 12-49&lt;/td&gt;  &lt;td&gt;33.6&lt;/td&gt;  &lt;td&gt;62.4&lt;/td&gt;  &lt;td&gt;54.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 18-24&lt;/td&gt;  &lt;td&gt;6.4&lt;/td&gt;  &lt;td&gt;11.5&lt;/td&gt;  &lt;td&gt;12.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 18-34&lt;/td&gt;  &lt;td&gt;14.2&lt;/td&gt;  &lt;td&gt;40.5&lt;/td&gt;  &lt;td&gt;35.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 18-49&lt;/td&gt;  &lt;td&gt;28.7&lt;/td&gt;  &lt;td&gt;62.4&lt;/td&gt;  &lt;td&gt;54.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 21-34&lt;/td&gt;  &lt;td&gt;11.5&lt;/td&gt;  &lt;td&gt;38.1&lt;/td&gt;  &lt;td&gt;32.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 21-49&lt;/td&gt;  &lt;td&gt;25.9&lt;/td&gt;  &lt;td&gt;60&lt;/td&gt;  &lt;td&gt;51.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 25-34&lt;/td&gt;  &lt;td&gt;7.8&lt;/td&gt;  &lt;td&gt;28.9&lt;/td&gt;  &lt;td&gt;23.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 25-49&lt;/td&gt;  &lt;td&gt;22.3&lt;/td&gt;  &lt;td&gt;50.9&lt;/td&gt;  &lt;td&gt;41.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 25-54&lt;/td&gt;  &lt;td&gt;26.5&lt;/td&gt;  &lt;td&gt;56.2&lt;/td&gt;  &lt;td&gt;46&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    
Female: 35-44&lt;/td&gt;  &lt;td&gt;9.5&lt;/td&gt;  &lt;td&gt;16.4&lt;/td&gt;  &lt;td&gt;13.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 35-49&lt;/td&gt;  &lt;td&gt;14.5&lt;/td&gt;  &lt;td&gt;21.9&lt;/td&gt;  &lt;td&gt;18.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 35-54&lt;/td&gt;  &lt;td&gt;18.7&lt;/td&gt;  &lt;td&gt;27.3&lt;/td&gt;  &lt;td&gt;22.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 35-64&lt;/td&gt;  &lt;td&gt;24.1&lt;/td&gt;  &lt;td&gt;30.8&lt;/td&gt;  &lt;td&gt;26.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 45-54&lt;/td&gt;  &lt;td&gt;9.2&lt;/td&gt;  &lt;td&gt;10.9&lt;/td&gt;  &lt;td&gt;9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 45-64&lt;/td&gt;  &lt;td&gt;14.6&lt;/td&gt;  &lt;td&gt;14.5&lt;/td&gt;  &lt;td&gt;13.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 55-64&lt;/td&gt;  &lt;td&gt;5.4&lt;/td&gt;  &lt;td&gt;3.6&lt;/td&gt;  &lt;td&gt;4.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Female: 65+&lt;/td&gt;  &lt;td&gt;2.6&lt;/td&gt;  &lt;td&gt;0.7&lt;/td&gt;  &lt;td&gt;1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;HH Income (US)&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: Less than 15,000&lt;/td&gt;  &lt;td&gt;6&lt;/td&gt;  &lt;td&gt;11.4&lt;/td&gt;  &lt;td&gt;12.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI US: Under $25K&lt;/td&gt;  &lt;td&gt;9.3&lt;/td&gt;  &lt;td&gt;22.8&lt;/td&gt;  &lt;td&gt;23.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI US: Under $60K&lt;/td&gt;  &lt;td&gt;44.5&lt;/td&gt;  &lt;td&gt;64.8&lt;/td&gt;  &lt;td&gt;60.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI US: $60K+&lt;/td&gt;  &lt;td&gt;55.5&lt;/td&gt;  &lt;td&gt;34.8&lt;/td&gt;  &lt;td&gt;39.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI US: $75K+&lt;/td&gt;  &lt;td&gt;43&lt;/td&gt;  &lt;td&gt;22.7&lt;/td&gt;  &lt;td&gt;27.5&lt;/td&gt; 
&lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: 15,000 - 24,999&lt;/td&gt;  &lt;td&gt;3.4&lt;/td&gt;  &lt;td&gt;11.4&lt;/td&gt;  &lt;td&gt;10.1&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: 25,000 - 39,999&lt;/td&gt;  &lt;td&gt;9.9&lt;/td&gt;  &lt;td&gt;21.8&lt;/td&gt;  &lt;td&gt;18.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: 40,000 - 59,999&lt;/td&gt;  &lt;td&gt;25.3&lt;/td&gt;  &lt;td&gt;20.2&lt;/td&gt;  &lt;td&gt;18.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: 60,000 - 74,999&lt;/td&gt;  &lt;td&gt;12.6&lt;/td&gt;  &lt;td&gt;12.1&lt;/td&gt;  &lt;td&gt;11.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: 75,000 - 99,999&lt;/td&gt;  &lt;td&gt;17.7&lt;/td&gt;  &lt;td&gt;10.2&lt;/td&gt;  &lt;td&gt;11.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HHI USD: 100,000 or more&lt;/td&gt;  &lt;td&gt;25.3&lt;/td&gt;  &lt;td&gt;12.5&lt;/td&gt;  &lt;td&gt;16&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;&lt;a href="http://en.wikipedia.org/wiki/United_States_Census#Regions_and_divisions"&gt;Region (US)&lt;/a&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:West North Central&lt;/td&gt;  &lt;td&gt;7.6&lt;/td&gt;  &lt;td&gt;5.8&lt;/td&gt;  &lt;td&gt;7.5&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:Mountain&lt;/td&gt;  &lt;td&gt;6.9&lt;/td&gt;  &lt;td&gt;6.4&lt;/td&gt;  &lt;td&gt;7.4&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:Pacific&lt;/td&gt;  &lt;td&gt;15.4&lt;/td&gt;  &lt;td&gt;13.3&lt;/td&gt;  &lt;td&gt;15.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:New England&lt;/td&gt;  &lt;td&gt;5.5&lt;/td&gt;  &lt;td&gt;6.4&lt;/td&gt;  &lt;td&gt;4.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:Mid Atlantic&lt;/td&gt;  &lt;td&gt;14.2&lt;/td&gt;  &lt;td&gt;13.9&lt;/td&gt;  &lt;td&gt;15.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:South 
Atlantic&lt;/td&gt;  &lt;td&gt;18.7&lt;/td&gt;  &lt;td&gt;19.2&lt;/td&gt;  &lt;td&gt;19.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:East South Central&lt;/td&gt;  &lt;td&gt;5.1&lt;/td&gt;  &lt;td&gt;8.3&lt;/td&gt;  &lt;td&gt;5.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:West South Central&lt;/td&gt;  &lt;td&gt;10.5&lt;/td&gt;  &lt;td&gt;10.7&lt;/td&gt;  &lt;td&gt;9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Region US:East North Central&lt;/td&gt;  &lt;td&gt;16.1&lt;/td&gt;  &lt;td&gt;15.7&lt;/td&gt;  &lt;td&gt;14.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;Children&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Children:No&lt;/td&gt;  &lt;td&gt;39.3&lt;/td&gt;  &lt;td&gt;52.7&lt;/td&gt;  &lt;td&gt;57.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Children:Yes&lt;/td&gt;  &lt;td&gt;60.7&lt;/td&gt;  &lt;td&gt;47.3&lt;/td&gt;  &lt;td&gt;42.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;HH Size&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 1&lt;/td&gt;  &lt;td&gt;4.4&lt;/td&gt;  &lt;td&gt;17.7&lt;/td&gt;  &lt;td&gt;17.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 2&lt;/td&gt;  &lt;td&gt;24.2&lt;/td&gt;  &lt;td&gt;28.9&lt;/td&gt;  &lt;td&gt;30.6&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 3&lt;/td&gt;  &lt;td&gt;21.4&lt;/td&gt;  &lt;td&gt;19.7&lt;/td&gt;  &lt;td&gt;19.2&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 4&lt;/td&gt;  &lt;td&gt;25.3&lt;/td&gt;  &lt;td&gt;20.5&lt;/td&gt;  &lt;td&gt;21.9&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 5+&lt;/td&gt;  &lt;td&gt;24.8&lt;/td&gt;  &lt;td&gt;12.9&lt;/td&gt;  &lt;td&gt;10.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 1-2&lt;/td&gt;  
&lt;td&gt;28.5&lt;/td&gt;  &lt;td&gt;46.6&lt;/td&gt;  &lt;td&gt;47.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    HH Size: 3+&lt;/td&gt;  &lt;td&gt;71.5&lt;/td&gt;  &lt;td&gt;33.5&lt;/td&gt;  &lt;td&gt;32.7&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td style="font-weight: bold;"&gt;Race&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt;  &lt;td&gt;&lt;br /&gt;&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Race:White&lt;/td&gt;  &lt;td&gt;87.3&lt;/td&gt;  &lt;td&gt;82.7&lt;/td&gt;  &lt;td&gt;82&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Race:Black&lt;/td&gt;  &lt;td&gt;8&lt;/td&gt;  &lt;td&gt;6.5&lt;/td&gt;  &lt;td&gt;5.3&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Race:Asian&lt;/td&gt;  &lt;td&gt;1.6&lt;/td&gt;  &lt;td&gt;5.7&lt;/td&gt;  &lt;td&gt;6.8&lt;/td&gt; &lt;/tr&gt; &lt;tr&gt;  &lt;td&gt;    Race:Other&lt;/td&gt;  &lt;td&gt;3.1&lt;/td&gt;  &lt;td&gt;4.9&lt;/td&gt;  &lt;td&gt;5.8&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-637156019528150411?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/UF7q8AbY2TE" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/637156019528150411/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html#comment-form" title="3 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/637156019528150411?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/637156019528150411?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/UF7q8AbY2TE/turker-demographics-vs-internet.html" title="Turker Demographics vs Internet Demographics" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">3</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/03/turker-demographics-vs-internet.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4FSXYzcCp7ImA9WxVUFU0.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-7867199672468491167</id><published>2009-03-11T00:02:00.014-04:00</published><updated>2009-03-19T19:05:18.888-04:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-19T19:05:18.888-04:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="statistics" /><category scheme="http://www.blogger.com/atom/ns#" term="mechanical turk" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Mechanical Turk: Profitable or 
Not?</title><content type="html">I was chatting with a couple of my students who are trying to build a startup using Mechanical Turk, and they asked me for the total value of the tasks posted on Mechanical Turk every day.&lt;br /&gt;&lt;br /&gt;Since &lt;a href="http://behind-the-enemy-lines.blogspot.com/2009/02/monitoring-dynamics-of-mechanical-turk.html"&gt;I have the archive&lt;/a&gt; of the tasks posted on MTurk over the last couple of months, this was an easy question to answer. A simple query on the database (find the new HITs posted every hour, then group them by day) produces the following plot:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_skxW9kDLt_c/Sbc5XxyNubI/AAAAAAAAAXQ/Jac-8xYXf0Y/s1600-h/Value-of-HITs-per-day.PNG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 164px;" src="http://2.bp.blogspot.com/_skxW9kDLt_c/Sbc5XxyNubI/AAAAAAAAAXQ/Jac-8xYXf0Y/s400/Value-of-HITs-per-day.PNG" alt="" id="BLOGGER_PHOTO_ID_5311777366170515890" border="0" /&gt;&lt;/a&gt;Long story short, the average value of the HITs posted on any given day is around $2,000. I have not analyzed the distribution of the values, but it seems to be (not surprisingly) a power law or a lognormal.&lt;br /&gt;&lt;br /&gt;The $2K/day value means that Amazon's average revenue is around $200 per day (10% of the requesters' payments), or about $6K per month. This hardly covers the expense of dedicating a developer to the service!&lt;br /&gt;&lt;br /&gt;It seems that Mechanical Turk is not generating any significant revenue for Amazon, and it is unlikely that it generates any profit. And we know what happens these days to products that do not generate profits...&lt;br /&gt;&lt;br /&gt;Thankfully, Amazon uses Mechanical Turk for its own purposes, so there are second-degree benefits for Amazon in keeping the service around. 
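&lt;br /&gt;&lt;br /&gt;The daily aggregation and the revenue arithmetic above can be sketched in Python. This is a minimal illustration under stated assumptions, not the query actually run on the archive: the snapshot format (an hourly mapping from a HIT group id to its reward per HIT and the number of HITs available) and the functions daily_posted_value and amazon_revenue are hypothetical. A HIT group counts as newly posted work only when it appears, or its HIT count grows, between consecutive hourly snapshots. The 10% commission is the figure quoted in the post.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical snapshot format (not the real archive schema):
# HIT group id -> (reward per HIT in USD, number of HITs available).
Snapshot = dict[str, tuple[float, int]]


def daily_posted_value(snapshots: list[tuple[datetime, Snapshot]]) -> dict[str, float]:
    """Sum the dollar value of newly posted HITs per calendar day.

    Assumed heuristic: a HIT group absent from the previous hourly snapshot
    is new, and for a group already present only the growth in its HIT
    count is new. Everything in the very first snapshot counts as new.
    """
    totals: dict[str, float] = defaultdict(float)
    previous: Snapshot = {}
    for taken_at, current in sorted(snapshots, key=lambda s: s[0]):
        day = taken_at.date().isoformat()
        for group_id, (reward, count) in current.items():
            old_count = previous.get(group_id, (reward, 0))[1]
            added = max(count - old_count, 0)  # a shrinking count is not new work
            totals[day] += added * reward
        previous = current
    return dict(totals)


def amazon_revenue(daily_value: float, commission: float = 0.10,
                   days: int = 30) -> tuple[float, float]:
    """Return (revenue per day, revenue per month) at a flat commission."""
    per_day = daily_value * commission
    return per_day, per_day * days


if __name__ == "__main__":
    per_day, per_month = amazon_revenue(2000)
    print(f"${per_day:,.0f} per day, ${per_month:,.0f} per month")
```

With the post's numbers, amazon_revenue(2000) gives $200 per day and $6,000 per 30-day month. Because the sketch only compares hourly snapshots, it misses HITs posted and completed within the same hour and ignores multiple assignments per HIT, so like the crawl itself it tends to underestimate the totals.&lt;br /&gt;&lt;br /&gt;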
I truly hope, though, that the service will attract more customers soon.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update&lt;/span&gt;: &lt;span style="font-style: italic;"&gt;I should also clarify that my figures slightly underestimate the actual values: I can only "see" Mechanical Turk through the eyes of an average worker. So I cannot tell whether a requester asks multiple workers to complete the same HIT, and sometimes I cannot observe the details of HITs for which I have not passed the qualification test. Still, I think the actual numbers are of the same order of magnitude.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Update 2&lt;/span&gt;: &lt;span style="font-style: italic;"&gt;For more details, see my previous post on the &lt;a href="http://behind-the-enemy-lines.blogspot.com/2009/02/monitoring-dynamics-of-mechanical-turk.html"&gt;dynamics of Mechanical Turk&lt;/a&gt; and the chart at &lt;a href="http://hyperion.stern.nyu.edu/mturk/"&gt;http://hyperion.stern.nyu.edu/mturk/&lt;/a&gt;, which shows the available HITs and rewards at any given time.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-7867199672468491167?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/MzTE_omZv58" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/7867199672468491167/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/03/mechanical-turk-profitable-or-not.html#comment-form" title="19 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/7867199672468491167?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/7867199672468491167?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/MzTE_omZv58/mechanical-turk-profitable-or-not.html" title="Mechanical Turk: Profitable or Not?" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="http://2.bp.blogspot.com/_skxW9kDLt_c/Sbc5XxyNubI/AAAAAAAAAXQ/Jac-8xYXf0Y/s72-c/Value-of-HITs-per-day.PNG" height="72" width="72" /><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">19</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/03/mechanical-turk-profitable-or-not.html</feedburner:origLink></entry><entry gd:etag="W/&quot;CU4CRXg-eCp7ImA9WxVVE0Q.&quot;"><id>tag:blogger.com,1999:blog-7118563403027467631.post-5793347870680571130</id><published>2009-03-06T21:39:00.003-05:00</published><updated>2009-03-06T21:46:04.650-05:00</updated><app:edited xmlns:app="http://www.w3.org/2007/app">2009-03-06T21:46:04.650-05:00</app:edited><category scheme="http://www.blogger.com/atom/ns#" term="mechanical 
turk" /><category scheme="http://www.blogger.com/atom/ns#" term="research" /><title>Human Computation Workshop (HCOMP 2009)</title><content type="html">Are you interested in Mechanical Turk and its applications? Do you think that the ESP Game and reCAPTCHA are great ideas, and do you have ideas of your own that you want to share with the world? Or are you just interested in learning about human computation in general?&lt;br /&gt;&lt;br /&gt;Now you can! We (an embarrassingly long list of organizers) are organizing a half-day workshop at KDD 2009 in Paris this year. The workshop will be on &lt;span style="font-weight: bold;"&gt;June 28th&lt;/span&gt;, and &lt;span style="font-weight: bold;"&gt;submissions are due on April 18, 2009, 8pm Eastern Time&lt;/span&gt;. You can find details about the submission requirements, along with other useful information, at &lt;a href="http://www.hcomp2009.org/" target="_blank"&gt;http://www.hcomp2009.org&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Below is the complete Call for Papers, for those who enjoy reading such stuff:&lt;br /&gt;&lt;blockquote&gt;                Human Computation Workshop (HCOMP 2009)&lt;br /&gt;                        KDD-09 Workshop, Paris, France&lt;br /&gt;                                June 28, 2009&lt;br /&gt;&lt;br /&gt;We invite you to participate in the first annual Human Computation Workshop&lt;br /&gt;(HCOMP 2009), to be held on June 28th in conjunction with the KDD-09&lt;br /&gt;conference (&lt;a href="http://www.sigkdd.org/kdd2009/" target="_blank"&gt;http://www.sigkdd.org/&lt;wbr&gt;kdd2009/&lt;/a&gt;) in Paris, France.&lt;br /&gt;&lt;br /&gt;Human computation is a new research area that studies the process of&lt;br /&gt;channeling the vast internet population to perform tasks or provide data&lt;br /&gt;towards solving difficult problems that no known efficient computer&lt;br /&gt;algorithms can yet solve. 
The goal of this half-day workshop is to bring&lt;br /&gt;together academic and industry researchers in a stimulating discussion of&lt;br /&gt;existing human computation applications (e.g. games, CAPTCHAs, Mechanical&lt;br /&gt;Turk) and future directions of this new subject area.  We solicit papers&lt;br /&gt;related to various aspects of both general human computation techniques and&lt;br /&gt;specific applications, e.g. general design principles; implementation;&lt;br /&gt;cost-benefit analysis; theoretical approaches; privacy and security&lt;br /&gt;concerns; and incorporation of machine learning / artificial intelligence&lt;br /&gt;techniques. An integral part of this workshop will be a demo session where&lt;br /&gt;participants can showcase their human computation applications.&lt;br /&gt;&lt;br /&gt;Detailed information about the workshop and submission procedures can be&lt;br /&gt;found at &lt;a href="http://www.hcomp2009.org/" target="_blank"&gt;http://www.hcomp2009.org&lt;/a&gt;.  We solicit long papers (at most nine&lt;br /&gt;pages), short papers (at most four pages) and demos.  
Demo submissions must&lt;br /&gt;include either a previously published paper or a one page extended abstract.&lt;br /&gt; Deadline for submission is April 18, 2009 8pm Eastern Time.&lt;br /&gt;&lt;br /&gt;Organizing Committee&lt;br /&gt;---------------------&lt;br /&gt;Paul Bennett (Microsoft Research)&lt;br /&gt;Raman Chandrasekar (Microsoft Research)&lt;br /&gt;Max Chickering (Microsoft Live Labs)&lt;br /&gt;Panos Ipeirotis (New York University)&lt;br /&gt;Edith Law (Carnegie Mellon University)&lt;br /&gt;Anton Mityagin (Microsoft Live Labs)&lt;br /&gt;Foster Provost (New York University)&lt;br /&gt;Luis von Ahn (Carnegie Mellon University)&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7118563403027467631-5793347870680571130?l=behind-the-enemy-lines.blogspot.com'/&gt;&lt;/div&gt;&lt;div class="feedflare"&gt;
&lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/AComputerScientistInABusinessSchool/~4/v_6YeVw-qNY" height="1" width="1"/&gt;</content><link rel="replies" type="application/atom+xml" href="http://behind-the-enemy-lines.blogspot.com/feeds/5793347870680571130/comments/default" title="Post Comments" /><link rel="replies" type="text/html" href="http://behind-the-enemy-lines.blogspot.com/2009/03/human-computation-workshop-hcomp-2009.html#comment-form" title="2 Comments" /><link rel="edit" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/5793347870680571130?v=2" /><link rel="self" type="application/atom+xml" href="http://www.blogger.com/feeds/7118563403027467631/posts/default/5793347870680571130?v=2" /><link rel="alternate" type="text/html" href="http://feedproxy.google.com/~r/AComputerScientistInABusinessSchool/~3/v_6YeVw-qNY/human-computation-workshop-hcomp-2009.html" title="Human Computation Workshop (HCOMP 2009)" /><author><name>Panos Ipeirotis</name><uri>http://www.blogger.com/profile/15283752183704062501</uri><email>noreply@blogger.com</email><gd:extendedProperty name="OpenSocialUserId" value="12579610116507126669" /></author><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">2</thr:total><feedburner:origLink>http://behind-the-enemy-lines.blogspot.com/2009/03/human-computation-workshop-hcomp-2009.html</feedburner:origLink></entry></feed>
