<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>[citation needed]</title>
	
	<link>http://www.talyarkoni.org/blog</link>
	<description>...or you get no soup for one year!</description>
	<lastBuildDate>Wed, 08 Feb 2012 07:28:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/citationNeeded" /><feedburner:info uri="citationneeded" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/</creativeCommons:license><item>
		<title>a human and a monkey walk into an fMRI scanner…</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/V7IL1xrupxA/</link>
		<comments>http://www.talyarkoni.org/blog/2012/02/08/a-human-and-a-monkey-walk-into-an-fmri-scanner/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 07:07:37 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[fmri]]></category>
		<category><![CDATA[methods]]></category>
		<category><![CDATA[neuroscience]]></category>
		<category><![CDATA[apes]]></category>
		<category><![CDATA[interspecies activity]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=954</guid>
		<description><![CDATA[Tor Wager and I have a &#8220;news and views&#8221; piece in Nature Methods this week; we discuss a paper by Mantini and colleagues (in the same issue) introducing a new method for identifying functional brain homologies across different species&#8211;essentially, identifying brain regions in humans and monkeys that seem to do roughly the same thing even if they&#8217;re [...]]]></description>
			<content:encoded><![CDATA[<p>Tor Wager and I have a <a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.1869.html">&#8220;news and views&#8221;</a> piece in Nature Methods this week; we discuss a <a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.1868.html">paper by Mantini and colleagues</a> (in the same issue) introducing a new method for identifying functional brain homologies across different species&#8211;essentially, identifying brain regions in humans and monkeys that seem to do roughly the same thing even if they&#8217;re not located in the same place anatomically. Mantini et al make some fairly strong claims about what their approach tells us about the evolution of the human brain (namely, that some cortical regions have undergone expansion relative to monkeys, while others have adopted substantively new functions). For reasons we articulate in our commentary, I&#8217;m personally not so convinced by the substantive conclusions, but I do think the core idea underlying the method is a very clever and potentially useful one:</p>
<blockquote><p>Their technique, interspecies activity correlation (ISAC), uses functional magnetic resonance imaging (fMRI) to identify brain regions in which humans and monkeys exposed to the same dynamic stimulus—a 30-minute clip from the movie <em>The Good, the Bad and the Ugly</em>—show correlated patterns of activity (<a href="http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.1869.html#f1">Fig. 1</a>). The premise is that homologous regions should have similar patterns of activity across species. For example, a brain region sensitive to a particular configuration of features, including visual motion, hands, faces, object and others, should show a similar time course of activity in both species—even if its anatomical location differs across species and even if the precise features that drive the area&#8217;s neurons have not yet been specified.</p></blockquote>
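<p>To make the core idea concrete, here is a toy sketch in Python. It is not the ISAC pipeline from the paper: the region names, the simulated time courses and the <code>best_match</code> helper are all made up for illustration. It only shows the basic move of picking the monkey region whose time course correlates best with a given human region.</p>

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_match(human_tc, monkey_regions):
    """Return (name, r) for the monkey region whose time course
    correlates most strongly with the human time course."""
    scores = {name: pearson(human_tc, tc) for name, tc in monkey_regions.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Simulated data (an assumption, not real fMRI): a shared stimulus-driven
# signal plus independent noise in each species.
rng = random.Random(0)
signal = [math.sin(t / 5.0) for t in range(200)]
human_tc = [s + rng.gauss(0, 0.3) for s in signal]
monkey_regions = {
    "region_a": [s + rng.gauss(0, 0.3) for s in signal],  # shares the signal
    "region_b": [rng.gauss(0, 1) for _ in signal],        # pure noise
}

print(best_match(human_tc, monkey_regions))
```

<p>Note that anatomical location never enters the computation; only the similarity of the time courses does, which is what lets the approach find correspondences between regions that sit in different places.</p>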
<p><a href="http://www.guardian.co.uk/science/neurophilosophy">Mo Costandi</a> has more on the paper in <a href="http://www.guardian.co.uk/science/neurophilosophy/2012/feb/05/1">an excellent Guardian piece</a> (and I&#8217;m not just saying that because he quoted me a few times). All in all, I think it&#8217;s a very exciting method, and it&#8217;ll be interesting to see how it&#8217;s applied in future studies. I think there&#8217;s a fairly broad class of potential applications based loosely around the same idea of searching for correlated patterns. It&#8217;s an idea that&#8217;s already been used by <a href="http://www.hasson.org/">Uri Hasson</a> (an author on the Mantini et al paper) and others fairly widely in the fMRI literature to identify functional correspondences across different subjects; but you can easily imagine conceptually similar applications in other fields too&#8211;e.g., correlating gene expression profiles across species in order to identify structural homologies (actually, one could probably try this out pretty easily using the mouse and human data available in the <a href="http://www.brain-map.org/">Allen Brain Atlas</a>).</p>
<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img style="border: 0;" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" alt="ResearchBlogging.org" /></a></span><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+methods&amp;rft_id=info%3Apmid%2F22306809&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Interspecies+activity+correlations+reveal+functional+correspondence+between+monkey+and+human+brain+areas.&amp;rft.issn=1548-7091&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=&amp;rft.au=Mantini+D&amp;rft.au=Hasson+U&amp;rft.au=Betti+V&amp;rft.au=Perrucci+MG&amp;rft.au=Romani+GL&amp;rft.au=Corbetta+M&amp;rft.au=Orban+GA&amp;rft.au=Vanduffel+W&amp;rfe_dat=bpr3.included=1;bpr3.tags=Neuroscience%2CCognitive+Neuroscience%2C+fMRI">Mantini D, Hasson U, Betti V, Perrucci MG, Romani GL, Corbetta M, Orban GA, &amp; Vanduffel W (2012). Interspecies activity correlations reveal functional correspondence between monkey and human brain areas. <span style="font-style: italic;">Nature methods</span> PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/22306809" rev="review">22306809</a></span></p>
<p><span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Nature+Methods&amp;rft_id=info%3Adoi%2F10.1038%2Fnmeth.1869&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Establishing+homology+between+monkey+and+human+brains&amp;rft.issn=1548-7091&amp;rft.date=2012&amp;rft.volume=&amp;rft.issue=&amp;rft.spage=&amp;rft.epage=&amp;rft.artnum=http%3A%2F%2Fwww.nature.com%2Fdoifinder%2F10.1038%2Fnmeth.1869&amp;rft.au=Wager%2C+T.&amp;rft.au=Yarkoni%2C+T.&amp;rfe_dat=bpr3.included=1;bpr3.tags=Neuroscience%2CCognitive+Neuroscience%2C+fMRI">Wager, T., &amp; Yarkoni, T. (2012). Establishing homology between monkey and human brains <span style="font-style: italic;">Nature Methods</span> DOI: <a href="http://dx.doi.org/10.1038/nmeth.1869" rev="review">10.1038/nmeth.1869</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2012/02/08/a-human-and-a-monkey-walk-into-an-fmri-scanner/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2012/02/08/a-human-and-a-monkey-walk-into-an-fmri-scanner/</feedburner:origLink></item>
		<item>
		<title>no free lunch in statistics</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/3cFyL5H1aag/</link>
		<comments>http://www.talyarkoni.org/blog/2012/02/07/no-free-lunch-in-statistics/#comments</comments>
		<pubDate>Wed, 08 Feb 2012 06:35:37 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[statistics]]></category>
		<category><![CDATA[correlation]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[MIC]]></category>
		<category><![CDATA[no free lunch]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=952</guid>
		<description><![CDATA[Simon and Tibshirani recently posted a short comment on the Reshef et al MIC data mining paper I blogged about a while back: The proposal of Reshef et. al. (“MIC”) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud [...]]]></description>
			<content:encoded><![CDATA[<p>Simon and Tibshirani recently posted <a href="http://www-stat.stanford.edu/~tibs/reshef/comment.pdf">a short comment</a> on the <a href="http://www.sciencemag.org/content/334/6062/1518.abstract">Reshef et al</a> MIC data mining paper <a href="http://www.talyarkoni.org/blog/2011/12/17/large-scale-data-exploration-mic-style/">I blogged about</a> a while back:</p>
<blockquote><p>The proposal of Reshef et. al. (“MIC”) is an interesting new approach for discovering non-linear dependencies among pairs of measurements in exploratory data mining. However, it has a potentially serious drawback. The authors laud the fact that MIC has no preference for some alternatives over others, but as the authors know, there is no <em>free lunch in Statistics</em>: tests which strive to have high power against all alternatives can have low power in many important situations.</p></blockquote>
<p>They then report some simulation results clearly demonstrating that MIC is (very) underpowered relative to Pearson correlation in most situations, and performs even worse relative to Székely &amp; Rizzo&#8217;s distance correlation (which I hadn&#8217;t heard about, but will have to look into now). I mentioned low power as a potential concern in my own post, but figured it would be an issue under relatively specific circumstances (i.e., only for certain kinds of associations in relatively small samples). Simon &amp; Tibshirani&#8217;s simulations pretty clearly demonstrate that isn&#8217;t so. Which, needless to say, rather dampens the enthusiasm for the MIC statistic.</p>
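<p>For reference, here is a minimal pure-Python sketch of the sample distance correlation, following the standard double-centering definition as I understand it. It is not the code Simon and Tibshirani used, and the function names and the example data are assumptions chosen for illustration.</p>

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def centered_distances(v):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (ranges from 0 to 1)."""
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(val * val for r in a for val in r) / n ** 2
    dvar_y = sum(val * val for r in b for val in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# A symmetric parabola: a strong dependence that Pearson's r cannot see.
x = [i / 50 - 1 for i in range(101)]
y = [v * v for v in x]
print(round(pearson(x, y), 6), distance_correlation(x, y))
```

<p>On this example Pearson&#8217;s r is essentially zero while the distance correlation is clearly positive. A single example like this says nothing about power, which is what the simulations in the comment address.</p>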
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2012/02/07/no-free-lunch-in-statistics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2012/02/07/no-free-lunch-in-statistics/</feedburner:origLink></item>
		<item>
		<title>the neuroinformatics of Neopets</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/tGoRwKndX7I/</link>
		<comments>http://www.talyarkoni.org/blog/2012/01/26/the-neuroinformatics-of-neopets/#comments</comments>
		<pubDate>Fri, 27 Jan 2012 06:58:52 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[data mining]]></category>
		<category><![CDATA[fun with data]]></category>
		<category><![CDATA[general silliness]]></category>
		<category><![CDATA[fishing expedition]]></category>
		<category><![CDATA[Google Correlate]]></category>
		<category><![CDATA[Illinois film office]]></category>
		<category><![CDATA[Neopets]]></category>
		<category><![CDATA[neuroinformatics]]></category>
		<category><![CDATA[secret avatars]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=943</guid>
		<description><![CDATA[In the process of writing a short piece for the APS Observer, I was fiddling around with Google Correlate earlier this evening. It&#8217;s a very neat toy, but if you think neuroimaging or genetics have a big multiple comparisons problem, playing with Google Correlate for a few minutes will put things in perspective. Here&#8217;s a [...]]]></description>
			<content:encoded><![CDATA[<p>In the process of writing a short piece for the APS Observer, I was fiddling around with <a href="http://www.google.com/trends/correlate/">Google Correlate</a> earlier this evening. It&#8217;s a very neat toy, but if you think neuroimaging or genetics have a big multiple comparisons problem, playing with Google Correlate for a few minutes will put things in perspective. Here&#8217;s a line graph displaying the search term most strongly correlated (over time) <a href="http://www.google.com/trends/correlate/search?e=neuroinformatics&amp;t=monthly&amp;shift=3&amp;p=us#default,50">with searches for &#8220;neuroinformatics&#8221;</a>:</p>
<p><a href="http://www.talyarkoni.org/blog/wp-content/uploads/2012/01/illinois_film_office1.png"><img class="alignnone  wp-image-945" title="Illinois informaticians like to watch movies" src="http://www.talyarkoni.org/blog/wp-content/uploads/2012/01/illinois_film_office1.png" alt="" width="450" height="279" /></a></p>
<p>That&#8217;s right, the search term that covaries most strongly with &#8220;neuroinformatics&#8221; is none other than &#8220;Illinois film office&#8221; (which, to be fair, has a <a href="http://www.illinoisfilm.biz/">pretty appealing website</a>). Other top matches include &#8220;wma support&#8221;, &#8220;sim codes&#8221;, &#8220;bed-in-a-bag&#8221;, &#8220;neopets secret&#8221;, &#8220;neopets guild&#8221;, and &#8220;neopets secret avatars&#8221;.</p>
<p>I may not have learned much about neuroinformatics from this exercise, but I <em>did</em> get a pretty good sense of how neuroinformaticians like to spend their free time&#8230;</p>
<p>&nbsp;</p>
<p>p.s. I was pretty surprised to find that normalized search volume for just about every informatics-related term has fallen sharply in the last 10 years. I went in expecting the opposite! Maybe all the informaticians were early search adopters, and the rest of the world caught up? No, probably not. Anyway, enough of this; <a href="http://www.neopets.com/explore.phtml">Neopia</a> is calling me!</p>
<p>p.p.s. Seriously though, this is why data fishing expeditions are dangerous. Any one of these correlations is significant at p-less-than-point-whatever-you-like. And if your publication record depended on it, you could probably tell yourself a convincing story about why neuroinformaticians need to look up Garmin eMaps&#8230;</p>
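<p>A quick simulation makes the same point. This is only a sketch under made-up assumptions: the series are independent random walks (a crude stand-in for autocorrelated search trends), and the number of months and candidate terms are arbitrary. It has nothing to do with how Google Correlate actually works.</p>

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def random_walk(rng, n):
    """Cumulative sum of n independent Gaussian steps."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out


rng = random.Random(42)
n_months, n_terms = 60, 2000

target = random_walk(rng, n_months)  # stand-in for the query term
candidates = [random_walk(rng, n_months) for _ in range(n_terms)]

# Every candidate is unrelated to the target by construction, yet the best
# of 2,000 comparisons still looks like a strong relationship.
best_r = max(pearson(target, c) for c in candidates)
print(f"best of {n_terms} unrelated series: r = {best_r:.2f}")
```

<p>Search enough unrelated series and the top hit will always look impressive, which is exactly why the winner of a fishing expedition should not be taken at face value.</p>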
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2012/01/26/the-neuroinformatics-of-neopets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2012/01/26/the-neuroinformatics-of-neopets/</feedburner:origLink></item>
		<item>
		<title>Attention publishers: the data in your tables want to be free! Free!</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/bsCABV4vEec/</link>
		<comments>http://www.talyarkoni.org/blog/2012/01/07/attention-publishers-the-data-in-your-tables-want-to-be-free-free/#comments</comments>
		<pubDate>Sat, 07 Jan 2012 15:13:23 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[data mining]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[fmri]]></category>
		<category><![CDATA[Frontiers]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[neurosynth]]></category>
		<category><![CDATA[PLoS]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=932</guid>
		<description><![CDATA[The Neurosynth database is getting an upgrade over the next couple of weeks; it&#8217;s going to go from 4,393 neuroimaging studies to around 5,800. Unfortunately, updating the database is kind of a pain, because academic publishers like to change the format of their full-text HTML articles, which has a nasty habit of breaking the publisher-specific [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://neurosynth.org">Neurosynth</a> database is getting an upgrade over the next couple of weeks; it&#8217;s going to go from 4,393 neuroimaging studies to around 5,800. Unfortunately, updating the database is kind of a pain, because academic publishers like to change the format of their full-text HTML articles, which has a nasty habit of breaking the publisher-specific HTML parsers I&#8217;ve written. When you expect ScienceDirect to give you &lt;table cellspacing=10&gt;, but you get &lt;table&gt; with no cellspacing attribute (the horror!), bad things happen in XPath land. And then those bad things need to be repaired. And I hate repairing stuff! So I don&#8217;t do it very often. Like, once every 6 to 9 months.</p>
<p>In an ideal world, there would be no need to write (and fix) custom filters for different publishers, because the publishers would all simultaneously make XML representations of their articles available (in addition to HTML, PDF, etc.), and then people who have legitimate data mining reasons for regularly downloading hundreds of articles at a time wouldn&#8217;t have to cry themselves to sleep every night. But as it stands, only one major publisher of neuroimaging articles (<a href="http://plos.org">PLoS</a>) provides XML versions of all articles. A minority of articles from other publishers are available in XML from BioMed Central, but that&#8217;s still just a fraction of the existing literature.</p>
<p>Anyway, the HTML thing is annoying, but it&#8217;s possible to work around it. What&#8217;s much more problematic is that some publishers lock up the data in the tables of their articles. To make Neurosynth work, I have to be able to identify rows in tables that look like brain activations. That is, things that look roughly like this:</p>
<p><a href="http://www.talyarkoni.org/blog/wp-content/uploads/2012/01/activation_table.jpg"><img class="alignnone  wp-image-934" title="activation_table" src="http://www.talyarkoni.org/blog/wp-content/uploads/2012/01/activation_table.jpg" alt="" width="450" height="203" /></a></p>
<p>Most publishers are nice enough to format article tables as HTML tables; which is to say, I can look for tags like &lt;table&gt; and then work down the XPath tree to identify all the rows, and then scan each row for values that look activation-like. Then those values go into the database, and poof, next thing you know, you have meta-analytic brain activation maps from hundreds of studies. But some publishers&#8211;most notably, <a href="http://frontiersin.org">Frontiers</a>&#8211;throw a wrench in the works by failing to format tables in HTML; instead, they present the tables as images (see for instance this <a href="http://www.frontiersin.org/files/TempImages/imagecache/12235_fnhum-05-00177-HTML/images/image_m/fnhum-05-00177-t004.jpg">JPEG table</a>, pulled from <a href="http://www.frontiersin.org/human_neuroscience/10.3389/fnhum.2011.00177/full">this</a> article). Which means I can&#8217;t really extract any data from them, and as a result, you&#8217;re not going to see activations from articles published in Frontiers journals in Neurosynth any time soon. So if you publish fMRI articles in Frontiers in Human Neuroscience regularly, and are wondering why I&#8217;ve been ignoring you (I like you! I promise!), now you know.</p>
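<p>The row-scanning step can be sketched roughly as follows. This is an illustration and not the actual Neurosynth code: the table contents are invented, the <code>find_coordinates</code> helper is hypothetical, and the rule of accepting three consecutive numeric cells within plus or minus 100 is an assumed heuristic. None of this is possible when the table is a JPEG, since there are no rows or cells to walk.</p>

```python
import re
import xml.etree.ElementTree as ET

NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def to_number(text):
    """Parse a cell as a float, or return None if it is not numeric."""
    text = (text or "").strip().replace("\u2212", "-")  # normalize Unicode minus
    return float(text) if NUMBER.match(text) else None


def find_coordinates(table_markup, limit=100.0):
    """Return (x, y, z) triples from rows that contain three consecutive
    numeric cells, all within +/- limit (an assumed, rough bound)."""
    coords = []
    for row in ET.fromstring(table_markup).iter("tr"):
        values = [to_number("".join(cell.itertext())) for cell in row.findall("td")]
        for i in range(len(values) - 2):
            triple = values[i:i + 3]
            if all(v is not None and abs(v) <= limit for v in triple):
                coords.append(tuple(triple))
                break  # take the first plausible triple in each row
    return coords


# Invented example table, written as well-formed XML for the stdlib parser.
TABLE = """<table>
<tr><th>Region</th><th>x</th><th>y</th><th>z</th><th>t</th></tr>
<tr><td>L amygdala</td><td>-22</td><td>-4</td><td>-18</td><td>4.31</td></tr>
<tr><td>R insula</td><td>38</td><td>16</td><td>2</td><td>5.02</td></tr>
<tr><td>Note: p &lt; .05</td></tr>
</table>"""

print(find_coordinates(TABLE))
```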
<p>Anyway, on the remote chance that anyone reading this has any sway with people high up at Frontiers, could you please ask them to release their data? Pretty please? Lack of access to data in tables seems to be a pretty common complaint in the data mining community; I&#8217;ve talked to other people in the neuroinformatics world who&#8217;ve also expressed frustration about it, and I imagine the same is true of people in other disciplines. It&#8217;s particularly surprising given that Frontiers is, in theory, an open access publisher. I can <em>see</em> the data in your tables, Frontiers; why won&#8217;t you also let me <em>read</em> it?</p>
<p>Okay, I know this kind of stuff doesn&#8217;t really interest anyone; I&#8217;m just venting. The main point is, Neurosynth is going to be bigger and (very slightly) better in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2012/01/07/attention-publishers-the-data-in-your-tables-want-to-be-free-free/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2012/01/07/attention-publishers-the-data-in-your-tables-want-to-be-free-free/</feedburner:origLink></item>
		<item>
		<title>in which Discover Card decides that my wife is also my daughter</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/HsENSArzJko/</link>
		<comments>http://www.talyarkoni.org/blog/2012/01/04/in-which-discover-card-decides-that-my-wife-is-also-my-daughter/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 05:49:25 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[general silliness]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[college]]></category>
		<category><![CDATA[credit cards]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[Discover]]></category>
		<category><![CDATA[junk mail]]></category>
		<category><![CDATA[Olive Garden]]></category>
		<category><![CDATA[Pismo Beach]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=920</guid>
		<description><![CDATA[Ever since I opted out of receiving preapproved credit card offers, I&#8217;ve stopped getting credit card spam in the mail (yay!). But companies I have an existing relationship with still have the right to send me various offers and updates, and there&#8217;s nothing I can do about that (except throw said offers in the trash [...]]]></description>
			<content:encoded><![CDATA[<p>Ever since I <a href="https://www.optoutprescreen.com">opted out</a> of receiving preapproved credit card offers, I&#8217;ve stopped getting credit card spam in the mail (yay!). But companies I have an existing relationship with still have the right to send me various offers and updates, and there&#8217;s nothing I can do about that (except throw said offers in the trash after inspecting them and deciding that, no, I do <em>not</em> want to purchase the premium yacht travel insurance policy that comes with a bonus free set of matching lawn gnomes and a voucher for a buy-one-get-one-free meal at the Olive Garden). Discover Card is one of these companies, and the clever devils regularly take advantage of my amicable nature by sending me all kinds of wonderful offers. Take for instance the one I received yesterday, which starts like this:</p>
<blockquote><p>Dear Tal,</p>
<p>You&#8217;ve worked for years to provide a better life for your children and prepare them for a successful future. Now that they&#8217;re in college, the overwhelming cost of higher education shouldn&#8217;t stand in the way of their success. We&#8217;re ready to help.</p></blockquote>
<p>This is undoubtedly a very generous offer, but it comes at an inconvenient time for me, because, as it so happens, I don&#8217;t <em>have</em> any children right now&#8211;let alone college-aged children who need their father to front them some money. Somewhere, somehow, it seems Discover Card took a left turn at Albuquerque, when all along they were trying to get to Pismo Beach:</p>
<p><iframe width="450" height="338" src="http://www.youtube.com/embed/v-s-_ME8Qns?start=84&#038;fs=1&#038;feature=oembed" frameborder="0" allowfullscreen></iframe></p>
<p>Of course, this isn&#8217;t a case of human error; I very much doubt that an overworked analyst is putting in long nights at Discover combing through random customers&#8217; accounts looking for purchases diagnostic of college attendance (you know, like Ritalin receipts). The blame almost certainly rests with an over-inclusive algorithm that combed through my purchase history and automagically decided that I fit the profile of a middle-aged man who&#8217;s worked hard for years to provide a better life for his children. (I suppose I can take solace in the fact that while Discover probably knows what brand of toothpaste I like, it must not know my age, given that there aren&#8217;t many 31-year-old men with college-aged children.)</p>
<p>Anyway, I spent some time pondering what purchases I&#8217;ve made that could have tripped up Discover&#8217;s parental alarm system. And after scanning several months of statements, I&#8217;m proud to report it almost certainly has something to do with the giant monthly rent charge from &#8220;CU Residence Halls&#8221; (my wife and I live in on-campus housing). Either that or the many book-and-coffee-related charges from places with names like &#8220;University of Colorado Bookstore&#8221; and &#8220;Pretentious Coffeehouse on CU Campus&#8221;.</p>
<p>So that&#8217;s easy enough, right? It&#8217;s the on-campus purchases, stupid! Ah, but wait! That&#8217;s only one part of the mystery! The other, perhaps more interesting, part is this: who exactly does Discover think my college-aged child <em>is</em>, seeing as they clearly think I&#8217;m <em>not</em> the one caffeinating myself at the altar of higher education? Well, after thinking about <em>that</em> for a while, another clear answer emerges: it&#8217;s my wife! Discover thinks I have a college-aged daughter who also happens to be my wife! There&#8217;s no other explanation; to my knowledge, I don&#8217;t live with anyone else besides my wife (though, admittedly, I don&#8217;t check the storage closet very often).</p>
<p>Now, setting aside the fact that such a thing would be illegal in all fifty states, my wife and I are not very amused by this. We&#8217;re <em>mildly</em> amused, but we&#8217;re not very amused. But we&#8217;re refraining from making too big a fuss about it, because we&#8217;re still hoping we can get our hands on some of those sweet, sweet college loans.</p>
<p>In the interim, here are some questions I find myself pondering:</p>
<ul>
<li>Who writes the logic that does this kind of thing? I&#8217;m not asking for names; no need to rat out your best friend who works in Discover&#8217;s data mining department. I&#8217;m just curious to know what kind of background the people who come up with these things have. Artificial intelligence? Marketing research? Dental surgery?</li>
<li>How sophisticated are the rules used to screen customers for these mailings? Is there some serious business logic operating behind the scenes that happened to go wrong here, or is a well-meaning Discover employee just running SQL queries like &#8220;SELECT name, address FROM members WHERE description LIKE &#8216;%residence hall%&#8217;&#8221; on their lunch break?</li>
<li>Do credit card companies that do this kind of thing (which I imagine is pretty much all of them) actually validate their logic against test datasets (in this case, a large group of Discover members whose parental status has been independently verified), or do they just pick some criteria that seem to make sense and immediately start blanketing the United States with flyers?</li>
<li>What proportion of false positives is considered reasonable? Clearly, with any kind of program like this, some small number of customers is almost invariably going to get a letter that makes some very bad lifestyle assumptions. At what point does the risk of a backlash start to outweigh the potential for increased revenue? Obviously, the vast majority of people are probably going to chalk this type of thing up to a harmless error, but I imagine some small proportion of people are going to get upset and call up Discover to rant and rave about how they don&#8217;t have any children at all, and how dare Discover mine their records like this, and doesn&#8217;t Discover have any respect for them as loyal long-standing cardholders, and what&#8217;s that, why yes, of course, they&#8217;d be quite happy to accept Discover&#8217;s apology for this tragic error if it came with a two-for-one gift certificate to the Olive Garden.</li>
<li>Most importantly: is it considered fraud if I knowingly fill out an application for student loans in my lovely wife-daughter&#8217;s name?</li>
</ul>
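<p>For what it is worth, the validation step asked about above could look something like this sketch. Everything in it is invented for illustration: the schema, the five rows, and the <code>has_college_kid</code> column standing in for independently verified parental status. It says nothing about what Discover actually does.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (name TEXT, address TEXT, description TEXT, "
    "has_college_kid INTEGER)"  # hypothetical verified label: 1 = yes, 0 = no
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        ("A", "1 Campus Way", "CU Residence Halls rent", 0),  # lives on campus
        ("B", "2 Elm St", "State U residence hall fee", 1),
        ("C", "3 Oak St", "Grocery store", 0),
        ("D", "4 Pine St", "Residence Hall dining plan", 1),
        ("E", "5 Main St", "Campus bookstore", 1),
    ],
)

# The lunch-break rule from the post (SQLite's LIKE ignores ASCII case).
flagged = conn.execute(
    "SELECT name, has_college_kid FROM members "
    "WHERE description LIKE '%residence hall%'"
).fetchall()


def precision(rows):
    """Share of flagged members who really have a college-aged child."""
    return sum(label for _, label in rows) / len(rows)


print(flagged, precision(flagged))
```

<p>Even in this tiny made-up table, the rule both mails a childless on-campus resident and misses a genuine parent whose only clue is a bookstore charge, which is the kind of thing a labeled test set would reveal before the flyers go out.</p>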
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2012/01/04/in-which-discover-card-decides-that-my-wife-is-also-my-daughter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2012/01/04/in-which-discover-card-decides-that-my-wife-is-also-my-daughter/</feedburner:origLink></item>
		<item>
		<title>what Ben Parker wants you to know about neuroimaging</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/MWw3g5-LUZI/</link>
		<comments>http://www.talyarkoni.org/blog/2011/12/19/what-ben-parker-wants-you-to-know-about-neuroimaging/#comments</comments>
		<pubDate>Mon, 19 Dec 2011 12:00:41 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[fmri]]></category>
		<category><![CDATA[general silliness]]></category>
		<category><![CDATA[opinion]]></category>
		<category><![CDATA[ice cream]]></category>
		<category><![CDATA[power]]></category>
		<category><![CDATA[reverse inference]]></category>
		<category><![CDATA[Spiderman]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=907</guid>
		<description><![CDATA[I have a short opinion piece in the latest issue of The European Health Psychologist that discusses some of the caveats and limits of functional MRI. It&#8217;s a short and (I think) pretty readable piece; I touch on a couple of issues I&#8217;ve discussed frequently in other papers as well as here on the blog&#8211;namely, [...]]]></description>
			<content:encoded><![CDATA[<p>I have <a href="http://www.ehps.net/ehp/issues/2011/v13iss4_December2011/13_4_Yarkoni.pdf">a short opinion piece</a> in the latest issue of <a href="http://www.ehps.net/ehp/">The European Health Psychologist</a> that discusses <a href="http://www.ehps.net/ehp/issues/2011/v13iss4_December2011/13_4_Yarkoni.pdf">some of the caveats and limits of functional MRI</a>. It&#8217;s a short and (I think) pretty readable piece; I touch on a couple of issues I&#8217;ve discussed frequently in <a href="http://talyarkoni.org/papers/Yarkoni_PPS_commentary.pdf">other</a> <a href="http://talyarkoni.org/papers/Yarkoni_et_al_TICS.pdf">papers</a> as well as here on the blog&#8211;namely, the relatively low power of most fMRI analyses and the difficulties inherent in drawing causal inferences from neuroimaging results.</p>
<p>More importantly, though, I&#8217;ve finally fulfilled my long-held goal of sneaking a Spiderman reference into an academic article (though, granted, one that wasn&#8217;t peer-reviewed). It would be going too far to say I can die happy now, but at least I can have an extra large serving of ice cream for dessert tonight without feeling guilty*. And no, I&#8217;m not going to spoil the surprise by revealing what Spidey has to do with fMRI. Though I will say that if you actually fall for the hook and <a href="http://www.ehps.net/ehp/issues/2011/v13iss4_December2011/13_4_Yarkoni.pdf">go read the article</a> just to find that out, you&#8217;re likely to be sorely disappointed.</p>
<p>&nbsp;</p>
<p>* So okay, the truth is, I never, <em>ever</em> feel guilty for eating ice cream, no matter the serving size.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2011/12/19/what-ben-parker-wants-you-to-know-about-neuroimaging/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2011/12/19/what-ben-parker-wants-you-to-know-about-neuroimaging/</feedburner:origLink></item>
		<item>
		<title>large-scale data exploration, MIC-style</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/KkS8oIMxdFs/</link>
		<comments>http://www.talyarkoni.org/blog/2011/12/17/large-scale-data-exploration-mic-style/#comments</comments>
		<pubDate>Sun, 18 Dec 2011 04:22:33 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[fun with data]]></category>
		<category><![CDATA[methods]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[mutual information]]></category>
		<category><![CDATA[obesity]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=892</guid>
		<description><![CDATA[UPDATE 2/8/2012: Simon &#38; Tibshirani posted a critical commentary on this paper here. See additional thoughts here. Real-world data are messy. Relationships between two variables can take on an infinite number of forms, and while one doesn&#8217;t see, say, umbrella-shaped data very often, strange things can happen. When scientists talk about correlations or associations between [...]]]></description>
			<content:encoded><![CDATA[<p><em>UPDATE 2/8/2012: Simon &amp; Tibshirani posted a critical commentary on this paper <a href="http://www-stat.stanford.edu/~tibs/reshef/comment.pdf">here</a>. See additional thoughts <a href="http://www.talyarkoni.org/blog/2012/02/07/no-free-lunch-in-statistics/">here</a>.</em></p>
<p>Real-world data are messy. Relationships between two variables can take on an infinite number of forms, and while one doesn&#8217;t see, say, umbrella-shaped data very often, strange things can happen. When scientists talk about correlations or associations between variables, they&#8217;re usually referring to one very specific form of relationship&#8211;namely, a linear one. The assumption is that most associations between pairs of variables are reasonably well captured by positing that one variable increases in proportion to the other, with some added noise. In reality, of course, many associations aren&#8217;t linear, or even approximately so. For instance, many associations are cyclical (e.g., hours at work versus day of week), or curvilinear (e.g., heart attacks become precipitously more frequent past middle age), and so on.</p>
<p>Detecting a non-linear association is potentially just as easy as detecting a linear relationship if we know the form of that association up front. But there, of course, lies the rub: we generally<em> don&#8217;t</em> have strong intuitions about how most variables are likely to be non-linearly related. A more typical situation in many &#8216;big data&#8217; scientific disciplines is that we have a giant dataset full of thousands or millions of observations and hundreds or thousands of variables, and we want to determine which of the many associations between different variables are potentially important&#8211;without knowing anything about their potential shape. The problem, then, is that traditional measures of association don&#8217;t work very well; they&#8217;re only likely to detect associations to the extent that those associations approximate a linear fit.</p>
<p>A <a href="http://www.sciencemag.org/content/334/6062/1518.abstract">new paper in Science</a> by David Reshef and colleagues (and as <a href="http://www.columbia.edu/~lya2103/">a friend</a> pointed out, it&#8217;s a feat in and of itself just to get a statistics paper into Science) directly targets this data mining problem by introducing an elegant new measure of association called the <a href="http://www.sciencemag.org/content/334/6062/1518.abstract">Maximal Information Coefficient</a> (MIC; see also the authors&#8217; <a href="http://exploredata.net">project website</a>).  The clever insight at the core of the paper is that one can detect a systematic (i.e., non-random) relationship between two variables by quantifying and normalizing their maximal mutual information. <a href="http://en.wikipedia.org/wiki/Mutual_information">Mutual information</a> (MI) is an information theory measure of how much information you have about one variable given knowledge of the other. You have high MI when you can accurately predict the level of one variable given knowledge of the other, and low MI when knowledge of one variable is unhelpful in predicting the other. Importantly, unlike other measures (e.g., the correlation coefficient), MI makes no assumptions about the form of the relationship between the variables; one can have high mutual information for non-linear associations as well as linear ones.</p>
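<p>To make the idea concrete, here is a minimal sketch of a plug-in MI estimate in Python. It is an illustration only, not code from the paper: it assumes NumPy, a fixed 8 x 8 equal-width grid, and a hypothetical helper named <code>mutual_information</code>. The parabola and noise data are made up for the example.</p>

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in estimate of mutual information (in bits) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()            # joint probability of each grid cell
    px = pxy.sum(axis=1, keepdims=True)    # marginal distribution of x (rows)
    py = pxy.sum(axis=0, keepdims=True)    # marginal distribution of y (columns)
    expected = px @ py                     # joint distribution under independence
    nonzero = pxy > 0                      # empty cells contribute nothing
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / expected[nonzero])))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 5000)
y_parabola = x ** 2                  # deterministic but non-linear in x
y_noise = rng.uniform(-1, 1, 5000)   # independent of x

r_parabola = np.corrcoef(x, y_parabola)[0, 1]
mi_parabola = mutual_information(x, y_parabola)
mi_noise = mutual_information(x, y_noise)

print(f"Pearson r, parabola: {r_parabola:.3f}")
print(f"MI (bits), parabola: {mi_parabola:.3f}")
print(f"MI (bits), noise:    {mi_noise:.3f}")
```

<p>On the parabola the correlation should come out close to zero while the MI estimate sits well above the one for the independent pair. Note that the estimate depends on the grid, and the 8 x 8 choice here is arbitrary.</p>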
<p>MI and various derivative measures have been around for a long time now; what&#8217;s innovative about the Reshef et al paper is that the authors figured out a way to efficiently estimate and normalize the maximal MI one can obtain for any two variables. The very clever approach the authors use is to overlay a series of grids on top of the data, and to keep altering the resolution of the grid and moving its lines around until one obtains the maximum possible MI. In essence, it&#8217;s like dropping a wire mesh on top of a scatterplot and playing with it until you&#8217;ve boxed in all of the data points in the most informative way possible. And the neat thing is, you can apply the technique to any kind of data at all, and capture a very broad range of systematic relationships, not just linear ones.</p>
<p>To give you an intuitive sense of how this works, consider this Figure from the supplemental material:</p>
<p><a href="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MICing_a_sinusoid.jpg"><img class="alignnone  wp-image-893" title="MICing_a_sinusoid" src="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MICing_a_sinusoid.jpg" alt="" width="450" height="135" /></a></p>
<p>The underlying function here is sinusoidal. This is a potentially common type of association in many domains&#8211;e.g., it might explain the cyclical relationship between, say, coffee intake and hour of day (more coffee in the early morning and afternoon; less in between). But the linear correlation is essentially zero, so a typical analysis wouldn&#8217;t pick it up at all. On the other hand, the relationship itself is perfectly deterministic; if we can correctly identify the generative function in this case, we would have perfect information about Y given X. The question is how to capture this intuition algorithmically&#8211;especially given that real data are noisy.</p>
<p>This is where Reshef et al&#8217;s grid-based approach comes in. In the left panel above, you have a 2 x 8 grid overlaid on a sinusoidal function (the use of a 2 x 8 resolution here is just illustrative; the algorithm actually produces estimates for a wide range of grid resolutions). Even though it&#8217;s the optimal grid of that particular resolution, it still isn&#8217;t very good: knowing which row a particular point along the line falls into doesn&#8217;t tell you a whole lot about which column it falls into, and vice versa. In other words, mutual information is low. By contrast, the optimal 8 x 2 grid on the right side of the figure has a (perfect) MIC of 1: if you know which row in the grid a point on the line falls into, you can also determine which column it falls into with perfect accuracy. So the MIC approach will detect that there&#8217;s a perfectly systematic relationship between these two variables without any trouble, whereas the standard Pearson correlation would be 0 (i.e., no linear relation at all). There are a couple of other steps involved (e.g., one needs to normalize the MIC to account for differences in grid resolution), but that&#8217;s the gist of it.</p>
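<p>The grid search described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors&#8217; algorithm: it only tries equal-frequency grids rather than optimizing where the grid lines go, it caps the number of cells at n to the power 0.6, and it divides each MI value by the log of the smaller grid dimension. The function names are hypothetical, and the cosine data are made up for the example.</p>

```python
import numpy as np

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal numbers of points."""
    ranks = np.argsort(np.argsort(values))
    return ranks * k // len(values)

def grid_mi(x, y, nx, ny):
    """Mutual information (bits) of the points binned on an nx-by-ny grid."""
    counts = np.zeros((nx, ny))
    np.add.at(counts, (equal_frequency_bins(x, nx), equal_frequency_bins(y, ny)), 1)
    pxy = counts / counts.sum()
    expected = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0                      # empty cells contribute nothing
    return float(np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / expected[nonzero])))

def mic_like_score(x, y, max_cells=None):
    """Largest normalized grid MI over all resolutions with nx * ny at most max_cells."""
    if max_cells is None:
        max_cells = int(len(x) ** 0.6)     # assumed cap on the number of grid cells
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            # Normalize so that grids of different resolution are comparable.
            score = grid_mi(x, y, nx, ny) / np.log2(min(nx, ny))
            best = max(best, score)
    return best

x = np.linspace(0, 1, 1000)
y = np.cos(8 * np.pi * x)                  # noiseless sinusoid, four full cycles

pearson_r = np.corrcoef(x, y)[0, 1]
score = mic_like_score(x, y)

print(f"Pearson r:      {pearson_r:.3f}")
print(f"MIC-like score: {score:.3f}")
```

<p>Even this crude version should give a score near the top of the 0 to 1 range for the sinusoid while the Pearson correlation stays at essentially zero. Letting the grid lines move, as the real method does, can only raise the maximum further.</p>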
<p>If the idea seems surprisingly simple, it is. But as with many very good ideas, hindsight is 20/20; it&#8217;s an idea that seems obvious once you hear it, but clearly wasn&#8217;t trivial to come up with (or someone would have done it a long time ago!). And of course, the simplicity of the core idea also shouldn&#8217;t blind us to the fact that there was undoubtedly a lot of very sophisticated work involved in figuring out how to normalize and bound the measure, proving that the approach works, and implementing a dynamic algorithm capable of computing good MIC estimates in a reasonable amount of time (this <a href="http://news.harvard.edu/gazette/story/2011/12/dealing-with-data/">Harvard Gazette article</a> suggests Reshef and colleagues worked on the various problems for three years).</p>
<p>The utility of MIC and its improvement over existing measures is probably best captured in Figure 2 from the paper:</p>
<p><a href="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MIC_check.jpg"><img class="alignnone  wp-image-894" title="MIC_check" src="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MIC_check.jpg" alt="" width="450" height="691" /></a></p>
<p>Panel A shows the values one obtains with different measures when trying to capture different kinds of noiseless relationships (e.g., linear, exponential, and sinusoidal ones). The key point is that MIC assigns a value of 1 (the maximum) to every kind of association, whereas no other measure is capable of detecting the same range of associations with the same degree of sensitivity (and most fail horribly). By contrast, when given random data, MIC produces a value that tends towards zero (though it&#8217;s still not quite zero, a point I&#8217;ll come back to later). So what you effectively have is a measure that, with some caveats, can capture a very broad range of associations and place them on the same metric. The latter aspect is nicely captured in Panel G, which gives one a sense of what real (i.e., noisy) data corresponding to different MIC levels would look like. The main point is that, unlike other measures, a given value can correspond to very different types of associations. Admittedly, this may be a mixed blessing, since the flip side is that knowing the MIC value tells you almost nothing about what the association actually looks like (though <a href="http://en.wikipedia.org/wiki/Anscombe's_quartet">Anscombe&#8217;s Quartet</a> famously demonstrates that even a linear correlation can be misleading in this respect). But on the whole, I think it represents a potentially big advance in our ability to detect novel associations in a data-driven way.</p>
<p>Having introduced and explained the method, Reshef et al then go on to apply it to 4 very different datasets. I&#8217;ll just focus on one here&#8211;a set of global indicators from the World Health Organization (WHO). The data set contains 357 variables, or 63,546 variable pairs. When plotting MIC against the Pearson correlation coefficient the data look like this (panel A; click to blow up the figure):</p>
<p><a href="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MIC_vs_PCC.jpg"><img class="alignnone  wp-image-895" title="MIC_vs_PCC" src="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MIC_vs_PCC.jpg" alt="" width="450" height="201" /></a></p>
<p>The main point to note is that while MIC detects most strong linear effects (e.g., panel D), it also detects quite a few associations that have low linear correlations (e.g., E, F, and G). Reshef et al note that many of these effects have sensible interpretations (e.g., they argue that the left trend line in panel F reflects predominantly Pacific Island nations where obesity is culturally valued, and hence increases with income), but would be completely overlooked by an automated data mining approach that focuses only on linear correlations. They go on to report a number of other interesting examples ranging from analyses of gut bacteria to baseball statistics. All in all, it&#8217;s a compelling demonstration of a new metric that could potentially play an important role in large-scale data mining analyses going forward.</p>
<p>That said, while the paper clearly represents an important advance for large-scale data mining efforts, it&#8217;s also quite light on caveats and limitations (even for a length-constrained Science paper). Some potential concerns that come to mind:</p>
<ul>
<li>Reshef et al are understandably going to put their best foot forward, so we can expect that the &#8216;representative&#8217; examples they display (e.g., the WHO scatter plots above) are among the cleanest effects in the data, and aren&#8217;t necessarily typical. There&#8217;s nothing wrong with this, but it&#8217;s worth keeping in mind that much (and perhaps most) of the time, the associations MIC identifies aren&#8217;t going to be quite so clear-cut. Reshef et al&#8217;s approach can help identify potentially interesting associations, but once they&#8217;re identified, it&#8217;s still up to the investigator to figure out how to characterize them.</li>
<li>MIC is a (potentially quite heavily) biased measure. While it&#8217;s true, as the authors suggest, that it will &#8220;tend to 0 for statistically independent variables&#8221;, in most situations, the observed value will be substantially larger than 0 even when variables are completely uncorrelated. This falls directly out of the &#8216;M&#8217; in MIC, because when you take the maximal value from some larger search space as your estimate, you&#8217;re almost invariably going to end up capitalizing on chance to some degree. MIC will only tend to 0 when the sample size is very large; as this figure (from the supplemental material) shows, even with a sample size of n = 204, the MIC for uncorrelated variables will tend to hover somewhere around .15 for the parameterization used throughout the paper (the red line):<br />
<a href="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MIC_bias.jpg"><img class="alignnone  wp-image-896" title="MIC_bias" src="http://www.talyarkoni.org/blog/wp-content/uploads/2011/12/MIC_bias.jpg" alt="" width="450" height="363" /></a>This isn&#8217;t a huge deal, but it does mean that interpretation of small MIC values is going to be very difficult in practice, since the lower end of the distribution is going to depend heavily on sample size. And it&#8217;s quite unpleasant to have a putatively standardized metric of effect size whose interpretation depends to some extent on sample parameters.</li>
<li>Reshef et al don&#8217;t report any analyses quantifying the sensitivity of MIC compared to conventional metrics like Pearson&#8217;s correlation coefficient. Obviously, MIC can pick up on effects Pearson can&#8217;t; but a crucial question is whether MIC shows comparable sensitivity when effects <em>are</em> linear. Similarly, we don&#8217;t know how well MIC performs when sample sizes are substantially smaller than those Reshef et al use in their simulations and empirical analyses. If it breaks down with n&#8217;s on the order of, say, 50 &#8211; 100, that would be important to know. So it would be great to see follow-up work characterizing performance under such circumstances&#8211;preferably before a flood of papers is published that all use MIC to do data mining in relatively small data sets.</li>
<li>As Andrew Gelman points out <a href="http://andrewgelman.com/2011/12/mr-pearson-meet-mr-mandelbrot-detecting-novel-associations-in-large-data-sets/">here</a>, it&#8217;s not entirely clear that one <em>wants</em> a measure that gives a high r-square-like value for pretty much any non-random association between variables. For instance, a perfect circle would get an MIC of 1 at the limit, which is potentially weird given that you can never deterministically predict y from x. I don&#8217;t have a strong feeling about this one way or the other, but can see why this might bother someone.</li>
</ul>
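<p>The bias point in the list above is easy to see in a small simulation. The sketch below is not the authors&#8217; implementation: it assumes a simplified score that takes the largest normalized MI over equal-frequency grids (with the number of cells capped at n to the power 0.6), so the numbers it prints will not match the supplemental figure. The function name, sample sizes, and number of repetitions are all chosen just for illustration.</p>

```python
import numpy as np

def max_normalized_grid_mi(x, y):
    """Largest normalized MI over equal-frequency grids with nx * ny at most n ** 0.6."""
    n = len(x)
    x_rank = np.argsort(np.argsort(x))
    y_rank = np.argsort(np.argsort(y))
    max_cells = int(n ** 0.6)              # assumed cap on the number of grid cells
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            counts = np.zeros((nx, ny))
            np.add.at(counts, (x_rank * nx // n, y_rank * ny // n), 1)
            pxy = counts / n
            expected = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
            nonzero = pxy > 0              # empty cells contribute nothing
            mi = np.sum(pxy[nonzero] * np.log2(pxy[nonzero] / expected[nonzero]))
            # Taking the maximum over many grids is what capitalizes on chance.
            best = max(best, float(mi) / np.log2(min(nx, ny)))
    return best

rng = np.random.default_rng(1)
mean_score = {}
for n in (200, 2000):
    # x and y are drawn independently, so the true association is zero.
    runs = [max_normalized_grid_mi(rng.normal(size=n), rng.normal(size=n))
            for _ in range(20)]
    mean_score[n] = float(np.mean(runs))
    print(f"n = {n}: mean maximal score for independent variables = {mean_score[n]:.3f}")
```

<p>The average score should stay above zero at both sample sizes and be noticeably larger for the smaller sample, which is the pattern that makes small values hard to interpret without knowing n.</p>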
<p>Caveats aside though, from my perspective&#8211;as someone who likes to play with very large datasets but isn&#8217;t terribly statistically savvy&#8211;the Reshef et al paper seems like a really impressive piece of work that could have a big impact on at least some kinds of data mining analyses. I&#8217;d be curious to hear what more quantitatively sophisticated folks have to say.</p>
<p><span style="float: left; padding: 5px;"><a href="http://www.researchblogging.org"><img style="border: 0;" src="http://www.researchblogging.org/public/citation_icons/rb2_large_gray.png" alt="ResearchBlogging.org" /></a></span><br />
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.jtitle=Science+%28New+York%2C+N.Y.%29&amp;rft_id=info%3Apmid%2F22174245&amp;rfr_id=info%3Asid%2Fresearchblogging.org&amp;rft.atitle=Detecting+novel+associations+in+large+data+sets.&amp;rft.issn=0036-8075&amp;rft.date=2011&amp;rft.volume=334&amp;rft.issue=6062&amp;rft.spage=1518&amp;rft.epage=24&amp;rft.artnum=&amp;rft.au=Reshef+DN&amp;rft.au=Reshef+YA&amp;rft.au=Finucane+HK&amp;rft.au=Grossman+SR&amp;rft.au=McVean+G&amp;rft.au=Turnbaugh+PJ&amp;rft.au=Lander+ES&amp;rft.au=Mitzenmacher+M&amp;rft.au=Sabeti+PC&amp;rfe_dat=bpr3.included=1;bpr3.tags=Probability+and+Statistics%2C+Data+Mining%2C+Information+Theory">Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, &amp; Sabeti PC (2011). Detecting novel associations in large data sets. <span style="font-style: italic;">Science (New York, N.Y.), 334</span> (6062), 1518-24 PMID: <a href="http://www.ncbi.nlm.nih.gov/pubmed/22174245" rev="review">22174245</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2011/12/17/large-scale-data-exploration-mic-style/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2011/12/17/large-scale-data-exploration-mic-style/</feedburner:origLink></item>
		<item>
		<title>tracking replication attempts in psychology–for real this time</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/MTYD90SNLV8/</link>
		<comments>http://www.talyarkoni.org/blog/2011/11/22/tracking-replication-attempts-in-psychology-for-real-this-time/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 02:05:36 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[publishing]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[file drawer]]></category>
		<category><![CDATA[null results]]></category>
		<category><![CDATA[psychoinformatics]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=886</guid>
		<description><![CDATA[I&#8217;ve written a few posts on this blog about how the development of better online infrastructure could help address and even solve many of the problems psychologists and other scientists face (e.g., the low reliability of peer review, the &#8216;fudge factor&#8217; in statistical reporting, the sheer size of the scientific literature, etc.). Actually, that general [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve written a few posts on this blog about how the development of better online infrastructure could help address and even solve many of the problems psychologists and other scientists face (e.g., <a href="http://www.talyarkoni.org/blog/2011/08/23/building-better-platforms-for-evaluating-science-a-request-for-feedback">the low reliability of peer review</a>, <a href="http://www.talyarkoni.org/blog/2011/02/12/too-much-p-048-towards-partial-automation-of-scientific-evaluation">the &#8216;fudge factor&#8217; in statistical reporting</a>, <a href="http://www.talyarkoni.org/blog/2010/01/31/better-tools-for-mining-the-scientific-literature">the sheer size of the scientific literature</a>, etc.). Actually, that general question&#8211;how we can use technology to do better science&#8211;occupies a good chunk of my research these days (see e.g., <a href="http://neurosynth.org">Neurosynth</a>). One question I&#8217;ve been interested in for a long time is <a href="http://www.talyarkoni.org/blog/2009/11/26/solving-the-file-drawer-problem-by-making-the-internet-the-drawer/">how to keep track not only of &#8216;successful&#8217; studies</a> (i.e., those that produce sufficiently interesting effects to make it into the published literature), but also replication failures (or successes of limited interest) that wind up in researchers&#8217; <a href="http://www.talyarkoni.org/blog/2009/11/26/solving-the-file-drawer-problem-by-making-the-internet-the-drawer/">file drawers</a>. A couple of years ago I went so far as to build a prototype website for <a href="http://failuretoreplicate.com">tracking replication attempts in psychology</a>. Unfortunately, it never went anywhere, partly (okay, mostly) because the site really sucked, and partly because I didn&#8217;t really invest much effort in drumming up interest (mostly due to lack of time). 
But I still think the idea is a valuable one in principle, and a lot of other people have independently had the same idea (which means it <em>must </em>be right, right?).</p>
<p>Anyway, it looks like someone finally had the cleverness, time, and money to get this right. <a href="http://pashler.com">Hal Pashler</a>, <a href="http://psy4.ucsd.edu/~seankang/">Sean Kang</a>*, and colleagues at UCSD have been developing <a href="http://psychfiledrawer.org">an online database for tracking attempted replications</a> of psychology studies for a while now, and it looks like it&#8217;s now in beta. <a href="http://psychfiledrawer.org">PsychFileDrawer</a> is a very slick, full-featured platform that really should&#8211;if there&#8217;s any justice in the world&#8211;provide the kind of service everyone&#8217;s been saying we need for a long time now. If it <em>doesn&#8217;t</em> work, I think we&#8217;ll have some collective soul-searching to do, because I don&#8217;t think it&#8217;s going to get any easier than this to add and track attempted replications. So go use it!</p>
<p>&nbsp;</p>
<p>*Full disclosure: Sean Kang is a good friend of mine, so I&#8217;m not completely impartial in plugging this (though I&#8217;d do it anyway). Sean also happens to be amazingly smart and in search of a faculty job right now. If I were you, I&#8217;d hire him.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2011/11/22/tracking-replication-attempts-in-psychology-for-real-this-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2011/11/22/tracking-replication-attempts-in-psychology-for-real-this-time/</feedburner:origLink></item>
		<item>
		<title>naked dense bodies provoke depression (and other tall scientific tales)</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/TsEt9yO2p78/</link>
		<comments>http://www.talyarkoni.org/blog/2011/11/09/naked-dense-bodies-provoke-depression-and-other-tall-scientific-tales/#comments</comments>
		<pubDate>Wed, 09 Nov 2011 07:47:15 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[general silliness]]></category>
		<category><![CDATA[Asterix]]></category>
		<category><![CDATA[beer]]></category>
		<category><![CDATA[I really don't know how to use tags]]></category>
		<category><![CDATA[Japan]]></category>
		<category><![CDATA[Marvel]]></category>
		<category><![CDATA[Mendeley]]></category>
		<category><![CDATA[names]]></category>
		<category><![CDATA[silliness]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=875</guid>
		<description><![CDATA[I&#8217;ve been using Mendeley for about a year now, and while there are plenty of kinks left for the developers to iron out (mostly related to the Word plug-in), I have to say I like it a lot overall. I could say more about why I like it a lot, but I won&#8217;t, because this isn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been using <a href="http://mendeley.com">Mendeley</a> for about a year now, and while there are plenty of kinks left for the developers to iron out (mostly related to the Word plug-in), I have to say I like it a lot overall. I could say more about <em>why</em> I like it a lot, but I won&#8217;t, because this isn&#8217;t really a post about Mendeley. Rather, it&#8217;s a post about one particular group on Mendeley (groups on Mendeley are basically curated sets of thematically related scientific articles). Specifically, the &#8220;<a href="http://www.mendeley.com/groups/536621/creatively-named-research-papers/papers/">Creatively named research papers</a>&#8221; group.</p>
<p>Since the title of the group is self-explanatory, I&#8217;ll just list some of the more noteworthy entries, along with some of the corresponding notes I jotted down (you know, in case I need to refer back to these papers):</p>
<p>&nbsp;</p>
<p><strong>Naked Dense Bodies Provoke Depression</strong></p>
<p><em>I don&#8217;t think depression is the normative response to this stimulus; this must be a case report.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Marvel Universe looks almost like a real social network</strong></p>
<p><em>&#8220;We would like to mention that the actual number of collaborations is 569,770, but this value counts all collaborations in the Marvel Universe history, and while there are 91,040 pairs of characters that have only met once, other pairs have met quite often: for instance, every pair of members of the Fantastic Four has jointly appeared in around 700 comic books (more specifically, this range of collaborations of the members of the Fantastic Four runs between 668 joint appearances of the Thing and the Invisible Woman to 744 joint appearances of the Thing and the Human Torch).&#8221; (p. 7)</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Are Analytic Philosophers Shallow and Stupid?</strong></p>
<p><em>I&#8217;ll leave this one up to the analytic philosophers to mull over. We&#8217;ll check back on their progress in another ten or twenty years.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Are full or empty beer bottles sturdier and does their fracture-threshold suffice to break the human skull?</strong></p>
<p><em>Spoiler: the answers are &#8216;empty&#8217; and &#8216;yes&#8217;, respectively.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>A woman&#8217;s history of vaginal orgasm is discernible from her walk</strong></p>
<p><em>I don&#8217;t want to offend anyone, so I&#8217;m going to tread very delicately here and just tiptoe away quietly.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Traumatic brain injuries in illustrated literature: experience from a series of over 700 head injuries in the Asterix comic books</strong></p>
<p><em>At some point you kind of start to feel bad for the Romans.</em></p>
<p><img class="alignnone" title="le pauvre, pauvre, Romans." src="http://www.asterix.com/horoscope/images/romast004.gif" alt="" width="450" height="299" /><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Skillful writing of an awful research paper</strong></p>
<p><em>Pretty sure I already know everything discussed in this article.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Chemical processes in the deep interior of Uranus</strong></p>
<p><em>Obvious joke is obvious.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Japan’s Phillips Curve Looks Like Japan</strong></p>
<p><a href="http://qed.econ.queensu.ca/working_papers/papers/qed_wp_1083.pdf">A pretty remarkable article</a>. Gregor Smith isn&#8217;t kidding; here&#8217;s Japan&#8217;s <a href="http://en.wikipedia.org/wiki/Phillips_curve">Phillips Curve</a>:</p>
<p><a href="http://www.talyarkoni.org/blog/wp-content/uploads/2011/11/Japan_Phillips_curve.jpg"><img class="alignnone size-full wp-image-876" title="Japan's Phillips curve" src="http://www.talyarkoni.org/blog/wp-content/uploads/2011/11/Japan_Phillips_curve.jpg" alt="" width="450" height="448" /></a><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Is a jumper angrier than a tree?</strong></p>
<p><em>Possibly even better than the title of this paper is the set of papers Mendeley thinks are related, which include &#8220;The greater-than-g acceleration of a bungee jumper&#8221;, &#8220;When is a tree more than a tree?&#8221;, and my personal favorite, &#8220;The Angry, the Angrier, and the Angriest: Relationship Implications&#8221;.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>The Penetration of a Finger into a Viscous Fluid in a Channel and Tube</strong></p>
<p><em>It&#8217;s not often you find your finger stuck in an oil-filled Chinese finger trap, but when it inevitably does happen, you&#8217;ll be very glad you read this paper.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Executive Decision-Making in the Domestic Sheep</strong></p>
<p><em>I&#8217;m a big fan of <a href="http://www.talyarkoni.org/blog/2009/12/22/the-parable-of-zoltan-and-his-twelve-sheep-or-why-a-little-skepticism-goes-a-long-way/">studies involving clever sheep</a>.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Numerical simulation of fundamental trapped sausage modes</strong></p>
<p><em>Alternative title: What&#8217;s the optimal amount of time to microwave a midnight snack for?</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Accidental condom inhalation</strong></p>
<p><em>You&#8217;re doing it wrong.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>On the Effectiveness of Aluminium Foil Helmets: An Empirical Study</strong></p>
<p><em>Pfft. Like anyone who wears one of these things is going to believe results published by agents of the scientific-industrial complex.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Experiments with genitalia : a commentary</strong></p>
<p><em>Abstract: &#8220;There has been a recent burst of studies of the function of genitalia, many of which share several important shortcomings. Given that further studies on this topic are likely (there are probably millions of species showing rapid genital divergence), I discuss the studies critically to promote clear formulation of hypotheses and interpretation of results in the future. I also emphasize some possibly important but neglected variables, including female stimulation, phylogenetic contexts, and the behavior of male genitalia, and outline simple techniques that could improve future studies.&#8221;</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>The earth is round (p &lt; . 05)</strong></p>
<p><em>For shame! This one has no business being in this group! It&#8217;s an excellent title to <a href="http://www.faculty.biol.ttu.edu/strauss/Stats/Readings/Cohen1994.pdf">one of the best commentaries</a> on psychological methods ever written!</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Amusing titles in scientific journals and article citation</strong></p>
<p><em>Yes, you&#8217;re very clever, person who added this self-referential article to the group.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>The ethics of eating a drug-company donut</strong></p>
<p><em>It starts with a donut, and before you know it, you&#8217;re spending your lunch break stuffing boxes full of Pfizer pens down your shirt pocket.</em><strong></strong></p>
<p>&nbsp;</p>
<p><strong>Rectal impalement by pirate ship: A case report</strong></p>
<p><em>You&#8217;re <span style="text-decoration: underline;">definitely</span> doing it wrong.</em></p>
<p>&nbsp;</p>
<p>Anyway, I&#8217;m sure this is just a tiny fraction of the creatively named scientific literature. If you know of (or have authored) any worthy candidates, add them to the Mendeley group&#8211;or just indulge me and post them below in the comments. Note that in this context &#8216;creatively named&#8217; seems to mean humorous rather than clever. There are probably many more clever titles out there than funny ones (a trend abetted by the fact that a clever title is pretty much a prerequisite for publishing in Psychological Science at this point), but for purposes of this thread, we don&#8217;t want to hear about your naked dense bodies unless they&#8217;re funny-looking!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2011/11/09/naked-dense-bodies-provoke-depression-and-other-tall-scientific-tales/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2011/11/09/naked-dense-bodies-provoke-depression-and-other-tall-scientific-tales/</feedburner:origLink></item>
		<item>
		<title>see me flub my powerpoint slides on NIF tv!</title>
		<link>http://feedproxy.google.com/~r/citationNeeded/~3/K8fEqlCUOMc/</link>
		<comments>http://www.talyarkoni.org/blog/2011/10/31/see-me-flub-my-powerpoint-slides-on-nif-tv/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 05:44:35 +0000</pubDate>
		<dc:creator>Tal Yarkoni</dc:creator>
				<category><![CDATA[fmri]]></category>
		<category><![CDATA[methods]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[neurosynth]]></category>
		<category><![CDATA[NIF]]></category>
		<category><![CDATA[webinar]]></category>

		<guid isPermaLink="false">http://www.talyarkoni.org/blog/?p=869</guid>
		<description><![CDATA[&#160; UPDATE: the webcast is now archived here for posterity. This is kind of late notice and probably of interest to few people, but I&#8217;m giving the NIF webinar tomorrow (or today, depending on your time zone&#8211;either way, we&#8217;re talking about November 1st). I&#8217;ll be talking about Neurosynth, and focusing in particular on the methods [...]]]></description>
			<content:encoded><![CDATA[<p>&nbsp;</p>
<p><em>UPDATE: the webcast is now archived <a href="http://nif.adobeconnect.com/p86yigjoc6p/">here</a> for posterity.</em></p>
<p>This is kind of late notice and probably of interest to few people, but I&#8217;m giving the <a href="http://www.neuinfo.org/webinars/">NIF webinar</a> tomorrow (or today, depending on your time zone&#8211;either way, we&#8217;re talking about November 1st). I&#8217;ll be talking about <a href="http://neurosynth.org">Neurosynth</a>, and focusing in particular on the methods and data, since that&#8217;s what <a href="http://www.neuinfo.org/">NIF</a> (which stands for Neuroscience Information Framework) is all about. Assuming all goes well, the webinar should start at 11 am PST. But since I haven&#8217;t done a webcast of any kind before, and have a surprising knack for breaking audiovisual equipment at a distance, all may not go well. Which I suppose could make for a more interesting presentation. In any case, here&#8217;s the abstract:</p>
<blockquote><p>The explosive growth of the human neuroimaging literature has led to major advances in understanding of human brain function, but has also made aggregation and synthesis of neuroimaging findings increasingly difficult. In this webinar, I will describe a highly automated brain mapping framework called NeuroSynth that uses text mining, meta-analysis and machine learning techniques to generate a large database of mappings between neural and cognitive states. The NeuroSynth framework can be used to automatically conduct large-scale, high-quality neuroimaging meta-analyses, address long-standing inferential problems in the neuroimaging literature (e.g., how to infer cognitive states from distributed activity patterns), and support accurate ‘decoding’ of broad cognitive states from brain activity in both entire studies and individual human subjects. This webinar will focus on (a) the methods used to extract the data, (b) the structure of the resulting (publicly available) datasets, and (c) some major limitations of the current implementation. If time allows, I&#8217;ll also provide a walk-through of the associated web interface (<a href="http://neurosynth.org/" target="_blank">http://neurosynth.org</a>) and will provide concrete examples of some potential applications of the framework.</p></blockquote>
<p>There&#8217;s some more info (including details about how to connect, which might be important) <a href="http://www.neuinfo.org/webinars/">here</a>. And now I&#8217;m off to prepare my slides. And script some evasive and totally non-committal answers to deploy in case of difficult questions from the <del>peanut gallery</del> respected audience.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.talyarkoni.org/blog/2011/10/31/see-me-flub-my-powerpoint-slides-on-nif-tv/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<feedburner:origLink>http://www.talyarkoni.org/blog/2011/10/31/see-me-flub-my-powerpoint-slides-on-nif-tv/</feedburner:origLink></item>
	</channel>
</rss>

