<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Prefrontal.org</title>
	
	<link>http://prefrontal.org/blog</link>
	<description>A personal weblog of developmental cognitive neuroscience.</description>
	<lastBuildDate>Fri, 26 Feb 2010 05:02:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/prefrontal" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="prefrontal" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
		<title>PAPER: How reliable are the results from functional magnetic resonance imaging?</title>
		<link>http://prefrontal.org/blog/2010/02/paper-how-reliable-are-the-results-from-functional-magnetic-resonance-imaging/</link>
		<comments>http://prefrontal.org/blog/2010/02/paper-how-reliable-are-the-results-from-functional-magnetic-resonance-imaging/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 05:02:51 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=893</guid>
		<description><![CDATA[- Current Citation:
Bennett CM, Miller MB. (in press). How reliable are the results from functional magnetic resonance imaging?  Annals of the New York Academy of Sciences.
- Abstract:
Functional magnetic resonance imaging is one of the most important methods for in vivo investigation of cognitive processes in the human brain.  Within the last two decades [...]]]></description>
			<content:encoded><![CDATA[<p><strong>- Current Citation:</strong><br />
Bennett CM, Miller MB. (in press). How reliable are the results from functional magnetic resonance imaging?  <em>Annals of the New York Academy of Sciences.</em></p>
<p><strong>- Abstract:</strong><br />
Functional magnetic resonance imaging is one of the most important methods for in vivo investigation of cognitive processes in the human brain.  Within the last two decades an explosion of research has emerged using fMRI, revealing the underpinnings of everything from motor and sensory processes to the foundations of social cognition.  While these results have revealed the potential of neuroimaging, important questions regarding the reliability of these results remain unanswered.  In this chapter we take a close look at what is currently known about the reliability of fMRI findings.  First, we examine the many factors that influence the quality of acquired fMRI data.  We also conduct a review of the existing literature to determine if some measure of agreement has emerged regarding the reliability of fMRI.  Finally, we provide commentary on ways to improve fMRI reliability and what questions remain unanswered.  Reliability is the foundation on which scientific investigation is based.  How reliable are the results from fMRI?</p>
<p><strong>- Downloadable Versions:</strong><br />
[<a href="http://prefrontal.org/files/papers/Bennett-NYAS-2010.pdf">Manuscript PDF</a>]</p>
<p><span id="more-893"></span><br />
<strong>- Full Text:</strong></p>
<p>Reliability is the cornerstone of any scientific enterprise. Issues of research validity and significance are relatively meaningless if the results of our experiments are not trustworthy.  Reliability can vary greatly depending on the tools being used and what is being measured. Therefore, it is imperative that any scientific endeavor be aware of the reliability of its measurements. </p>
<p>Surprisingly, most fMRI researchers have only a vague idea of how reliable their results are.  Reliability is not a typical topic of conversation between most investigators, and only a small number of papers investigating fMRI reliability have been published.  This became an important issue in 2009, when a paper by Vul, Harris, Winkielman, and Pashler set the stage for debate (2009).  Their paper, originally entitled “Voodoo Correlations in Social Neuroscience”, focused on a statistical problem known as the ‘non-independence error’.  Critical to their argument was the reliability of functional imaging results.  Vul et al. argued that test-retest variability of fMRI results placed an ‘upper bound’ on the strength of possible correlations between fMRI data and behavioral measures:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Eq1.png" alt="" title="Bennett-Reliability-Eq1" width="390" height="23" class="alignnone size-full wp-image-1052" /></center></p>
<p>This calculation reflects that the observed correlation between two measures depends both on the true relationship between them and on the reliability of each measurement: the true correlation is scaled by the square root of the product of the two reliabilities (Nunnally, 1970; Vul et al., 2009).  Vul et al. specified that behavioral measures of personality and emotion have a reliability of around 0.8 and that fMRI results have a reliability of around 0.7.  Not everyone agreed.  Across several written exchanges, multiple research groups debated what the “actual reliability” of fMRI was.  Jabbi et al. stated that the reliability of fMRI could be as high as 0.98 (2009).  Lieberman et al. split the difference and argued that fMRI reliability was likely around 0.90 (2009).  While much ink was spilled debating the reliability of fMRI results, very little consensus was reached regarding an appropriate approximation of its value.</p>
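<p>A minimal numerical sketch of this ceiling, assuming the equation above takes the standard correction-for-attenuation form (observed r equals true r times the square root of the product of the two reliabilities). The function name max_observed_r is hypothetical, and the reliability values are simply the estimates quoted in the debate.</p>

```python
import math


def max_observed_r(true_r, reliability_a, reliability_b):
    """Observed correlation implied by a true correlation and the
    test-retest reliabilities of the two measures (attenuation formula)."""
    return true_r * math.sqrt(reliability_a * reliability_b)


# Behavioral reliability of 0.8, paired with the fMRI reliability estimates
# proposed by Vul et al. (0.7), Lieberman et al. (0.90), and Jabbi et al. (0.98).
# A true correlation of 1.0 gives the highest value that could be observed.
behavioral_reliability = 0.8
for fmri_reliability in (0.7, 0.9, 0.98):
    ceiling = max_observed_r(1.0, behavioral_reliability, fmri_reliability)
    print(f"fMRI reliability {fmri_reliability:.2f}: max observed r = {ceiling:.3f}")
```

<p>Under these assumptions an fMRI reliability of 0.7 limits observable brain-behavior correlations to about 0.75, while a reliability of 0.98 raises the ceiling to about 0.89, which is why the disagreement over the “actual reliability” mattered.</p>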
<p>The difficulty of detecting signal (what we are trying to measure) from amongst a sea of noise (everything else we don’t care about) is a constant struggle for all scientists.  It influences what effects can be examined and is directly tied to the reliability of research results.  What follows in this chapter is a multifaceted examination of fMRI reliability.  We examine why reliability is a critical metric of fMRI data, discuss what factors influence the quality of the blood oxygen level dependent (BOLD) signal, and investigate the existing reliability literature to determine if some measure of agreement has emerged across studies.  Fundamentally, there is one critical question that this chapter seeks to address: if you repeat your fMRI experiment, what is the likelihood you will get the same result?</p>
<p><b>Pragmatics of Reliability</b></p>
<p>Why worry about reliability at all?  As long as investigators are following accepted statistical practices and being conservative in the generation of their results, why should the field be bothered with how reproducible the results might be?  There are at least four primary reasons why test-retest reliability should be a concern for all fMRI researchers.</p>
<p><u>Scientific Truth.</u>  While it is a simple statement that could be taken straight out of an undergraduate research methods course, an important point must be made about reliability in research studies: it is the foundation on which scientific knowledge is based.  Without reliable, reproducible results no study can effectively contribute to scientific knowledge.  After all, if a researcher obtains a different set of results today than they did yesterday, what has really been discovered?  To ensure the long-term success of functional neuroimaging it is critical to investigate the many sources of variability that impact reliability.  It is a strong statement, but if results do not generalize from one set of subjects to another or from one scanner to another then the findings are of little scientific value.</p>
<p><u>Clinical and Diagnostic Applications.</u>  The longitudinal assessment of changes in regional brain activity is becoming increasingly important for the diagnosis and treatment of clinical disorders.  One potential use of fMRI is for the localization of specific cognitive functions before surgery.  A good example is the localization of language function prior to tissue resection for epilepsy treatment (Fernandez et al., 2003).  This is truly a case where an investigator does not want a slightly different result each time they conduct the scan.  If fMRI is to be used for surgical planning or clinical diagnostics then any issues of reliability must be quantified and addressed.  </p>
<p><u>Evidentiary Applications.</u>  The results from functional imaging are increasingly being submitted as evidence into the United States legal system.  For example, results from a commercial company called No Lie MRI (San Diego, CA; http://www.noliemri.com/) were introduced into a juvenile sex abuse case in San Diego during the spring of 2009.  The defense was attempting to introduce the fMRI results as scientific justification of their client’s claim of innocence.  A concerted effort from imaging scientists, including in-person testimony from Marc Raichle, eventually forced the defense to withdraw the request.  While the fMRI results never made it into this case, it is clear that fMRI evidence will be increasingly common in the courtroom.  What are the larger implications if this evidence is not as reliable as we assume?</p>
<p><u>Scientific Collaboration.</u>  A final pragmatic dimension of fMRI reliability is the ability to share data between researchers.  This is already a difficult challenge, as each scanner has its own unique sources of error that become part of the data (Jovicich et al., 2006).  Early evidence has indicated that the results from a standard cognitive task can be quite similar across scanners (Casey et al., 1998; Friedman et al., 2008).  Still, concordance of results remains an issue that must be addressed for large-scale, collaborative inter-center investigations. The ultimate level of reliability is the reproducibility of results from any equivalent scanner around the world and the ability to integrate this data into larger investigations.</p>
<p><center><br />
<h4>- What Factors Influence fMRI Reliability? -</h4>
<p></center></p>
<p>The ability of fMRI to detect meaningful signals is limited by a number of factors that add error to each measurement.  Some of these factors include thermal noise, system noise in the scanner, physiological noise from the subject, non-task related cognitive processes, and changes in cognitive strategy over time (Huettel et al., 2008; Kruger and Glover, 2001).  The concept of reliability is, at its core, a representation of the ability to routinely detect relevant signals from this background of meaningless noise.  If a voxel timeseries contains a large amount of signal then the primary sources of variability are actual changes in blood flow related to neural activity within the brain.  Conversely, in a voxel containing a large amount of noise the measurements are dominated by error and would not contain meaningful information.  By increasing the amount of signal, or decreasing the amount of noise, a researcher can effectively increase the quality and reliability of acquired data.  </p>
<p>The quality of data in magnetic resonance imaging is typically measured using the signal-to-noise ratio (SNR) of the acquired images.  The goal is to maximize this ratio.  Two kinds of SNR are important for functional MRI.  The first is the image SNR.  It is related to the quality of data acquired in a single fMRI volume. Image SNR is typically computed as the mean signal value of all voxels divided by the standard deviation of all voxels in a single image:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Eq2.png" alt="" title="Bennett-Reliability-Eq2" width="173" height="26" class="alignnone size-full wp-image-1054" /></center></p>
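<p>The definition above translates directly into a few lines of NumPy. This is a sketch that follows the text’s whole-image definition; the synthetic volume and the function name image_snr are illustrative assumptions, and other conventions (for example, dividing the mean brain signal by the standard deviation of a background region) are also in common use.</p>

```python
import numpy as np


def image_snr(volume):
    """Image SNR as described in the text: the mean of all voxel values in a
    single volume divided by the standard deviation of those same values."""
    values = np.asarray(volume, dtype=float).ravel()
    return values.mean() / values.std()


# Hypothetical volume: a bright block ("brain") inside a darker, noisy background.
rng = np.random.default_rng(0)
volume = rng.normal(loc=10.0, scale=2.0, size=(32, 32, 16))
volume[8:24, 8:24, 4:12] += 100.0
print(f"image SNR = {image_snr(volume):.2f}")
```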
<p>Increasing the image SNR will improve the quality of data at a single point in time.  However, most important for functional neuroimaging is the amount of signal present in the data across time.  This makes the temporal SNR (tSNR) perhaps the most important metric of data quality for functional MRI.  It represents the signal-to-noise ratio of the timeseries at each voxel:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Eq3.png" alt="" title="Bennett-Reliability-Eq3" width="220" height="26" class="alignnone size-full wp-image-1055" /></center></p>
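<p>A voxelwise tSNR map follows from the same idea applied along the time axis. The sketch below assumes a 4-D array with time as the last axis; the array sizes, the noise levels, and the function name temporal_snr are hypothetical, with one half of the volume made noisier to mimic regions with lower tSNR.</p>

```python
import numpy as np


def temporal_snr(timeseries):
    """Voxelwise tSNR: the mean of each voxel's timeseries divided by its
    standard deviation over time. Time is assumed to be the last axis."""
    data = np.asarray(timeseries, dtype=float)
    mean = data.mean(axis=-1)
    sd = data.std(axis=-1)
    # Voxels with no temporal variance (e.g. empty background) are set to 0.
    return np.divide(mean, sd, out=np.zeros_like(mean), where=sd > 0)


# Hypothetical 4-D dataset: 16 x 16 x 8 voxels, 120 volumes, baseline 1000.
rng = np.random.default_rng(1)
data = 1000.0 + rng.normal(scale=10.0, size=(16, 16, 8, 120))
data[:8] = 1000.0 + rng.normal(scale=25.0, size=(8, 16, 8, 120))  # noisier half
tsnr_map = temporal_snr(data)
print(f"median tSNR, quiet half: {np.median(tsnr_map[8:]):.1f}")
print(f"median tSNR, noisy half: {np.median(tsnr_map[:8]):.1f}")
```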
<p>The tSNR is not the same across all voxels in the brain.  Some regions will have higher or lower tSNR depending on location and tissue composition.  For example, there are documented differences in tSNR between gray matter and white matter (Bodurka et al., 2005).  The typical tSNR of fMRI can also vary depending on the same factors that influence image SNR.  </p>
<p>Another metric of data quality is the contrast-to-noise ratio (CNR).  This refers to the size of the differences in signal intensity, relative to noise, between different areas in an image (image CNR) or between different points in time (temporal CNR).  With regard to functional neuroimaging, the temporal CNR represents the maximum relative difference in signal intensity that is represented within a single voxel.  In a voxel with low CNR there would be very little difference between two conditions of interest.  Conversely, in a voxel with high CNR there would be relatively large differences between two conditions of interest.  The image CNR is not critical to fMRI, but having a high temporal CNR is very important for detecting task effects.</p>
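<p>One simple way to estimate temporal CNR for a single voxel is to divide the difference between condition means by an estimate of the noise. This is a sketch under stated assumptions: the noise is taken as the standard deviation left after removing each condition’s mean, the hemodynamic response is ignored, and the block design, signal changes, and the function name temporal_cnr are hypothetical.</p>

```python
import numpy as np


def temporal_cnr(timeseries, condition_on):
    """Temporal CNR for one voxel: absolute difference between the mean signal
    in two conditions, divided by the standard deviation of the signal after
    removing each condition's mean (a simple estimate of the noise)."""
    y = np.asarray(timeseries, dtype=float)
    on = np.asarray(condition_on, dtype=bool)
    on_mean, off_mean = y[on].mean(), y[~on].mean()
    residuals = np.concatenate([y[on] - on_mean, y[~on] - off_mean])
    return abs(on_mean - off_mean) / residuals.std()


# Hypothetical block design: 6 cycles of 10 "off" and 10 "on" volumes.
condition = np.tile(np.r_[np.zeros(10), np.ones(10)], 6).astype(bool)
rng = np.random.default_rng(2)
noise = rng.normal(scale=10.0, size=condition.size)
strong_voxel = 1000.0 + 20.0 * condition + noise   # 2% signal change
weak_voxel = 1000.0 + 2.0 * condition + noise      # 0.2% signal change
print(f"CNR strong: {temporal_cnr(strong_voxel, condition):.2f}")
print(f"CNR weak:   {temporal_cnr(weak_voxel, condition):.2f}")
```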
<p>It is generally accepted that fMRI is a rather noisy measurement with a characteristically low tSNR, requiring extensive signal averaging to achieve effective signal detection (Murphy et al., 2007).  The following sections provide greater detail on the influence of specific factors on the SNR/tSNR of functional MRI data.  We break these factors down by the influence of differences in image acquisition, the image analysis pipeline, and the contribution of the subjects themselves.</p>
<p><b>SNR influences of MRI acquisition</b></p>
<p>The typical high-field MRI scanner is a precision superconducting device constructed to very exact manufacturing tolerances.  Still, the images it produces can be somewhat variable depending on a number of hardware and software variables.  With regard to hardware, one well-known influence on the signal-to-noise ratio of MRI is the strength of the primary B0 magnetic field (Bandettini et al., 1994; Ogawa et al., 1993).  Doubling this field, such as moving from 1.5 Tesla to a 3.0 Tesla field strength, can theoretically double the SNR of the data.  The B0 field strength is especially important for fMRI, which relies on magnetic susceptibility effects to create the blood oxygen level dependent (BOLD) signal (Turner et al., 1993).  Hoenig et al. showed that, relative to a 1.5 Tesla magnet, a 3.0 Tesla fMRI acquisition had 60-80% more significant voxels (2005).  They also demonstrated that the CNR of the results was 1.3 times higher than those obtained at 1.5 Tesla.  The strength and slew rate of the gradient coils can have a similar impact on SNR.  Advances in head coil design are also notable, as parallel acquisition head coils have increased radiofrequency reception sensitivity.</p>
<p>It is important to note that there are negative aspects of higher field strength as well.  Artifacts due to physiological effects and susceptibility are both increasingly pronounced at higher fields.  The increased contribution of physiological noise reduces the expected gains in SNR at high field (Kruger and Glover, 2001).  The increasing contribution of susceptibility artifacts can virtually wipe out areas of orbital prefrontal cortex and inferior temporal cortex (Jezzard and Clare, 1999).  Also, in terms of tSNR there are diminishing returns with each step up in B0 field strength.  At typical fMRI spatial resolution values tSNR approaches an asymptotic limit between 3 Tesla and 7 Tesla (Kruger and Glover, 2001; Triantafyllou et al., 2005).</p>
<p>Looking beyond the scanner hardware, the parameters of the fMRI acquisition can also have a significant impact on the SNR/CNR of the final images.  For example, small changes in the voxel size of a sequence can dramatically alter the final SNR.  Moving from 1.5 mm isotropic voxels to 3.0 mm isotropic voxels (an eightfold increase in voxel volume) can potentially increase the acquisition SNR by a factor of eight, but at a cost of spatial resolution.  Other acquisition variables that influence the acquired SNR/CNR include repetition time (TR), echo time (TE), flip angle, bandwidth, slice gap, and k-space trajectory.  For example, Moser et al. found that optimizing the flip angle of their acquisition could approximately double the SNR of their data in a visual stimulation task (1996).  Further, the effect of each parameter varies according to the field strength of the magnet (Triantafyllou et al., 2005).  The optimal parameter set for a 3 Tesla system may not be optimal for a 7 Tesla system.</p>
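<p>The diminishing returns can be illustrated with a physiological noise model of the kind described by Kruger and Glover (2001), in which physiological noise grows in proportion to the signal, so that tSNR = SNR0 / sqrt(1 + λ²·SNR0²) and tSNR can never exceed 1/λ. This is a sketch assuming that proportional-noise form; the value of λ, the assumption that image SNR scales linearly with field strength, and the function name are all hypothetical.</p>

```python
import math


def tsnr_with_physiological_noise(image_snr0, lam):
    """Temporal SNR when physiological noise grows in proportion to the
    signal (sigma_phys = lam * S), following a Kruger-Glover style model:
    tSNR = SNR0 / sqrt(1 + lam**2 * SNR0**2). As SNR0 grows, tSNR -> 1/lam."""
    return image_snr0 / math.sqrt(1.0 + (lam * image_snr0) ** 2)


# Hypothetical numbers: image SNR assumed to scale linearly with field
# strength (50 per tesla), and lam = 0.01, which caps tSNR at 100.
lam = 0.01
for b0 in (1.5, 3.0, 7.0):
    snr0 = 50.0 * b0
    tsnr = tsnr_with_physiological_noise(snr0, lam)
    print(f"{b0:.1f} T: image SNR {snr0:.0f}, tSNR {tsnr:.1f}")
```

<p>With these made-up values, more than quadrupling the image SNR between 1.5 T and 7 T raises tSNR only from 60 to roughly 96.</p>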
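<p>The factor of eight comes from voxel volume: doubling each edge of an isotropic voxel multiplies its volume by 2 × 2 × 2. A minimal sketch, assuming SNR is proportional to voxel volume with every other acquisition parameter held fixed (the function name relative_snr is hypothetical):</p>

```python
import math


def relative_snr(voxel_a_mm, voxel_b_mm):
    """SNR of voxel size B relative to voxel size A, assuming SNR is
    proportional to voxel volume with every other parameter held fixed."""
    return math.prod(voxel_b_mm) / math.prod(voxel_a_mm)


print(relative_snr((1.5, 1.5, 1.5), (3.0, 3.0, 3.0)))  # 8.0 (3.375 vs 27 mm^3)
print(relative_snr((3.0, 3.0, 3.0), (3.0, 3.0, 4.0)))  # thicker slices only
```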
<p>The ugly truth is that any number of factors in the control room or magnet suite can increase noise in the images.  A famous example from one imaging center was when the broken filament from a light bulb in a distant corner of the magnet suite started causing visible sinusoidal striations in the acquired EPI images.  This is an extreme example, but it makes the point that the scanner is a precision device that is designed to operate in a narrow set of well-defined circumstances.  Any deviation from those circumstances will increase noise, thereby reducing SNR and reliability.</p>
<p><b>SNR considerations of analysis methods</b></p>
<p>The methods used to analyze fMRI data will affect the reliability of the final results.  In particular, those steps taken to reduce known sources of error are critical to increasing the final SNR/CNR of preprocessed images.  For example, spatial realignment of the EPI data can have a dramatic effect on lowering movement-related variance and has become a standard part of fMRI preprocessing (Oakes et al., 2005; Zhilkin and Alexander, 2004).  Recent algorithms can also help remove remaining signal variability due to magnetic susceptibility induced by movement (Andersson et al., 2001).  Temporal filtering of the EPI timeseries can reduce noise in frequency bands that do not carry the task-related signal.  The use of a high-pass filter is a common method to remove low-frequency noise, such as signal drift due to the scanner (Kiebel and Holmes, 2007).  Spatial smoothing of the data can also improve the SNR/CNR of an image.  Some measure of random noise is added to the true signal of each voxel during acquisition, and smoothing across voxels helps to average out this error across the area of the smoothing filter (Mikl et al., 2008).  It can also help account for local differences in anatomy across subjects.  Smoothing is most often done using a Gaussian kernel of approximately 6-12 mm FWHM.  </p>
<p>There has been some degree of standardization regarding preprocessing and statistical approaches in fMRI.  For instance, Mumford and Nichols found that approximately 92% of group fMRI results were computed using an ordinary least squares (OLS) estimation of the general linear model (2009).  Comparison studies with carefully standardized processing procedures have shown that the output of standard software packages can be very similar (Gold et al., 1998; Morgan et al., 2007).  However, in actual practice the diversity of tools and approaches in fMRI increases the variability between sets of results.  The Functional Imaging Analysis Contest (FIAC) in 2005 demonstrated that prominent differences existed between fMRI results generated by different groups using the same original dataset.  On reviewing the results the organizers concluded that brain regions exhibiting robust signal changes could be quite similar across analysis techniques, but the detection of areas with lower signal was highly variable (Poline et al., 2006).  It remains the case that decisions made by the researcher regarding how to analyze the data will impact what results are found.</p>
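<p>Analysis packages provide their own smoothing routines; the NumPy sketch below only shows the two ingredients involved, converting a kernel FWHM in millimeters to a Gaussian standard deviation in voxels and convolving along each axis, and checks that smoothing pure noise lowers its voxel-to-voxel standard deviation. Isotropic voxels, zero padding at the edges, the 8 mm kernel, and the 3 mm voxel size are assumptions for illustration, and all function names are hypothetical.</p>

```python
import numpy as np


def fwhm_to_sigma(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian kernel FWHM in mm to a standard deviation in voxels."""
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_size_mm


def gaussian_kernel_1d(sigma_vox):
    """Normalized 1-D Gaussian kernel truncated at +/- 3 standard deviations."""
    radius = int(np.ceil(3.0 * sigma_vox))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return kernel / kernel.sum()


def smooth_volume(volume, fwhm_mm, voxel_size_mm):
    """Separable 3-D Gaussian smoothing, assuming isotropic voxels and
    zero padding at the volume edges."""
    kernel = gaussian_kernel_1d(fwhm_to_sigma(fwhm_mm, voxel_size_mm))
    out = np.asarray(volume, dtype=float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out


# Hypothetical volume of pure Gaussian noise with 3 mm voxels, smoothed at 8 mm.
rng = np.random.default_rng(3)
noise = rng.normal(size=(24, 24, 24))
smoothed = smooth_volume(noise, fwhm_mm=8.0, voxel_size_mm=3.0)
interior = (slice(5, -5),) * 3  # ignore voxels affected by edge padding
print(f"noise SD before: {noise[interior].std():.3f}, after: {smoothed[interior].std():.3f}")
```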
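<p>For readers less familiar with the OLS general linear model, a minimal single-voxel sketch follows. It is an illustration under simplifying assumptions, not any package’s implementation: the task regressor is a plain boxcar that is not convolved with a hemodynamic response, temporal autocorrelation is ignored, and the function names are hypothetical. At the group level the same estimator is applied with subjects’ contrast estimates as the data.</p>

```python
import numpy as np


def fit_glm_ols(y, design):
    """Ordinary least squares fit of y = design @ beta + error.
    Returns the beta estimates and the residual variance estimate."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = design.shape[0] - np.linalg.matrix_rank(design)
    return beta, residuals @ residuals / dof


def contrast_t(beta, sigma2, design, contrast):
    """t statistic for a linear contrast of the OLS beta estimates."""
    c = np.asarray(contrast, dtype=float)
    var = sigma2 * c @ np.linalg.pinv(design.T @ design) @ c
    return c @ beta / np.sqrt(var)


# Hypothetical single-voxel example: a boxcar task regressor plus an intercept.
task = np.tile(np.r_[np.zeros(10), np.ones(10)], 6)
design = np.column_stack([task, np.ones_like(task)])
rng = np.random.default_rng(4)
y = 1000.0 + 15.0 * task + rng.normal(scale=10.0, size=task.size)
beta, sigma2 = fit_glm_ols(y, design)
print(f"task beta = {beta[0]:.2f}, t = {contrast_t(beta, sigma2, design, [1, 0]):.2f}")
```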
<p>Strother et al. have done a great deal of research into the influence of image processing pipelines using a predictive modeling framework (2004; 2002; Zhang et al., 2009).  They found that small changes in the processing pipeline of fMRI images have a dramatic impact on the final statistics derived from that data.  Some steps, such as slice timing correction, were found to have little influence on the results from experiments with a block design.  This is logical, given the relative insensitivity of block designs to small temporal shifts.  However, the steps of motion correction, high-pass filtering, and spatial smoothing were found to significantly improve the analysis.  They reported that the optimization of preprocessing pipelines improved both intra-subject and between-subject reproducibility of results (Zhang et al., 2009).   Identifying an optimal set of processing steps and parameters can dramatically improve the sensitivity of an analysis.</p>
<p><b>SNR influences of participants</b></p>
<p>The MRI system and fMRI analysis methods have received a great deal of attention with regard to SNR.  However, one factor that may contribute most to fMRI reliability is how stable the patterns of activity within a single subject are.  After all, a test-retest methodology involving human beings is akin to hitting a moving target.  Any discussion of test-retest reliability in fMRI has to take into consideration the fact that the cognitive state of a subject is variable over time.</p>
<p>There are two important ways that a subject can influence reliability within a test-retest experimental design.  The first involves within-subject changes that take place over the course of a single session.  For instance, differences in attention and arousal can significantly modulate subsequent responses to sensory stimulation (Munneke et al., 2008; Peyron et al., 1999; Sterr et al., 2007).  Variability can also be caused by evolving changes in cognitive strategy used during tasks like episodic retrieval (Miller et al., 2001; Miller et al., 2002).  If a subject spontaneously shifts to a new decision criterion midway through a session then the resulting data may reflect the results of two different cognitive processes.  Finally, learning will take place with continued task experience, shifting the pattern of activity as brain regions are engaged and disengaged during task-relevant processing (Grafton et al., 1995; Poldrack et al., 1999; Rostami et al., 2009).  For studies investigating learning this is a desired effect, but for others this is an undesired source of noise.</p>
<p>The second influence on reliability is related to physiological and cognitive changes that may take place within a subject between the test and retest sessions.  Within 24 hours a wide variety of reliability-reducing events can take place.  All of the above factors may change over the days, weeks, months, or years between scans, and these changes may be even more dramatic as the time between scanning sessions grows.</p>
<p><center><br />
<h4>- Estimates of fMRI Reliability -</h4>
<p></center></p>
<p>A diverse array of methods has been developed for measuring the reliability of fMRI.  What differs between them is the specific facet of reliability they are intended to quantify.  Some methods are only concerned with significant voxels.  Other methods address similarity in the magnitude of estimated activity across all voxels.  The choice of how to calculate reliability often comes down to which aspect of the results is expected to remain stable over time.</p>
<p><b>Measuring stability of super-threshold extent.</b></p>
<p>Do you want the voxels that are significant during the test scan to still be significant during the retest scan?  If so, super-threshold voxels are expected to remain above the threshold during subsequent sessions.  The most prevalent method to quantify this reliability is the cluster overlap method, which identifies the set of voxels that are super-threshold during both the test and retest sessions.  </p>
<p>Two approaches have been used to calculate cluster overlap.  The first, and by far most prevalent, is a measure of similarity known as the Dice coefficient.  It was first used to calculate fMRI cluster overlap by Rombouts et al. and has become a standard measure of result similarity (1997).  It is typically calculated by the following equation:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Eq4.png" alt="" title="Bennett-Reliability-Eq4" width="231" height="28" class="alignnone size-full wp-image-1057" /></center></p>
<p>Results from the Dice equation can be interpreted as the number of voxels that will overlap divided by the average number of significant voxels across sessions.  Another approach to calculating similarity is the Jaccard index.  The Jaccard index has the advantage of being readily interpretable as the percent of voxels that are shared, but is infrequently used in the investigation of reliability.  It is typically calculated by the following equation:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Eq5.png" alt="" title="Bennett-Reliability-Eq5" width="254" height="27" class="alignnone size-full wp-image-1058" /></center></p>
<p>Results from the Jaccard equation can be interpreted as the number of overlapping voxels divided by the total number of unique voxels in all sessions.  For both the Dice and Jaccard methods a value of 1.0 would indicate that all super-threshold voxels identified during the test scan were also active in the retest scan, and vice-versa.  A value of 0.0 would indicate that no voxels in either scan were shared between the test and retest sessions.  See Figure 1 for a graphical representation of overlapping results from two runs in an example dataset.</p>
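<p>Both overlap measures can be computed directly from two thresholded maps. The sketch below uses hypothetical t-values and an arbitrary threshold of t &gt; 3.0; the function names are illustrative, and returning NaN when neither map has any super-threshold voxels is a design choice, since both ratios are undefined in that case.</p>

```python
import numpy as np


def dice_coefficient(map_a, map_b):
    """Dice: 2 * overlapping voxels / (voxels in A + voxels in B)."""
    a = np.asarray(map_a, dtype=bool)
    b = np.asarray(map_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return float("nan")  # undefined when neither map has active voxels
    return 2.0 * np.logical_and(a, b).sum() / total


def jaccard_index(map_a, map_b):
    """Jaccard: overlapping voxels / voxels active in either map."""
    a = np.asarray(map_a, dtype=bool)
    b = np.asarray(map_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return float("nan")
    return np.logical_and(a, b).sum() / union


# Hypothetical test and retest t-maps, thresholded at t > 3.0.
test_t = np.array([4.1, 3.5, 5.2, 1.0, 0.2, 2.9])
retest_t = np.array([3.8, 2.1, 4.7, 3.3, 0.5, 1.2])
test_mask, retest_mask = test_t > 3.0, retest_t > 3.0
print(f"Dice = {dice_coefficient(test_mask, retest_mask):.3f}")
print(f"Jaccard = {jaccard_index(test_mask, retest_mask):.3f}")
```

<p>Here two of the four voxels that are active in either session are shared, so Jaccard = 0.5 and Dice ≈ 0.667. The two measures always rank map pairs identically, since J = D / (2 - D).</p>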
<p><center><br />
<table width='500'>
<tr>
<td align="center">
<a href="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Figure1LG.png"><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Figure1SM.png" alt="" title="Bennett-Reliability-Figure1SM" width="250" height="213" class="alignnone size-full wp-image-1027" /></a>
</td>
</tr>
<tr>
<td>
Figure 1. Visualization of cluster overlap using two runs of data from a two-back working memory task. The regions in red represent significant clusters from the first run and regions in blue represent significant clusters from the second run. The crosshatched region represents the overlapping voxels that were significant in both runs. Important to note is that not all significant voxels remained significant across the two runs. One cluster in the cerebellum did not replicate at all. Data is from Bennett, Guerin, and Miller (2009).
</td>
</tr>
</table>
<p></center>&nbsp;</p>
<p>The main limitation of all cluster overlap methods is that they are highly dependent on the statistical threshold used to define what is ‘active’.  Duncan et al. demonstrated that the reported reliability of the cluster overlap method decreases as the significance threshold is increased (2009).  Similar results were reported by Rombouts et al., who found nonlinear changes in cluster overlap reliability across multiple levels of significance (1998).</p>
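<p>The threshold dependence is easy to reproduce in simulation. In the sketch below the same pair of hypothetical test and retest t-maps is scored at several thresholds; the number of voxels, the effect sizes, and the noise model are made up, so only the fact that the Dice value changes with the threshold, not the particular numbers, should be taken from it.</p>

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / total if total else float("nan")


# Hypothetical simulation: 10,000 voxels, 10% with a true effect, and
# independent unit-variance noise added in the test and retest sessions.
rng = np.random.default_rng(5)
true_effect = np.zeros(10_000)
true_effect[:1_000] = rng.uniform(2.0, 6.0, size=1_000)
test_t = true_effect + rng.normal(size=true_effect.size)
retest_t = true_effect + rng.normal(size=true_effect.size)

results = {}
for threshold in (2.0, 3.0, 4.0, 5.0):
    results[threshold] = dice_coefficient(test_t > threshold, retest_t > threshold)
    print(f"threshold t > {threshold:.1f}: Dice = {results[threshold]:.3f}")
```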
<p>These overlap statistics seek to represent the proportion of voxels that remain significant across repetitions relative to the proportion that are significant in only a subset of the results.  Another, similar approach would be to conduct a formal conjunction analysis between the repetitions.  The goal of this approach would be to uniquely identify those voxels that are significant in all sessions.  One example of this approach would be the ‘Minimum Statistic compared to the Conjunction Null’ (MS/CN) of Nichols et al. (2005).  Using this approach a researcher could threshold the results, allowing for the investigation of reliability with a statistical criterion.</p>
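<p>The core of a minimum-statistic conjunction is short: take the smallest statistic at each voxel across sessions and compare it with the threshold that would be used for a single map. The sketch below assumes that reading of the MS/CN test; the t-maps and the threshold of 3.0 are hypothetical, and choosing an appropriately corrected threshold is left to the analyst.</p>

```python
import numpy as np


def minimum_statistic_conjunction(stat_maps, threshold):
    """Voxels whose smallest statistic across all sessions exceeds the
    threshold, i.e. voxels that are significant in every session."""
    min_stat = np.min(np.stack(stat_maps), axis=0)
    return min_stat > threshold


# Hypothetical t-maps from a test and a retest session.
test_t = np.array([4.0, 1.0, 5.0, 3.2])
retest_t = np.array([3.5, 6.0, 2.0, 3.1])
print(minimum_statistic_conjunction([test_t, retest_t], threshold=3.0))  # voxels 0 and 3
```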
<p>A method similar to cluster overlap, called voxel counting, was reported in early papers.  The use of voxel counting simply evaluated the total number of activated voxels in the test and retest images.  This has proven to be a suboptimal approach for the examination of reliability, as it is done without regard to the spatial location of significant voxels (Cohen and DuBois, 1999).  An entirely different set of results could be observed in each image yet they could contain the same number of significant voxels.  As a result this method is no longer used.</p>
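<p>A toy example makes the weakness of voxel counting concrete: two hypothetical thresholded maps with identical voxel counts can share no voxels at all, which a spatial overlap measure such as Dice detects immediately.</p>

```python
import numpy as np

# Two hypothetical thresholded maps with 50 active voxels each, but in
# completely different locations.
test_mask = np.zeros(1_000, dtype=bool)
retest_mask = np.zeros(1_000, dtype=bool)
test_mask[:50] = True
retest_mask[500:550] = True

count_test, count_retest = int(test_mask.sum()), int(retest_mask.sum())
overlap = int(np.logical_and(test_mask, retest_mask).sum())
dice = 2.0 * overlap / (count_test + count_retest)
print(f"voxel counts: {count_test} vs {count_retest}")  # identical counts
print(f"overlapping voxels: {overlap}, Dice = {dice:.2f}")  # no spatial agreement
```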
<p><b>Measuring stability of activity in significant clusters.</b></p>
<p>Do you want the estimated magnitude of activity in each cluster to be stable between the test scan and the retest scan?  This is a more stringent criterion than simple extent reliability, as it is necessary to replicate the exact degree of activation and not simply what survives thresholding.  The most common method to quantify this reliability is through an intra-class correlation (ICC) of the time1-time2 cluster values.  The intra-class correlation differs from the traditional Pearson product-moment correlation in that it is specialized for data of one type, or class.  While there are many versions of the ICC, it is typically taken to be a ratio of the variance of interest divided by the total variance (Bartko, 1966; Shrout and Fleiss, 1979).  The ICC can be computed as follows:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Eq6.png" alt="" title="Bennett-Reliability-Eq6" width="245" height="30" class="alignnone size-full wp-image-1059" /></center></p>
<p>One of the best reviews of the ICC was completed by Shrout and Fleiss, who detailed six types of ICC calculation and when each is appropriate to use (1979).  One advantage of the ICC is that it can be interpreted similarly to the Pearson correlation.  A value of 1.0 would indicate near-perfect agreement between the values of the test and retest sessions, as there would be no influence of within-subject variability.  A value of 0.0 would indicate that there was no agreement between the values of the test and retest sessions, since within-subject variability would dominate the equation.</p>
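<p>In practice the variance ratio is estimated from two-way ANOVA mean squares. The sketch below computes two of the Shrout and Fleiss forms for a subjects-by-sessions table: ICC(2,1), which treats session differences as error (absolute agreement), and ICC(3,1), which does not (consistency). The function name and the toy ROI values are hypothetical.</p>

```python
import numpy as np


def icc_shrout_fleiss(data):
    """Two single-measure ICCs from Shrout and Fleiss (1979), computed from
    two-way ANOVA mean squares.

    data: array of shape (n_subjects, k_sessions), e.g. ROI contrast values
    from the test (column 0) and retest (column 1) sessions.
    Returns (ICC(2,1), ICC(3,1)): absolute agreement and consistency.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)
    sess_means = x.mean(axis=0)
    bms = k * np.sum((subj_means - grand) ** 2) / (n - 1)   # between subjects
    jms = n * np.sum((sess_means - grand) ** 2) / (k - 1)   # between sessions
    resid = x - subj_means[:, None] - sess_means[None, :] + grand
    ems = np.sum(resid ** 2) / ((n - 1) * (k - 1))          # residual error
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


# Hypothetical ROI values: every subject's retest value is shifted up by 1.
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
icc21, icc31 = icc_shrout_fleiss(values)
print(f"ICC(2,1) = {icc21:.3f}, ICC(3,1) = {icc31:.3f}")
```

<p>A uniform shift between sessions leaves the consistency form at 1.0 but lowers the absolute-agreement form to about 0.889, one concrete way in which the choice of ICC subtype changes the reliability estimate.</p>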
<p>Studies examining reliability using intra-class correlations often compute them from summary values taken from regions of interest (ROIs).  Caceres et al. compared four methods commonly used to compute ROI reliability using intraclass correlations (2009).  The median(ICC) is the median of the voxelwise ICC values within an ROI.  ICCmed is the ICC computed on each subject’s median contrast value within the ROI.  ICCmax is the ICC calculated at the peak activated voxel within an activated cluster.  ICCv is defined as the intra-voxel reliability, a measure of the total variability that can be explained by the intra-voxel variance.</p>
<p>There are several notable weaknesses to the use of ICC in calculating reliability.  First, the generalization of ICC results is limited because calculation is specific to the dataset under investigation.  An experiment with high inter-subject variability could have different ICC values relative to an experiment with low inter-subject variability, even if the stability of values over time is the same.  As discussed later in this chapter, this can be particularly problematic when comparing the reliability of data from patients with clinical disorders to that of normal controls.  Second, because of the variety of ICC subtypes there can often be confusion regarding which one to use.  Using an incorrect subtype can result in quite different reliability estimates (Muller and Buttner, 1994).</p>
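<p>Three of these ROI summaries can be sketched from a subjects × sessions × voxels array. This follows our reading of the definitions above (ICCmed as the ICC of each subject’s median ROI value, ICCmax as the ICC at the voxel with the highest mean activation), uses ICC(3,1) as an assumed subtype, omits ICCv, and runs on simulated data; the function names are hypothetical.</p>

```python
import numpy as np


def icc31(x):
    """ICC(3,1) for an (n_subjects, k_sessions) array (Shrout and Fleiss, 1979)."""
    n, k = x.shape
    grand = x.mean()
    subj, sess = x.mean(axis=1), x.mean(axis=0)
    bms = k * np.sum((subj - grand) ** 2) / (n - 1)
    ems = np.sum((x - subj[:, None] - sess[None, :] + grand) ** 2) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


def roi_icc_summaries(roi_data):
    """roi_data: (n_subjects, k_sessions, n_voxels) contrast values in one ROI."""
    voxel_iccs = np.array([icc31(roi_data[:, :, v]) for v in range(roi_data.shape[2])])
    median_of_iccs = np.median(voxel_iccs)                 # median(ICC)
    icc_of_medians = icc31(np.median(roi_data, axis=2))    # ICCmed
    peak_voxel = np.argmax(roi_data.mean(axis=(0, 1)))     # most active voxel
    icc_at_peak = icc31(roi_data[:, :, peak_voxel])        # ICCmax
    return median_of_iccs, icc_of_medians, icc_at_peak


# Hypothetical ROI: 16 subjects, 2 sessions, 200 voxels.
rng = np.random.default_rng(6)
subject_effect = rng.normal(1.0, 0.5, size=(16, 1, 1))
voxel_effect = rng.uniform(0.0, 1.0, size=(1, 1, 200))
roi = subject_effect * voxel_effect + rng.normal(scale=0.5, size=(16, 2, 200))
med_icc, icc_med, icc_max = roi_icc_summaries(roi)
print(f"median(ICC) = {med_icc:.2f}, ICCmed = {icc_med:.2f}, ICCmax = {icc_max:.2f}")
```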
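<p>The first weakness can be demonstrated with two simulated experiments that have identical session-to-session noise but different spread between subjects. Under this simple model the population ICC is the between-subject variance divided by the total variance, 0.2 in one case and 0.8 in the other, even though each subject’s values are equally stable over time. The sample sizes and standard deviations are hypothetical, and ICC(3,1) is an assumed subtype.</p>

```python
import numpy as np


def icc31(x):
    """ICC(3,1) for an (n_subjects, k_sessions) array (Shrout and Fleiss, 1979)."""
    n, k = x.shape
    grand = x.mean()
    subj, sess = x.mean(axis=1), x.mean(axis=0)
    bms = k * np.sum((subj - grand) ** 2) / (n - 1)
    ems = np.sum((x - subj[:, None] - sess[None, :] + grand) ** 2) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


# Two hypothetical experiments with identical session-to-session noise
# (SD = 1) but different spread between subjects.
rng = np.random.default_rng(7)
n_subjects = 2_000
results = {}
for between_sd in (0.5, 2.0):
    subject_level = rng.normal(scale=between_sd, size=(n_subjects, 1))
    sessions = subject_level + rng.normal(scale=1.0, size=(n_subjects, 2))
    results[between_sd] = icc31(sessions)
    expected = between_sd**2 / (between_sd**2 + 1.0)
    print(f"between-subject SD {between_sd}: ICC = {results[between_sd]:.2f} "
          f"(model value {expected:.2f})")
```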
<p><b>Measuring voxelwise reliability of the whole brain.</b></p>
<p>Do you want to know the reliability of results on a whole-brain, voxelwise basis?  A voxelwise calculation requires that the level of activity in every voxel remain consistent between the test and retest scans.  This is the strictest criterion for reliability.  It yields a global measure of concordance that indicates how consistently activity across the whole brain is represented in each test-retest pairing.  Very few studies have examined reliability using this approach, but it may be one of the most valuable metrics of fMRI reliability.  It is one of the few methods that requires estimated activity to remain consistent between test and retest even when the level of activity is close to zero.</p>
<p><center><br />
<table width='500'>
<tr>
<td align="center">
<a href="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Figure2LG.png"><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Figure2SM.png" alt="" title="Bennett-Reliability-Figure2SM" width="250" height="169" class="alignnone size-full wp-image-1029" /></a>
</td>
</tr>
<tr>
<td>
Figure 2. Histogram showing the frequency of voxelwise ICC values during a two-back working memory task. The histogram was computed from a dataset of sixteen subjects using 100 bins between ICC values of -1.0 and 1.0. The distribution of values is negatively skewed, with a mean ICC of 0.44 and a most frequently occurring (modal) value of 0.57. Data are from Bennett, Guerin, and Miller (2009).
</td>
</tr>
</table>
<p></center>&nbsp;</p>
<p>Figure 2 is an example histogram from our own data that shows the frequency of ICC values for all voxels across the whole brain during a two-back working memory task (Bennett et al., 2009).  The mean and mode of the distribution are marked on the plot.  It is quickly apparent that ICC reliability values vary widely across the brain, with some voxels showing almost no reliability and others approaching perfect reliability.</p>
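<p>A plot like Figure 2 can be produced by computing an ICC at every voxel and binning the results. The sketch below uses synthetic data, not the data behind Figure 2. It simulates 16 subjects, two sessions, and 5,000 hypothetical voxels whose between-subject spread varies from voxel to voxel. It uses 100 bins between -1 and 1, as in the figure. The ICC(3,1) form is our assumption. The mode is taken as the centre of the most populated bin.</p>

```python
import numpy as np


def icc31(y):
    """Voxelwise ICC(3,1); y is shaped (n_voxels, n_subjects, k_sessions)."""
    n, k = y.shape[-2:]
    grand = y.mean(axis=(-2, -1), keepdims=True)
    subj = y.mean(axis=-1, keepdims=True)
    sess = y.mean(axis=-2, keepdims=True)
    bms = k * ((subj - grand) ** 2).sum(axis=(-2, -1)) / (n - 1)
    ems = ((y - subj - sess + grand) ** 2).sum(axis=(-2, -1)) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


rng = np.random.default_rng(2)
n_vox, n_subj = 5000, 16
# Hypothetical voxels: the stable between-subject spread differs by voxel,
# while session-to-session noise has unit SD everywhere.
between_sd = rng.uniform(0.0, 2.0, size=(n_vox, 1, 1))
true_score = between_sd * rng.normal(size=(n_vox, n_subj, 1))
data = true_score + rng.normal(size=(n_vox, n_subj, 2))

icc = icc31(data)
counts, edges = np.histogram(icc, bins=100, range=(-1.0, 1.0))
centres = (edges[:-1] + edges[1:]) / 2
mode_centre = centres[np.argmax(counts)]
print(f"mean ICC = {icc.mean():.2f}, modal bin centre = {mode_centre:.2f}")
```

<p>The shape of the simulated histogram reflects only the assumed distribution of between-subject spread. It is not meant to reproduce the values in Figure 2.</p>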
<p><b>Other reliability methods.</b></p>
<p>Numerous other methods have also been used to measure the reliability of estimated activity, including maximum likelihood (ML), the coefficient of variation (CV), and variance decomposition.  Although these methods are used less often, this does not diminish their utility in examining reliability.  This is especially true for identifying the sources of test-retest variability that can influence the stability of results.</p>
<p>One particularly promising approach to quantifying reliability is predictive modeling, which measures how well a model trained on one set of data predicts the structure of an independent test set.  One of the best-established modeling techniques in functional neuroimaging is the nonparametric prediction, activation, influence, and reproducibility sampling (NPAIRS) approach of Strother et al. (2002; 2004).  Within the NPAIRS framework, separate metrics of prediction and reproducibility are generated (Zhang et al., 2008).  The first, prediction accuracy, evaluates classification in the temporal domain by predicting which experimental condition each scan belongs to.  The second, reproducibility, evaluates the model in the spatial domain by comparing the patterns of brain activity estimated from independent subsets of the data.  While this approach is far more complicated than the relatively simple cluster overlap or ICC metrics, predictive modeling avoids many of their drawbacks.  NPAIRS and other predictive modeling approaches enable a much more thorough examination of fMRI reliability.</p>
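<p>The reproducibility idea can be illustrated without the full framework. Split the scans into two disjoint halves, compute a statistical map from each half, and correlate the two maps. This is a simplified sketch, not the NPAIRS software. It omits the prediction-accuracy metric and NPAIRS's own modeling steps. The function names, the one-sample t-map, and the simulated "true" spatial pattern are hypothetical choices made for illustration.</p>

```python
import numpy as np


def t_map(maps):
    """One-sample t statistic at each voxel; maps is (n_scans, n_voxels)."""
    n = maps.shape[0]
    return maps.mean(axis=0) / (maps.std(axis=0, ddof=1) / np.sqrt(n))


def split_half_reproducibility(maps, n_splits, rng):
    """Mean Pearson correlation between t-maps from random, disjoint halves of the scans."""
    half = maps.shape[0] // 2
    rs = []
    for _ in range(n_splits):
        idx = rng.permutation(maps.shape[0])
        a, b = maps[idx[:half]], maps[idx[half:2 * half]]
        rs.append(np.corrcoef(t_map(a), t_map(b))[0, 1])
    return float(np.mean(rs))


rng = np.random.default_rng(3)
n_scans, n_vox = 40, 5000
pattern = rng.normal(size=n_vox)  # hypothetical "true" spatial pattern of activity
with_signal = pattern + rng.normal(0.0, 2.0, size=(n_scans, n_vox))
noise_only = rng.normal(0.0, 2.0, size=(n_scans, n_vox))
r_signal = split_half_reproducibility(with_signal, n_splits=20, rng=rng)
r_noise = split_half_reproducibility(noise_only, n_splits=20, rng=rng)
print(f"reproducibility: signal r={r_signal:.2f}, noise-only r={r_noise:.2f}")
```

<p>A stable spatial pattern yields a high split-half correlation. Pure noise yields a correlation near zero, which is what makes this metric useful as a spatial measure of reliability.</p>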
<p>Some studies have investigated fMRI reliability using the Pearson product-moment correlation (r).  Intuitively this is a logical choice, as it measures the relationship between two variables.  However, it is generally held that the Pearson correlation is not an ideal measure of test-retest reliability.  Safrit identified three reasons why it should not be used to calculate reliability (1976).  First, the Pearson correlation is set up to determine the relationship between two variables, not the stability of a single variable.  Second, it is difficult to extend the Pearson correlation beyond a single test-retest pair, and quantifying reliability becomes increasingly awkward with two or more retest sessions.  One can average the pairwise correlations between sessions, but it is far easier to take the ANOVA approach of the ICC and partition the data into between- and within-subject variability.  Third, the Pearson correlation cannot detect systematic error.  This is the case when every retest value is shifted by the same amount, for example when a constant is added to all of the original test values.  The Pearson correlation would remain unchanged, while an appropriate ICC would indicate that test-retest agreement is not exact.  While ICC measures have their own set of issues, they are generally more appropriate tools for investigating test-retest reliability.</p>
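<p>The third point is easy to demonstrate. The sketch below adds a constant of 2.0 to every retest value of a hypothetical dataset. The Pearson correlation does not change. The consistency form ICC(3,1) does not change either. Only the absolute-agreement form ICC(2,1) drops. Here, then, an "appropriate ICC" means an absolute-agreement ICC. The sample size, noise level, and offset are illustrative assumptions.</p>

```python
import numpy as np


def icc_forms(y):
    """Return (ICC(2,1), ICC(3,1)) for an (n_subjects, k_sessions) array."""
    n, k = y.shape
    grand = y.mean()
    ss_subj = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_sess = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((y - grand) ** 2).sum() - ss_subj - ss_sess
    bms, jms = ss_subj / (n - 1), ss_sess / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    icc21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc21, icc31


rng = np.random.default_rng(4)
test = rng.normal(0.0, 1.0, size=30)           # hypothetical test-session values
retest = test + rng.normal(0.0, 0.1, size=30)  # close agreement
shifted = retest + 2.0                         # same retest values plus a constant bias

results = {}
for label, values in [("retest", retest), ("retest + 2.0", shifted)]:
    r = np.corrcoef(test, values)[0, 1]
    results[label] = (r, *icc_forms(np.column_stack([test, values])))
    r, icc21, icc31 = results[label]
    print(f"{label:>12}: Pearson r={r:.3f}  ICC(2,1)={icc21:.3f}  ICC(3,1)={icc31:.3f}")
```
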
<p><center><br />
<h4>- Review of Existing Reliability Estimates -</h4>
<p></center></p>
<p>Since the advent of fMRI some results have been common and quite easily replicated.  For example, activity in primary visual cortex during visual stimulation has been thoroughly studied.  Other fMRI results have been somewhat difficult to replicate.  What does the existing literature have to say regarding the reliability of fMRI results?</p>
<p>There have been a number of individual studies investigating the test-retest reliability of fMRI results, but few articles have reviewed the entire body of literature to find trends across studies.  To obtain a more effective estimate of fMRI reliability we conducted a survey of the existing literature on fMRI reliability.  To find papers for this investigation we searched for “test-retest fMRI” using the NCBI PubMed database (www.pubmed.gov).  This search yielded a total of 183 papers, 37 of which used fMRI as a method of investigation, used a general linear model to compute their results, and provided test-retest measures of reliability.  To broaden the scope of the search we then went through the reference section of the 37 papers found using PubMed to look for additional works not identified in the initial search. There were 26 additional papers added to the investigation through this secondary search method.  The total number of papers retrieved was 63.  Each paper was examined with regard to the type of cognitive task, kind of fMRI design, number of subjects, and basis of reliability calculation.  </p>
<p>We have separated out the results into three groups: those that used the voxel overlap method, those that used intraclass correlation, and papers that used other calculation methods.  The results of this investigation can be seen in Tables 1, 2, and 3.  In the examination of cluster overlap values in the literature we attempted to only include values that were observed at a similar significance threshold across all of the papers.  The value we chose as the standard was p(uncorrected) < 0.001.  Other deviations from this standard approach are noted in the tables.</p>
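<p>Cluster overlap is usually reported as a Dice coefficient or a Jaccard coefficient, computed on voxels that pass a threshold in each session. The minimal sketch below thresholds two maps and computes both coefficients. It assumes the maps are z-statistics and approximates one-tailed p(uncorrected) &lt; 0.001 as z &gt; 3.09. The two measures are related by Jaccard = Dice / (2 - Dice), so either can be converted into the other. The function names and toy values are hypothetical.</p>

```python
import numpy as np

Z_P001 = 3.09  # approx. one-tailed z threshold for p(uncorrected) < 0.001


def overlap(z_test, z_retest, threshold=Z_P001):
    """Dice and Jaccard overlap of the suprathreshold voxels in two z-maps."""
    a = np.asarray(z_test) > threshold
    b = np.asarray(z_retest) > threshold
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    if union == 0:  # no suprathreshold voxels in either session
        return float("nan"), float("nan")
    dice = 2.0 * inter / (a.sum() + b.sum())
    jaccard = inter / union
    return float(dice), float(jaccard)


def dice_to_jaccard(dice):
    """Convert a Dice coefficient to the equivalent Jaccard coefficient."""
    return dice / (2.0 - dice)


# Toy maps: 3 voxels pass in each session, 2 of them in both.
z_test = np.array([4.0, 5.0, 3.5, 1.0, 0.2])
z_retest = np.array([1.5, 4.2, 3.3, 3.9, 0.1])
dice, jaccard = overlap(z_test, z_retest)
print(f"Dice = {dice:.2f}, Jaccard = {jaccard:.2f}")        # Dice = 0.67, Jaccard = 0.50
print(f"Dice 0.45 -> Jaccard {dice_to_jaccard(0.45):.2f}")  # Dice 0.45 -> Jaccard 0.29
```

<p>The same conversion links the paired values quoted later in this review. For example, Dice = 0.43 corresponds to Jaccard of about 0.27.</p>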
<p><center><br />
<table width='500'>
<tr>
<td align='center'>
<a href='http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Table1.pdf'><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Icon-PDF.png" alt="" title="Icon-PDF" width="61" height="70" class="alignnone size-full wp-image-1043" /><br />
Table 1</a>
</td>
<td align='center'>
<a href='http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Table2.pdf'><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Icon-PDF.png" alt="" title="Icon-PDF" width="61" height="70" class="alignnone size-full wp-image-1043" /><br />
Table 2</a>
</td>
<td align='center'>
<a href='http://prefrontal.org/blog/wp-content/uploads/2010/02/Bennett-Reliability-Table3.pdf'><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Icon-PDF.png" alt="" title="Icon-PDF" width="61" height="70" class="alignnone size-full wp-image-1043" /><br />
Table 3</a>
</td>
</tr>
</table>
<p></center>&nbsp;</p>
<p><b>Conclusions From the Reliability Review</b></p>
<p>What follows are some general points that can be taken away from the reliability survey.  Some of the conclusions that follow are quantitative results from the review and some are qualitative descriptions of trends that were observed as we conducted the review.</p>
<p><u>A diverse collection of methods has been used to assess fMRI reliability.</u>  The first finding mirrors the above discussion on reliability calculation.  A very diverse collection of methods has been used to investigate fMRI reliability, including intra-class correlation (ICC), cluster overlap, voxel counts, receiver operating characteristic (ROC) curves, maximum likelihood (ML), conjunction analysis, Cohen’s kappa index, coefficient of variation (CV), Kendall’s W, laterality index (LI), variance component decomposition, Pearson correlation, predictive modeling, and still others.  While this diversity of methods has created converging evidence of fMRI reliability, it has also limited the ability to compare and contrast the results of existing reliability studies.</p>
<p><u>Intra-class correlation and cluster overlap methods dominate the calculation of test-retest reliability.</u> While a number of methods have been used to investigate reliability, the two that stand out by frequency of use are cluster overlap and intra-class correlation.  One advantage of these methods is that they are easy to calculate.  The equations are simple to understand, easy to implement, and fast to process.  A second advantage is that other scientists can interpret them easily.  Even members of the general public can understand the concept of overlapping clusters, and almost everyone is familiar with correlation values.  While these techniques certainly have limitations and caveats, they seem to be the emerging standard for the analysis of fMRI reliability.</p>
<p><u>Most previous studies of reliability and reproducibility have been done with relatively few subjects.</u>  What sample size is necessary to conduct effective reliability research?  Most of the studies that were reviewed used fewer than 10 subjects to calculate their reliability measures, with 11 subjects being the overall average across the investigation.  Should reliability studies have more subjects?  Since a large amount of the error variance comes from subject-specific factors, it may be wise to use larger sample sizes when assessing study reliability, as a single anomalous subject could sway study reliability in either direction.  Another notable factor is that a large percentage of fMRI studies are completed with a restricted range of subjects.  Samples are typically recruited from a pool of university undergraduates, and such samples may have a different reliability than a sample drawn at random from the larger population.  Because of this sample restriction, the results of most test-retest investigations may not reflect the true reliability of other populations, such as children, the elderly, and individuals with clinical disorders.</p>
<p><u>Reliability varies by test-retest interval.</u>  Generally, increased amounts of time between the initial test scan and the subsequent retest scan will lower reliability.  Still, even back-to-back scans are not perfectly reliable.  The average Jaccard overlap of studies where the test and retest scans took place within the same hour was 33%.  Many studies with intervals lasting three months or more had a lower overlap percentage.  This is a somewhat loose guideline though.  Notably, the results reported by Aron et al. had one of the longest test-retest intervals but also possessed the highest average ICC score (2006).</p>
<p><u>Reliability varies by cognitive task and experimental design.</u>  Motor and sensory tasks seem to have greater reliability than tasks involving higher cognition.  Caceres et al. found that the reliability of an N-back task was generally higher than that of an auditory target detection task (2009).  Differences in the design of an fMRI experiment also seem to affect the reliability of results.  Specifically, block designs appear to have a slight advantage over event-related designs in terms of reliability.  This may be a function of the greater statistical power inherent in a block design and its increased SNR.</p>
<p><u>Significance is related to reliability, but it is not a strong correlation.</u>  Several studies have illustrated that suprathreshold voxels are not necessarily more reliable than subthreshold voxels.  Caceres et al. examined the joint probability distribution of significance and reliability (2009).  They found some highly activated ROIs with low reliability and some subthreshold regions with high reliability.  These ICC results fit well with the data from cluster overlap studies.  The average cluster overlap was 29%, which means that, across studies, slightly less than one-third of significant voxels can be expected to replicate.  This evidence speaks against the assumption that significant voxels will be far more reliable in an investigation of test-retest reliability.</p>
<p><u>An optimal threshold of reliability has not been established.</u>  There is no consensus value regarding what constitutes an acceptable level of reliability in fMRI.  Is an ICC value of 0.50 enough?  Should studies be required to achieve an ICC of 0.70?  Most of the studies in the review simply reported their reliability values, and few proposed criteria for what should count as a ‘reliable’ result.  Cicchetti and Sparrow did propose qualitative descriptions of data based on ICC-derived reliability (1981).  They proposed that results with an ICC above 0.75 be considered ‘excellent’, results between 0.59 and 0.75 ‘good’, results between 0.40 and 0.58 ‘fair’, and results lower than 0.40 ‘poor’.  More specifically to neuroimaging, Eaton et al. (2008) used a threshold of ICC > 0.4 as the mask value for their study, while Aron et al. (2006) used a cutoff of ICC > 0.5.</p>
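<p>Applying these qualitative labels, or a reliability mask, is a simple lookup. The sketch below uses the cut points exactly as quoted above. The quoted ranges leave values between 0.58 and 0.59 unassigned, and treating them as ‘fair’ is our own assumption. The mask line follows the ICC &gt; 0.4 threshold reported for Eaton et al. (2008). The ICC values are hypothetical.</p>

```python
import numpy as np


def cicchetti_sparrow_label(icc):
    """Qualitative label using the cut points quoted in the text.

    Assumption: values between 0.58 and 0.59, which the quoted ranges
    leave unassigned, are treated as 'fair'.
    """
    if icc > 0.75:
        return "excellent"
    if icc >= 0.59:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


icc_values = np.array([0.82, 0.75, 0.60, 0.50, 0.30])  # hypothetical ROI ICCs
print([cicchetti_sparrow_label(v) for v in icc_values])
# ['excellent', 'good', 'good', 'fair', 'poor']

# Reliability mask with an ICC > 0.4 threshold, as reported for Eaton et al. (2008).
mask = icc_values > 0.4
print(mask.sum())  # 4
```
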
<p><u>Inter-individual variability is consistently greater than intra-individual variability.</u>  Many studies reported both within-subject and between-subject reliability values in their results.  In every case the within-subject reliability far exceeded the between-subjects reliability.  Miller et al. explicitly examined variability across subjects and concluded that there are large-scale, stable differences between individuals on almost any cognitive task (2001; 2002).  More recently, Miller et al. directly contrasted within- and between-subject variability (2009).  They concluded that between-subject variability was far higher than any within-subject variability.  They further demonstrated that the results from one subject completing two different cognitive tasks are typically more similar than the data from two subjects doing the same task.  These results are mirrored by those of Costafreda et al. who found that well over half (57%) of the variability in their fMRI data was due to between-subject variation (2007).  It seems to be the case that within-subject measurements over time may vary, but they vary far less than differences in the overall pattern of activity between individuals.</p>
<p><u>There is little agreement regarding the true reliability of fMRI results.</u>  While we mention this as a final conclusion from the literature review, it is perhaps the most important point.  Some studies have estimated the reliability of fMRI data to be quite high, or even close to perfect for some tasks and brain regions (Aron et al., 2006; Maldjian et al., 2002; Raemaekers et al., 2007).  Other studies have been less enthusiastic, showing fMRI reliability to be relatively low (Duncan et al., 2009; Rau et al., 2007).  Across the survey of fMRI test-retest reliability we found that the average ICC value was 0.50 and the average cluster overlap value was 29% of voxels (Dice overlap = 0.45, Jaccard overlap = 0.29).  This represents an average across many different cognitive tasks, fMRI experimental designs, test-retest time periods, and other variables.  While these numbers may not be representative of any one experiment, they do provide an effective overview of fMRI reliability.</p>
<p><center><br />
<h4>- Other Issues and Comparisons -</h4>
<p></center></p>
<p><b>Test-Retest Reliability in Clinical Disorders</b></p>
<p>There have been few examinations of test-retest reliability in clinical disorders relative to the number of studies with normal controls.  A contributing factor to this problem may be that the scientific understanding of brain disorders is still in its infancy.  It may be premature to examine clinical reliability if there is only a vague understanding of anatomical and functional abnormalities in the brain.  Still, some investigators have taken significant steps forward in the clinical realm.  These few investigations suggest that reliability in clinical disorders is typically lower than the reliability of data from normal controls.  Some highlights of these results are listed below, categorized by disorder.</p>
<p><u>Epilepsy.</u>  Functional imaging has enormous potential to aid in the clinical diagnosis of epileptiform disorders.  Focusing on fMRI, research by Di Bonaventura et al. found that the spatial extent of activity associated with fixation-off sensitivity (FOS) was stable over time in epileptic patients (2005).  Of greater research interest for epilepsy has been the reliability of combined EEG/fMRI imaging.  Symms et al. reported that they could reliably localize interictal epileptiform discharges using EEG-triggered fMRI (1999).  Waites et al. also reported the reliable detection of discharges with combined EEG/fMRI at levels significantly above chance (2005).  Functional imaging also has the potential to assist in the localization of cognitive function prior to resection for epilepsy treatment.  One possibility would be to use noninvasive fMRI measures to replace cerebral sodium amobarbital anesthetization (the Wada test).  Fernandez et al. reported good reliability of lateralization indices (whole-brain test-retest r = 0.82) and cluster overlap measures (Dice overlap = 0.43, Jaccard overlap = 0.27) (2003).</p>
<p><u>Stroke.</u>  Many aspects of stroke recovery can affect the results of functional imaging.  The lesion location, size, and time elapsed since the stroke event each have the potential to alter function within the brain.  These factors can also lead to increased between-subject variability relative to groups of normal controls.  This is especially true when areas near the lesion contribute to specific aspects of information processing, such as speech production.  Kimberley et al. found that stroke patients had generally higher ICC values than normal controls (2008).  This is broadly consistent with the findings of Eaton et al., who showed that the average reliability of aphasia patients was approximately equal to that of normal controls as measured by ICC (2008).  These results may indicate equivalent fMRI reliability in stroke patients, or they may be an artifact of the ICC calculation.  Kimberley et al. state that the increased between-subject variability of stroke patients can inflate ICC estimates (2008).  They argue that fMRI reliability in stroke patients likely falls within the moderate range of values (0.4 < ICC < 0.6).</p>
<p><u>Schizophrenia.</u>  Schizophrenia is a multidimensional mental disorder characterized by a wide array of cognitive and perceptual dysfunctions (Freedman, 2003; Morrison and Murray, 2005).  While there have been a number of studies on the reliability of anatomical measures in schizophrenia, few have focused on function.  Manoach et al. demonstrated that the fMRI results of schizophrenic patients on a working memory task were less reliable overall than those of normal controls (2001).  The reliability of significant ROIs in the schizophrenic group ranged from ICC values of -0.20 to 0.57.  However, Whalley et al. found no such reduction in a group of subjects at high genetic risk for schizophrenia who had no psychotic symptoms (2009).  On a sentence completion task, the ICC values of these subjects were comparable to those of normal controls.  More research is certainly needed to reach a consensus on reliability in schizophrenia.</p>
<p><u>Aging.</u>  The anatomical and functional changes that take place during aging can increase the variability of fMRI results at all levels (MacDonald et al., 2006).  Clement et al. reported that cluster overlap percentages and the cluster-wise ICC values were not significantly different between normal elderly controls and patients with mild cognitive impairment (MCI) (2009).  On an episodic retrieval task healthy controls had ICC values averaging 0.69 while patients diagnosed with MCI had values averaging 0.70.  However, they also reported that all values for the older samples were lower than those reported for younger adults on similar tasks.  Marshall et al. found that while the qualitative reproducibility of results was high, the reliability of activation magnitude during aging was quite low (2004).</p>
<p>It is clear that the use of intra-class correlations in clinical research must be approached carefully.  As noted by Bosnell et al. and Kimberley et al., extreme levels of between-subject variability will artificially inflate the resulting ICC reliability estimate (Bosnell et al., 2008; Kimberley et al., 2008).  Increased between-subject variability is characteristic of many clinical populations.  Therefore, it may be impossible to compare two populations with different levels of between-subject variability using an ICC measure.</p>
<p><b>Reliability Across Scanners / Multicenter Studies</b></p>
<p>One area of increasing research interest is the ability to combine the data from multiple scanners into larger, integrative data sets (Van Horn and Toga, 2009).  There are two areas of reliability that are important for such studies.  The first is subject-level reliability, or how stable the activity of one person will be scan-to-scan.  The second is group-level reliability, or how stable the group fMRI results will be from one set of subjects to another or from one scanner to another.  Given the importance of multi-center collaboration it is critical to evaluate how results will differ when the data comes from a heterogeneous group of MRI scanners as opposed to a single machine.  Generally, the concordance of fMRI results from center to center is quite good, but not perfect.  </p>
<p>Casey et al. were among the first to examine the reliability of results across scanners (1998).  Across three imaging centers they found a ‘strong similarity’ in the location and distribution of significant voxel clusters.  More recently, Friedman et al. found that inter-center reliability was somewhat worse than test-retest reliability across several centers with an identical hardware configuration (2008).  The median ICC of their inter-center results was 0.22.  Costafreda et al. also examined the reproducibility of results from identical fMRI setups (2007).  Using a variance components analysis, they determined that the MR system accounted for roughly 8% of the variation in the BOLD signal.  This is small compared with the level of between-subject variability (57%).</p>
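<p>A variance components analysis of this kind can be sketched with the same two-way ANOVA mean squares used for the ICC. The sketch below fits a simple random-effects model with one observation per subject and MR system, and reports each component's share of the total variance. The model may differ from the one Costafreda et al. actually fitted. The simulated variances borrow the 57% and 8% figures purely as illustrative "true" values, and the 500 subjects and 20 systems are hypothetical.</p>

```python
import numpy as np


def variance_components(y):
    """Two-way random-effects decomposition, one observation per cell.

    y is (n_subjects, k_systems). Returns each component's estimated share
    of the total variance (negative estimates are truncated to 0).
    """
    n, k = y.shape
    grand = y.mean()
    subj = y.mean(axis=1, keepdims=True)
    syst = y.mean(axis=0, keepdims=True)
    bms = k * ((subj - grand) ** 2).sum() / (n - 1)
    jms = n * ((syst - grand) ** 2).sum() / (k - 1)
    ems = ((y - subj - syst + grand) ** 2).sum() / ((n - 1) * (k - 1))
    comps = {
        "subject": max(float((bms - ems) / k), 0.0),
        "system": max(float((jms - ems) / n), 0.0),
        "residual": float(ems),
    }
    total = sum(comps.values())
    return {name: value / total for name, value in comps.items()}


rng = np.random.default_rng(5)
n_subj, n_sys = 500, 20
# Hypothetical variances chosen only for illustration.
y = (rng.normal(0.0, np.sqrt(0.57), size=(n_subj, 1))          # subject effects
     + rng.normal(0.0, np.sqrt(0.08), size=(1, n_sys))         # MR-system effects
     + rng.normal(0.0, np.sqrt(0.35), size=(n_subj, n_sys)))   # residual noise
shares = variance_components(y)
print({name: round(share, 2) for name, share in shares.items()})
```
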
<p>The reliability of results from one scanner to another seems to be approximately equal to or slightly less than the values of test-retest reliability with the same MRI hardware.  Special calibration and quality control steps can be taken to ensure maximum concordance across scanners.  For instance, before conducting anatomical MRI scans in the Alzheimer’s Disease Neuroimaging Initiative (ADNI, http://www.loni.ucla.edu/ADNI/) a special MR phantom is typically scanned.  This allows for correction of magnet-specific field inhomogeneity and maximizes the ability to compare data from separate scanners.  Similar calibration measures are being discussed for functional MRI (Chiarelli et al., 2007; Friedman and Glover, 2006; Thomason et al., 2007).  It may be the case that as calibration becomes standardized it will lead to increased inter-center reliability. </p>
<p><b>Other Statistical Issues in fMRI</b></p>
<p>It is important to note that a number of important fMRI statistical issues have gone unmentioned in this chapter.  First, there is the problem of conducting thousands of statistical comparisons without an appropriate threshold adjustment.  Correction for multiple comparisons is a necessary step in fMRI analysis that is often skipped or ignored (Bennett et al., in press).  Another statistical issue in fMRI is temporal autocorrelation in the acquired timeseries.  This refers to the fact that any single timepoint of data is not necessarily independent of the acquisitions that came before and after it (Smith et al., 2007; Woolrich et al., 2001).  Autocorrelation correction is widely available, but most investigators do not use it.  Finally, over the last year the ‘non-independence error’ has been discussed at length.  Briefly, this refers to selecting a set of voxels to create a region of interest (ROI) and then using the same data to evaluate some statistical property of that region.  Ideally, an independent data set should be used once the ROI has been defined.  It is important to address these issues because they are still debated within the field and often ignored in fMRI analysis.  Correcting them can have a dramatic impact on how reproducible results are from study to study.</p>
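<p>A toy simulation shows why the non-independence error matters. Every voxel below is pure noise, so there is no true effect anywhere. Defining an "ROI" from the top 5% of voxels and then measuring that ROI in the same data still produces a large apparent effect. Measuring the same voxels in an independent dataset reveals roughly nothing. The voxel count and the 5% selection rule are hypothetical choices.</p>

```python
import numpy as np

rng = np.random.default_rng(6)
n_vox = 10000
# Pure noise: no voxel has a true effect in either dataset.
selection_data = rng.normal(size=n_vox)
independent_data = rng.normal(size=n_vox)

# Define the "ROI" as the top 5% of voxels in the selection data.
roi = selection_data > np.quantile(selection_data, 0.95)

circular_estimate = selection_data[roi].mean()       # same data used to select and measure
independent_estimate = independent_data[roi].mean()  # held-out data
print(f"circular: {circular_estimate:.2f}, independent: {independent_estimate:.2f}")
```

<p>The circular estimate is about two standard deviations above zero purely by construction. This kind of inflated effect will not reproduce in a new study.</p>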
<p><center><br />
<h4>- Conclusions -</h4>
<p></center></p>
<p><b>How can a researcher improve fMRI reliability?</b></p>
<p>Generating highly reliable results requires that sources of error be minimized across a wide array of factors.  An issue within any single factor can significantly reduce reliability.  Problems with the scanner, a poorly designed task, or an improper analysis method could each be extremely detrimental.  Put another way, high reliability requires eliminating all such issues.  A well-maintained scanner, well-designed tasks, and effective analysis techniques are all prerequisites for reliable results.</p>
<p>There are a number of practical ways that fMRI researchers can improve the reliability of their results.  For example, Friedman and Glover reported that simply increasing the number of fMRI runs improved the reliability of their results from ICC = 0.26 to ICC = 0.58 (2006).  That is quite a large jump for an additional ten or fifteen minutes of scanning.  Below are some general areas where reliability can be improved.</p>
<p><u>Increase the SNR and CNR of the acquisition.</u>  One area of attention is improving the signal-to-noise and contrast-to-noise ratios of the data collection.  An easy way to do this is simply to acquire more data.  This involves a trade-off: increasing the number of TRs acquired will improve SNR, but it will also increase the task length.  Subject fatigue, scanner time limitations, and diminishing returns with each increase in duration all limit the amount of time that can be dedicated to any one task.  Still, a researcher considering a single six-minute EPI scan for their task might add additional data collection to improve the SNR of the results.  With regard to the magnet, every imaging center should verify acquisition quality before scanning.  Many sites conduct quality assurance (QA) scans at the beginning of each day to ensure stable operation.  This has proven to be an effective method of detecting issues with the MR system before they cause trouble for investigators.  It is a hassle to cancel a scanning session when subtle artifacts are present, but this is a better option than acquiring noisy data that does not make a meaningful contribution to the investigation.  As a final thought, research groups can always start fundraising to purchase a new magnet with improved specifications.  If data acquisition is being done on a 1.5 Tesla magnet with a quadrature head coil, enormous gains in SNR can be made by moving to 3.0 Tesla or higher and using a parallel-acquisition head coil (Simmons et al., 2009; Zou et al., 2005).</p>
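<p>A sketch of two quantities behind this advice follows. The first is temporal SNR (tSNR): the mean of a voxel's timeseries divided by its standard deviation, a quantity often tracked in QA scans. The second is the idealized gain in sensitivity from acquiring more volumes. The second calculation assumes white noise, so the standard error of an effect estimate shrinks as 1 / sqrt(number of volumes). Real fMRI noise is autocorrelated and includes physiological components, so actual gains are likely smaller. The baseline level, noise SD, and run lengths are hypothetical.</p>

```python
import numpy as np


def temporal_snr(timeseries):
    """tSNR per voxel: temporal mean divided by temporal SD.
    timeseries is (n_voxels, n_timepoints)."""
    return timeseries.mean(axis=1) / timeseries.std(axis=1, ddof=1)


rng = np.random.default_rng(7)
# Hypothetical run: baseline signal of 1000 with white noise of SD 10.
run = 1000.0 + rng.normal(0.0, 10.0, size=(1000, 180))
median_tsnr = float(np.median(temporal_snr(run)))
print(f"median tSNR: {median_tsnr:.0f}")

# Under white noise, sensitivity relative to a 180-volume run grows as
# sqrt(n_volumes / 180), so each added volume buys a little less.
for n_vols in (180, 360, 720):
    print(n_vols, round(float(np.sqrt(n_vols / 180)), 2))
# 180 1.0
# 360 1.41
# 720 2.0
```
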
<p><u>Minimize individual differences in cognitive state, both across subjects and over time.</u>  Because magnet time is expensive and precious the critical component of effective task instruction can often be overlooked.  Researchers would rather be acquiring data as opposed to spending additional time giving detailed instructions to a subject.  However, this is a very easy way to improve the quality of the final data set.  If it takes ten trials for the participant to really ‘get’ the task then those trials have been wasted, adding unnecessary noise to the final results.  Task training in a separate laboratory session in conjunction with time in a mock MRI scanner can go a long way toward homogenizing the scanner experience for subjects.  It may not always be possible to fully implement these steps, but they should not be avoided simply to reduce the time spent per subject.  </p>
<p>For multi-session studies, steps can be taken to help stabilize intra-subject changes over time.  Scanning the test and retest sessions at the same time of day can help, because hormone levels and cognitive performance vary with circadian rhythms (Carrier and Monk, 2000; Huang et al., 2006; Salthouse et al., 2006).  A further step to consider is minimizing the time between sessions.  Much more can change over the course of a month than over the course of a week.</p>
<p><u>Maximize the experiment’s statistical power.</u>  Power represents the ability of an experiment to reject the null hypothesis when the null hypothesis is indeed false (Cohen, 1977).  For fMRI this ability is often discussed in terms of the number of subjects that will be scanned and the design of the task that will be administered, including how many volumes of data will be acquired from each subject.  More subjects and more volumes almost always increase power, but sometimes one improves power more than the other.  For example, Mumford and Nichols demonstrated that, when scanner time was limited, different combinations of subjects and trials could be used to achieve high levels of power (2008).  For their hypothetical task it would take only five 15-second blocks to achieve 80% power with 23 subjects, but 25 blocks with only 18 subjects.  These kinds of power estimations are quite useful in determining the best use of available scanner time.  Tools like fmripower (http://fmripower.org) can use data from existing experiments to estimate how many subjects and scans a new experiment will require to reach a desired power level (Mumford and Nichols, 2008; Mumford et al., 2007; Van Horn et al., 1998).</p>
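<p>The basic subject-count calculation can be sketched with a noncentral t distribution. This is not how fmripower works. fmripower estimates effects from pilot data and models both within- and between-subject variance. Here the standardized effect size (mean contrast divided by its SD across subjects) is simply assumed. The calculation covers a single one-sided, one-sample t-test at one voxel or ROI, with no multiple-comparison correction. The effect sizes and function names are hypothetical.</p>

```python
import numpy as np
from scipy import stats


def one_sample_power(effect_size, n_subjects, alpha=0.05):
    """Power of a one-sided, one-sample t-test on subject-level contrast
    estimates, given a standardized effect size (mean / SD across subjects)."""
    df = n_subjects - 1
    t_crit = stats.t.ppf(1 - alpha, df)
    noncentrality = effect_size * np.sqrt(n_subjects)
    return stats.nct.sf(t_crit, df, noncentrality)


def subjects_needed(effect_size, target=0.8, alpha=0.05, max_n=500):
    """Smallest sample size reaching the target power (None if never reached)."""
    for n in range(2, max_n + 1):
        if one_sample_power(effect_size, n, alpha) >= target:
            return n
    return None


for d in (0.5, 0.8, 1.2):  # hypothetical standardized effect sizes
    print(d, subjects_needed(d))
```
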
<p>The structure of the stimulus presentation has a strong influence on an experiment’s statistical power.  The interplay between stimulus presentation and inter-stimulus jitter is important, as is knowing which contrasts will be computed once the data have been acquired.  Each of these parameters can influence the power and efficiency of the experiment, and later the reliability of the results.  Block designs tend to have greater power than event-related designs.  One can also increase power by increasing block length, but care should be taken not to make blocks so long that they approach the low frequencies associated with scanner drift.  Several good software tools are available to help researchers create an optimal design for fMRI experiments.  OptSeq is a program that helps to maximize the efficiency of an event-related fMRI design (1999).  OptimizeDesign is a set of Matlab scripts that use a genetic search algorithm to maximize specific aspects of the design (Wager and Nichols, 2003).  Researchers can separately weight statistical power, HRF estimation efficiency, stimulus counterbalancing, and maintenance of stimulus frequency.  These two programs, and others like them, are valuable tools for ensuring that the ability to detect meaningful signals is maximized.</p>
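<p>Design tools of this kind score candidate designs with a measure of estimation efficiency. One common measure is 1 / (c (X'X)<sup>-1</sup> c'), where X is the design matrix and c is the contrast. The minimal sketch below applies that measure to a block design and to a randomly jittered event-related design of the same length. It is not the algorithm used by OptSeq or OptimizeDesign, which search over many candidate sequences. Several simplifications are our own assumptions: white noise, no drift regressors, an SPM-style double-gamma HRF (gamma peak with shape 6, minus a 1/6-scaled undershoot with shape 16), a 2 s TR, and the specific timings.</p>

```python
import math

import numpy as np


def double_gamma_hrf(tr, length=32.0):
    """Canonical-style HRF sampled every tr seconds: a gamma peak (shape 6)
    minus a 1/6-scaled undershoot (shape 16), normalized to unit sum."""
    t = np.arange(0.0, length, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()


def efficiency(stimulus, tr):
    """1 / (c (X'X)^-1 c') for the task regressor in a [task, intercept] model."""
    x = np.convolve(stimulus, double_gamma_hrf(tr))[: len(stimulus)]
    design = np.column_stack([x, np.ones_like(x)])
    c = np.array([1.0, 0.0])
    return 1.0 / (c @ np.linalg.inv(design.T @ design) @ c)


tr, n_scans = 2.0, 200
# Block design: 20 s on / 20 s off, repeated 10 times.
block = np.tile(np.r_[np.ones(10), np.zeros(10)], 10)
# Event-related design: 40 single-volume events at random (jittered) scans.
rng = np.random.default_rng(8)
events = np.zeros(n_scans)
events[rng.choice(n_scans, size=40, replace=False)] = 1.0
eff_block, eff_event = efficiency(block, tr), efficiency(events, tr)
print(f"block: {eff_block:.1f}, event-related: {eff_event:.1f}")
```

<p>This measure reflects detection power only. Estimating the shape of the HRF is a different goal, and it can favour other designs. This is why tools like OptimizeDesign let researchers weight several criteria separately.</p>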
<p>It is important to state that the reliability of a study in no way implies that an experiment has accurately assessed a specific cognitive process.  The validity of a study can be quite orthogonal to its reliability: it is possible to have very reliable results from a task that mean little with regard to the cognitive process under investigation.  No increase in SNR or optimization of event timing can improve an experiment that is testing for the wrong thing.  This makes task selection of paramount importance in the planning of an experiment.  It also places a burden on the researcher to interpret fMRI results carefully once the analysis is done. </p>
<p><b>Where does neuroimaging go next?</b></p>
<p>In many ways, cognitive neuroscience is still at the beginning of its use of fMRI as a research tool.  Looking back on the last two decades, it is clear that functional MRI has made enormous gains in both statistical methodology and popularity.  However, there is still much work to do.  With regard to reliability, there are several next steps that must be taken for the continued improvement of this method.</p>
<p><u>Better Characterization of the Factors that Influence Reliability.</u>  Additional research is necessary to effectively understand what factors influence the reliability of fMRI results.  The field has a good grasp of the acquisition and analysis factors that influence SNR.  Still, there is relatively little knowledge regarding how stable individuals are over time and what influences that stability.  Large-scale studies specifically investigating reliability and reproducibility should therefore be conducted across several cognitive domains.  The end goal of this research would be to better characterize the reliability of fMRI across multiple dimensions of influence within a homogeneous set of data.  Such a study would also create greater awareness of fMRI reliability in the field as a whole.  The direct comparison of reliability analysis methods, including predictive modeling, should also be completed.</p>
<p><u>Meta/Mega Analysis.</u>  The increased pooling of data from across multiple studies can give a more generalized view of important cognitive processes.  One method, meta-analysis, refers to pooling the statistical results of numerous studies to identify those results that are concordant and discordant with others.  For example, one could obtain the MNI coordinates of significant clusters from several studies having to do with response inhibition and plot them in the same stereotaxic space to determine their concordance.  One popular method of performing such an analysis is Activation Likelihood Estimation, or ALE (Eickhoff et al., 2009; Turkeltaub et al., 2002).  This method allows for the statistical thresholding of meta-analysis results, making it a powerful tool to examine the findings of many studies at once.  Another method, mega-analysis, refers to reprocessing the raw data from numerous studies in a new statistical analysis with much greater power.  Using this approach, systematic error introduced by any one study will contribute far less to the final statistical result (Costafreda, in press).  Mega-analyses are far more difficult to implement since the raw imaging data from multiple studies must be obtained and reprocessed.  Still, the increase in detection power and the greater generalizability of the results are strong reasons to engage in such an approach.</p>
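<p>The core of an ALE computation can be sketched in a few lines.  In the approach described by Eickhoff et al. (2009), each reported focus is treated as a 3D Gaussian probability distribution, and each study yields a modeled activation (MA) map.  The ALE score at every voxel is then the probabilistic union 1 - (1 - MA<sub>1</sub>)(1 - MA<sub>2</sub>)... across studies.  The Python sketch below makes several simplifying assumptions.  It uses a fixed 10 mm FWHM kernel, whereas Eickhoff et al. derive the width from sample size, and it combines foci within a study with a voxel-wise maximum.  The coordinates are hypothetical, and the sketch omits the permutation-based null distribution needed for statistical thresholding.</p>

```python
import numpy as np


def modeled_activation(foci, grid, voxel_volume, fwhm=10.0):
    """Per-study MA map; foci within a study are combined by voxel-wise maximum (an assumption)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))      # FWHM -> Gaussian SD in mm
    d2 = ((grid[:, None, :] - np.asarray(foci, float)[None, :, :]) ** 2).sum(axis=-1)
    density = np.exp(-d2 / (2.0 * sigma**2)) / ((2.0 * np.pi) ** 1.5 * sigma**3)
    prob = np.clip(density * voxel_volume, 0.0, 1.0)        # probability the focus lies in each voxel
    return prob.max(axis=1)


def ale_map(studies, grid, voxel_volume, fwhm=10.0):
    """ALE score: probabilistic union 1 - prod(1 - MA_i) across studies."""
    not_active = np.ones(len(grid))
    for foci in studies:
        not_active *= 1.0 - modeled_activation(foci, grid, voxel_volume, fwhm)
    return 1.0 - not_active


step = 4.0                                                  # hypothetical 4 mm grid
axis = np.arange(-60.0, 60.0 + step, step)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

# Hypothetical MNI foci (mm) from three response-inhibition studies.
studies = [
    [[40, 20, 32], [-30, -50, 10]],
    [[42, 18, 28]],
    [[38, 22, 30], [0, 40, -10]],
]
ale = ale_map(studies, grid, voxel_volume=step**3)
print("peak ALE", round(float(ale.max()), 3), "at", grid[ale.argmax()])
```

<p>The peak falls where the three hypothetical studies agree.  Foci reported by only one study produce much smaller ALE values.</p>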
<p>One roadblock to collaborative multi-center studies is the lack of data provenance in functional neuroimaging.  Provenance refers to complete detail regarding the origin of a dataset and the history of operations that have been performed on the data.  Having a complete history of the data enables analysis by other researchers and provides information that is critical for replication studies (Mackenzie-Graham et al., 2008).  Moving forward, there will be an additional focus on provenance to enable increased understanding of individual studies and to facilitate their integration into larger analyses.</p>
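<p>At its simplest, a provenance record pairs a fingerprint of the original data with an ordered log of every operation applied to it.  The sketch below is hypothetical: the class, field, and step names are illustrative and are not taken from any neuroimaging package or from Mackenzie-Graham et al. (2008).  It uses SHA-256 hashes so that a later reader can verify that a file matches the one described in the log.</p>

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 fingerprint of a blob of data."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    """Hypothetical minimal provenance log: data origin plus ordered processing steps."""
    source: str                                   # scanner, site, acquisition details
    raw_sha256: str                               # fingerprint of the original data
    steps: list = field(default_factory=list)

    def add_step(self, tool: str, version: str, params: dict, output: bytes) -> None:
        """Append one operation with its settings and a fingerprint of its output."""
        self.steps.append({
            "tool": tool,
            "version": version,
            "params": params,
            "output_sha256": sha256_of(output),
        })

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


raw_volume = b"placeholder bytes standing in for a raw EPI file"
record = ProvenanceRecord(source="hypothetical site A, 3 T scanner, EPI run 1",
                          raw_sha256=sha256_of(raw_volume))
record.add_step("motion_correction", "0.1", {"reference_volume": 0}, b"corrected data")
record.add_step("spatial_smoothing", "0.1", {"fwhm_mm": 8}, b"smoothed data")
print(record.to_json())
```
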
<p><u>New Emphasis on Replication.</u>  The non-independence debate of 2009 was less about effect sizes and more about reproducibility.  The implicit argument made about ‘non-independent’ studies was that if researchers ran such a study again, the correlation observed in a new, independent dataset would be far lower.  There should be a greater emphasis on the replicability of studies in the future.  This can be frustrating because replication studies are expensive and time-consuming to acquire and process.  However, moving forward, replication may become increasingly necessary to validate important results and conclusions.</p>
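<p>A small simulation can illustrate the concern behind the non-independence debate.  In the hypothetical setup below, every voxel has the same true correlation with behavior (r = 0.3).  If we select the voxels with the highest correlations and then report the correlation in those same data, the estimate is inflated.  Measuring the same voxels in a new, independent sample recovers a value near the true one.  The sample size, voxel count, and effect size are arbitrary choices for illustration.</p>

```python
import numpy as np


def simulate_dataset(rng, n_subjects=16, n_voxels=5000, true_r=0.3):
    """Every voxel correlates with behavior at the same true level (hypothetical values)."""
    behavior = rng.standard_normal(n_subjects)
    noise = rng.standard_normal((n_subjects, n_voxels))
    activity = true_r * behavior[:, None] + np.sqrt(1.0 - true_r**2) * noise
    return activity, behavior


def voxel_correlations(activity, behavior):
    """Pearson r between behavior and each voxel (columns of activity)."""
    a = (activity - activity.mean(axis=0)) / activity.std(axis=0)
    b = (behavior - behavior.mean()) / behavior.std()
    return (a * b[:, None]).mean(axis=0)


rng = np.random.default_rng(1)
r_discovery = voxel_correlations(*simulate_dataset(rng))
r_replication = voxel_correlations(*simulate_dataset(rng))   # new, independent subjects

selected = np.argsort(r_discovery)[-50:]                     # the 50 voxels with the highest r
print(f"selected voxels, same data:        mean r = {r_discovery[selected].mean():.2f}")
print(f"selected voxels, independent data: mean r = {r_replication[selected].mean():.2f}")
```

<p>With these settings, the correlations reported from the selection data lie far above 0.3, while the replication sample falls back close to it.  This is the kind of drop that the non-independence argument predicts.</p>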
<p><b>General Conclusions</b></p>
<p>One thing is abundantly clear: fMRI is an effective research tool that has opened broad new horizons of investigation to scientists around the world.  However, the results from fMRI research may be somewhat less reliable than many researchers implicitly believe.  While it may be frustrating to know that fMRI results are not perfectly replicable, it is beneficial to take a longer-term view regarding the scientific impact of these studies.  In neuroimaging, as in other scientific fields, errors will be made and some results will not replicate.  Still, over time some measure of truth will accrue.  This chapter is not intended to be an accusation against fMRI as a method.  Quite the contrary, it is meant to increase the understanding of how much each fMRI result can contribute to scientific knowledge.  If only 30% of the significant voxels in a cluster will replicate, then that value is important context for interpreting the cluster.  Likewise, if the activation magnitude in a voxel is reliable only at a level of ICC = 0.50, then that value is important information when examining scatter plots that compare estimates of activity against a behavioral measure.</p>
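<p>Overlap figures like the 30% value above are computed by thresholding activation maps from two sessions and comparing the resulting sets of suprathreshold voxels.  The Python sketch below shows two measures that appear in the test-retest literature: a Dice-style ratio 2|A ∩ B| / (|A| + |B|) and the Jaccard ratio |A ∩ B| / |A ∪ B|.  The five-voxel maps and the threshold are hypothetical, and which measure underlies a given published overlap value varies from study to study.</p>

```python
import numpy as np


def overlap_measures(map_a, map_b, threshold):
    """Dice and Jaccard overlap of the suprathreshold voxels in two statistic maps."""
    a = np.asarray(map_a) > threshold
    b = np.asarray(map_b) > threshold
    both = np.logical_and(a, b).sum()        # voxels active in both sessions
    either = np.logical_or(a, b).sum()       # voxels active in at least one session
    total = a.sum() + b.sum()
    dice = 2.0 * both / total if total else float("nan")
    jaccard = both / either if either else float("nan")
    return float(dice), float(jaccard)


# Hypothetical t-values for the same five voxels in two sessions.
session_1 = [3.5, 2.0, 4.1, 0.5, 3.2]
session_2 = [3.1, 3.3, 1.0, 0.2, 3.9]
dice, jaccard = overlap_measures(session_1, session_2, threshold=3.0)
print(f"Dice = {dice:.2f}, Jaccard = {jaccard:.2f}")
```

<p>Here two of the three suprathreshold voxels in each session coincide, giving Dice = 0.67 and Jaccard = 0.50.  For any pair of maps, the Jaccard ratio is never larger than the Dice ratio, so values computed with different measures should not be compared directly.</p>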
<p>There are a variety of methods that can be used to evaluate reliability, and each can provide information on unique aspects of the results.  Our findings speak strongly to the question of why there is no agreed-upon average value for fMRI reliability.  There are so many factors spread out across so many levels of influence that it is almost impossible to summarize the reliability of fMRI with a single value.  While our average ICC value of 0.50 and our average overlap value of 30% are effective summaries of fMRI as a whole, these values may be higher or lower on a study-to-study basis.  The best characterization of fMRI reliability would be to give a window within which fMRI results are typically reliable.  Breaking up the range of 0.0 to 1.0 into thirds, it is appropriate to say that most fMRI results are reliable in the ICC = 0.33 to 0.66 range.</p>
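<p>ICC values like those above come from a subjects-by-sessions table of some activation measure, for example mean signal change in a region of interest for each subject in each session.  The Python sketch below implements two of the Shrout and Fleiss (1979) forms using a two-way ANOVA decomposition.  ICC(3,1) measures consistency and ignores systematic shifts between sessions.  ICC(2,1) measures absolute agreement and is lowered by such shifts.  The example uses the 6 × 4 worked example from Shrout and Fleiss rather than fMRI data; readers should check which ICC form a given fMRI study reports.</p>

```python
import numpy as np


def icc_shrout_fleiss(table):
    """ICC(2,1) and ICC(3,1) for an n-subjects x k-sessions array (Shrout and Fleiss, 1979)."""
    x = np.asarray(table, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()     # between-subject sum of squares
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()     # between-session sum of squares
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                                 # between-subjects mean square
    jms = ss_cols / (k - 1)                                 # between-sessions mean square
    ems = ss_error / ((n - 1) * (k - 1))                    # residual mean square
    icc2 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc3 = (bms - ems) / (bms + (k - 1) * ems)
    return float(icc2), float(icc3)


# Worked example from Shrout and Fleiss (1979): 6 targets rated by 4 judges.
shrout_fleiss = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                 [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc2, icc3 = icc_shrout_fleiss(shrout_fleiss)
print(f"ICC(2,1) = {icc2:.2f}, ICC(3,1) = {icc3:.2f}")
```

<p>For that table the sketch gives ICC(2,1) ≈ 0.29 and ICC(3,1) ≈ 0.71, matching the values Shrout and Fleiss report.  The gap is large because the four judges differ systematically in their mean ratings.  ICC(2,1) counts that difference as error, while ICC(3,1) does not.  The same distinction applies when sessions differ systematically in overall activation.</p>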
<p>To conclude, functional neuroimaging with fMRI is no longer in its infancy.  Instead, it has reached a point of adolescence, where knowledge and methods have made enormous progress but much development remains to be done.  Our growing pains from this point forward will come from developing a more complete understanding of its strengths, weaknesses, and limitations.  A working knowledge of fMRI reliability is key to this understanding.  The reliability of fMRI may not be high relative to other scientific measures, but fMRI is presently the best tool available for the in vivo investigation of brain function.  </p>
<center><h4>- References -</h4></center>
<p>Andersson, J.L., Hutton, C., Ashburner, J., Turner, R., Friston, K., 2001. Modeling geometric deformations in EPI time series. Neuroimage 13, 903-919.</p>
<p>Aron, A.R., Gluck, M.A., Poldrack, R.A., 2006. Long-term test-retest reliability of functional MRI in a classification learning task. Neuroimage 29, 1000-1006.</p>
<p>Bandettini, P.A., Wong, E.C., Jesmanowicz, A., Hinks, R.S., Hyde, J.S., 1994. Spin-echo and gradient-echo EPI of human brain activation using BOLD contrast: a comparative study at 1.5 T. NMR Biomed 7, 12-20.</p>
<p>Bartko, J., 1966. The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19, 3-11.</p>
<p>Bennett, C.M., Guerin, S.A., Miller, M.B., 2009. The impact of experimental design on the detection of individual variability in fMRI. Cognitive Neuroscience Society, San Francisco, CA.</p>
<p>Bennett, C.M., Wolford, G.L., Miller, M.B., in press. The principled control of false positives in neuroimaging. Social Cognitive and Affective Neuroscience.</p>
<p>Bodurka, J., Ye, F., Petridou, N., Bandettini, P.A., 2005. Determination of the brain tissue-specific temporal signal to noise limit of 3 T BOLD-weighted time course data. Proc. Intl. Soc. Mag. Reson. Med., Miami.</p>
<p>Bosnell, R., Wegner, C., Kincses, Z.T., Korteweg, T., Agosta, F., Ciccarelli, O., De Stefano, N., Gass, A., Hirsch, J., Johansen-Berg, H., Kappos, L., Barkhof, F., Mancini, L., Manfredonia, F., Marino, S., Miller, D.H., Montalban, X., Palace, J., Rocca, M., Enzinger, C., Ropele, S., Rovira, A., Smith, S., Thompson, A., Thornton, J., Yousry, T., Whitcher, B., Filippi, M., Matthews, P.M., 2008. Reproducibility of fMRI in the clinical setting: implications for trial designs. Neuroimage 42, 603-610.</p>
<p>Caceres, A., Hall, D.L., Zelaya, F.O., Williams, S.C., Mehta, M.A., 2009. Measuring fMRI reliability with the intra-class correlation coefficient. Neuroimage 45, 758-768.</p>
<p>Carrier, J., Monk, T.H., 2000. Circadian rhythms of performance: new trends. Chronobiol Int 17, 719-732.</p>
<p>Casey, B.J., Cohen, J.D., O&#8217;Craven, K., Davidson, R.J., Irwin, W., Nelson, C.A., Noll, D.C., Hu, X., Lowe, M.J., Rosen, B.R., Truwitt, C.L., Turski, P.A., 1998. Reproducibility of fMRI results across four institutions using a spatial working memory task. Neuroimage 8, 249-261.</p>
<p>Chen, E.E., Small, S.L., 2007. Test-retest reliability in fMRI of language: group and task effects. Brain Lang 102, 176-185.</p>
<p>Chiarelli, P.A., Bulte, D.P., Wise, R., Gallichan, D., Jezzard, P., 2007. A calibration method for quantitative BOLD fMRI based on hyperoxia. Neuroimage 37, 808-820.</p>
<p>Cicchetti, D., Sparrow, S., 1981. Developing criteria for establishing interrater reliability of specific items: Applications to assessment of adaptive behavior. Am J Ment Defic 86, 127-137.</p>
<p>Clement, F., Belleville, S., 2009. Test-retest reliability of fMRI verbal episodic memory paradigms in healthy older adults and in persons with mild cognitive impairment. Hum Brain Mapp.</p>
<p>Cohen, J., 1977. Statistical power analysis for the behavioral sciences, revised ed. Academic Press, New York, NY.</p>
<p>Cohen, M.S., DuBois, R.M., 1999. Stability, repeatability, and the expression of signal magnitude in functional magnetic resonance imaging. J Magn Reson Imaging 10, 33-40.</p>
<p>Costafreda, S.G., in press. Pooling fMRI data: meta-analysis, mega-analysis and multi-center studies. Frontiers in Neuroinformatics.</p>
<p>Costafreda, S.G., Brammer, M.J., Vencio, R.Z., Mourao, M.L., Portela, L.A., de Castro, C.C., Giampietro, V.P., Amaro, E., Jr., 2007. Multisite fMRI reproducibility of a motor task using identical MR systems. J Magn Reson Imaging 26, 1122-1126.</p>
<p>Dale, A., 1999. Optimal Experimental Design for Event-Related fMRI. Human Brain Mapping 8, 109-114.</p>
<p>Di Bonaventura, C., Vaudano, A.E., Carni, M., Pantano, P., Nucciarelli, V., Garreffa, G., Maraviglia, B., Prencipe, M., Bozzao, L., Manfredi, M., Giallonardo, A.T., 2005. Long-term reproducibility of fMRI activation in epilepsy patients with Fixation Off Sensitivity. Epilepsia 46, 1149-1151.</p>
<p>Duncan, K.J., Pattamadilok, C., Knierim, I., Devlin, J.T., 2009. Consistency and variability in functional localisers. Neuroimage 46, 1018-1026.</p>
<p>Eaton, K.P., Szaflarski, J.P., Altaye, M., Ball, A.L., Kissela, B.M., Banks, C., Holland, S.K., 2008. Reliability of fMRI for studies of language in post-stroke aphasia subjects. Neuroimage 41, 311-322.</p>
<p>Eickhoff, S.B., Laird, A.R., Grefkes, C., Wang, L.E., Zilles, K., Fox, P.T., 2009. Coordinate-based activation likelihood estimation meta-analysis of neuroimaging data: a random-effects approach based on empirical estimates of spatial uncertainty. Hum Brain Mapp 30, 2907-2926.</p>
<p>Feredoes, E., Postle, B.R., 2007. Localization of load sensitivity of working memory storage: quantitatively and qualitatively discrepant results yielded by single-subject and group-averaged approaches to fMRI group analysis. Neuroimage 35, 881-903.</p>
<p>Fernandez, G., Specht, K., Weis, S., Tendolkar, I., Reuber, M., Fell, J., Klaver, P., Ruhlmann, J., Reul, J., Elger, C.E., 2003. Intrasubject reproducibility of presurgical language lateralization and mapping using fMRI. Neurology 60, 969-975.</p>
<p>Freedman, R., 2003. Schizophrenia. N Engl J Med 349, 1738-1749.</p>
<p>Freyer, T., Valerius, G., Kuelz, A.K., Speck, O., Glauche, V., Hull, M., Voderholzer, U., 2009. Test-retest reliability of event-related functional MRI in a probabilistic reversal learning task. Psychiatry Res.</p>
<p>Friedman, L., Glover, G.H., 2006. Reducing interscanner variability of activation in a multicenter fMRI study: controlling for signal-to-fluctuation-noise-ratio (SFNR) differences. Neuroimage 33, 471-481.</p>
<p>Friedman, L., Stern, H., Brown, G.G., Mathalon, D.H., Turner, J., Glover, G.H., Gollub, R.L., Lauriello, J., Lim, K.O., Cannon, T., Greve, D.N., Bockholt, H.J., Belger, A., Mueller, B., Doty, M.J., He, J., Wells, W., Smyth, P., Pieper, S., Kim, S., Kubicki, M., Vangel, M., Potkin, S.G., 2008. Test-retest and between-site reliability in a multicenter fMRI study. Hum Brain Mapp 29, 958-972.</p>
<p>Gold, S., Christian, B., Arndt, S., Zeien, G., Cizadlo, T., Johnson, D.L., Flaum, M., Andreasen, N.C., 1998. Functional MRI statistical software packages: a comparative analysis. Hum Brain Mapp 6, 73-84.</p>
<p>Gountouna, V.E., Job, D.E., McIntosh, A.M., Moorhead, T.W., Lymer, G.K., Whalley, H.C., Hall, J., Waiter, G.D., Brennan, D., McGonigle, D.J., Ahearn, T.S., Cavanagh, J., Condon, B., Hadley, D.M., Marshall, I., Murray, A.D., Steele, J.D., Wardlaw, J.M., Lawrie, S.M., 2009. Functional Magnetic Resonance Imaging (fMRI) reproducibility and variance components across visits and scanning sites with a finger tapping task. Neuroimage.</p>
<p>Grafton, S., Hazeltine, E., Ivry, R., 1995. Functional mapping of sequence learning in normal humans. Journal of Cognitive Neuroscience 7, 497-510.</p>
<p>Harrington, G.S., Buonocore, M.H., Farias, S.T., 2006a. Intrasubject reproducibility of functional MR imaging activation in language tasks. AJNR Am J Neuroradiol 27, 938-944.</p>
<p>Harrington, G.S., Tomaszewski Farias, S., Buonocore, M.H., Yonelinas, A.P., 2006b. The intersubject and intrasubject reproducibility of FMRI activation during three encoding tasks: implications for clinical applications. Neuroradiology 48, 495-505.</p>
<p>Havel, P., Braun, B., Rau, S., Tonn, J.C., Fesl, G., Bruckmann, H., Ilmberger, J., 2006. Reproducibility of activation in four motor paradigms. An fMRI study. J Neurol 253, 471-476.</p>
<p>Hoenig, K., Kuhl, C.K., Scheef, L., 2005. Functional 3.0-T MR assessment of higher cognitive function: are there advantages over 1.5-T imaging? Radiology 234, 860-868.</p>
<p>Huang, J., Katsuura, T., Shimomura, Y., Iwanaga, K., 2006. Diurnal changes of ERP response to sound stimuli of varying frequency in morning-type and evening-type subjects. J Physiol Anthropol 25, 49-54.</p>
<p>Huettel, S.A., Song, A.W., McCarthy, G., 2008. Functional Magnetic Resonance Imaging, 2nd ed. Sinauer Associates, Sunderland, MA.</p>
<p>Jabbi, M., Keysers, C., Singer, T., Stephan, K.E., 2009. Response to &#8220;Voodoo Correlations in Social Neuroscience&#8221; by Vul et al.</p>
<p>Jansen, A., Menke, R., Sommer, J., Forster, A.F., Bruchmann, S., Hempleman, J., Weber, B., Knecht, S., 2006. The assessment of hemispheric lateralization in functional MRI&#8211;robustness and reproducibility. Neuroimage 33, 204-217.</p>
<p>Jezzard, P., Clare, S., 1999. Sources of distortion in functional MRI data. Hum Brain Mapp 8, 80-85.</p>
<p>Johnstone, T., Somerville, L.H., Alexander, A.L., Oakes, T.R., Davidson, R.J., Kalin, N.H., Whalen, P.J., 2005. Stability of amygdala BOLD response to fearful faces over multiple scan sessions. Neuroimage 25, 1112-1123.</p>
<p>Jovicich, J., Czanner, S., Greve, D., Haley, E., van der Kouwe, A., Gollub, R., Kennedy, D., Schmitt, F., Brown, G., Macfall, J., Fischl, B., Dale, A., 2006. Reliability in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. Neuroimage 30, 436-443.</p>
<p>Kiebel, S., Holmes, A., 2007. The general linear model. In: Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Penny, W. (Eds.), Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press, London.</p>
<p>Kiehl, K.A., Liddle, P.F., 2003. Reproducibility of the hemodynamic response to auditory oddball stimuli: a six-week test-retest study. Hum Brain Mapp 18, 42-52.</p>
<p>Kimberley, T.J., Khandekar, G., Borich, M., 2008. fMRI reliability in subjects with stroke. Exp Brain Res 186, 183-190.</p>
<p>Kong, J., Gollub, R.L., Webb, J.M., Kong, J.T., Vangel, M.G., Kwong, K., 2007. Test-retest study of fMRI signal change evoked by electroacupuncture stimulation. Neuroimage 34, 1171-1181.</p>
<p>Kruger, G., Glover, G.H., 2001. Physiological noise in oxygenation-sensitive magnetic resonance imaging. Magn Reson Med 46, 631-637.</p>
<p>Leontiev, O., Buxton, R.B., 2007. Reproducibility of BOLD, perfusion, and CMRO2 measurements with calibrated-BOLD fMRI. Neuroimage 35, 175-184.</p>
<p>Lieberman, M.D., Berkman, E.T., Wager, T.D., 2009. Correlations in social neuroscience aren&#8217;t voodoo: Commentary on Vul et al. (2009). Perspectives on Psychological Science 4.</p>
<p>Liou, M., Su, H.R., Savostyanov, A.N., Lee, J.D., Aston, J.A., Chuang, C.H., Cheng, P.E., 2009. Beyond p-values: averaged and reproducible evidence in fMRI experiments. Psychophysiology 46, 367-378.</p>
<p>Liu, J.Z., Zhang, L., Brown, R.W., Yue, G.H., 2004. Reproducibility of fMRI at 1.5 T in a strictly controlled motor task. Magn Reson Med 52, 751-760.</p>
<p>Loubinoux, I., Carel, C., Alary, F., Boulanouar, K., Viallard, G., Manelfe, C., Rascol, O., Celsis, P., Chollet, F., 2001. Within-session and between-session reproducibility of cerebral sensorimotor activation: a test&#8211;retest effect evidenced with functional magnetic resonance imaging. J Cereb Blood Flow Metab 21, 592-607.</p>
<p>MacDonald, S.W., Nyberg, L., Backman, L., 2006. Intra-individual variability in behavior: links to brain structure, neurotransmission and neuronal activity. Trends Neurosci 29, 474-480.</p>
<p>Machielsen, W.C., Rombouts, S.A., Barkhof, F., Scheltens, P., Witter, M.P., 2000. FMRI of visual encoding: reproducibility of activation. Hum Brain Mapp 9, 156-164.</p>
<p>Mackenzie-Graham, A.J., Van Horn, J.D., Woods, R.P., Crawford, K.L., Toga, A.W., 2008. Provenance in neuroimaging. Neuroimage 42, 178-195.</p>
<p>Magon, S., Basso, G., Farace, P., Ricciardi, G.K., Beltramello, A., Sbarbati, A., 2009. Reproducibility of BOLD signal change induced by breath holding. Neuroimage 45, 702-712.</p>
<p>Maitra, R., 2009. Assessing certainty of activation or inactivation in test-retest fMRI studies. Neuroimage 47, 88-97.</p>
<p>Maitra, R., Roys, S.R., Gullapalli, R.P., 2002. Test-retest reliability estimation of functional MRI data. Magn Reson Med 48, 62-70.</p>
<p>Maldjian, J.A., Laurienti, P.J., Driskill, L., Burdette, J.H., 2002. Multiple reproducibility indices for evaluation of cognitive functional MR imaging paradigms. AJNR Am J Neuroradiol 23, 1030-1037.</p>
<p>Manoach, D.S., Halpern, E.F., Kramer, T.S., Chang, Y., Goff, D.C., Rauch, S.L., Kennedy, D.N., Gollub, R.L., 2001. Test-retest reliability of a functional MRI working memory paradigm in normal and schizophrenic subjects. Am J Psychiatry 158, 955-958.</p>
<p>Marshall, I., Simonotto, E., Deary, I.J., Maclullich, A., Ebmeier, K.P., Rose, E.J., Wardlaw, J.M., Goddard, N., Chappell, F.M., 2004. Repeatability of motor and working-memory tasks in healthy older volunteers: assessment at functional MR imaging. Radiology 233, 868-877.</p>
<p>Mayer, A.R., Xu, J., Pare-Blagoev, J., Posse, S., 2006. Reproducibility of activation in Broca&#8217;s area during covert generation of single words at high field: a single trial FMRI study at 4 T. Neuroimage 32, 129-137.</p>
<p>McGonigle, D.J., Howseman, A.M., Athwal, B.S., Friston, K.J., Frackowiak, R.S., Holmes, A.P., 2000. Variability in fMRI: an examination of intersession differences. Neuroimage 11, 708-734.</p>
<p>Meindl, T., Teipel, S., Elmouden, R., Mueller, S., Koch, W., Dietrich, O., Coates, U., Reiser, M., Glaser, C., 2009. Test-retest reproducibility of the default-mode network in healthy individuals. Hum Brain Mapp.</p>
<p>Miki, A., Liu, G.T., Englander, S.A., Raz, J., van Erp, T.G., Modestino, E.J., Liu, C.J., Haselgrove, J.C., 2001. Reproducibility of visual activation during checkerboard stimulation in functional magnetic resonance imaging at 4 Tesla. Jpn J Ophthalmol 45, 151-155.</p>
<p>Miki, A., Raz, J., van Erp, T.G., Liu, C.S., Haselgrove, J.C., Liu, G.T., 2000. Reproducibility of visual activation in functional MR imaging and effects of postprocessing. AJNR Am J Neuroradiol 21, 910-915.</p>
<p>Mikl, M., Marecek, R., Hlustik, P., Pavlicova, M., Drastich, A., Chlebus, P., Brazdil, M., Krupa, P., 2008. Effects of spatial smoothing on fMRI group inferences. Magn Reson Imaging 26, 490-503.</p>
<p>Miller, M.B., Donovan, C.L., Van Horn, J.D., German, E., Sokol-Hessner, P., Wolford, G.L., 2009. Unique and persistent individual patterns of brain activity across different memory retrieval tasks. Neuroimage 48, 625-635.</p>
<p>Miller, M.B., Handy, T.C., Cutler, J., Inati, S., Wolford, G.L., 2001. Brain activations associated with shifts in response criterion on a recognition test. Can J Exp Psychol 55, 162-173.</p>
<p>Miller, M.B., Van Horn, J.D., Wolford, G.L., Handy, T.C., Valsangkar-Smyth, M., Inati, S., Grafton, S., Gazzaniga, M.S., 2002. Extensive individual differences in brain activations associated with episodic retrieval are reliable over time. J Cogn Neurosci 14, 1200-1214.</p>
<p>Morgan, V.L., Dawant, B.M., Li, Y., Pickens, D.R., 2007. Comparison of fMRI statistical software packages and strategies for analysis of images containing random and stimulus-correlated motion. Comput Med Imaging Graph 31, 436-446.</p>
<p>Morrison, P.D., Murray, R.M., 2005. Schizophrenia. Curr Biol 15, R980-984.</p>
<p>Moser, E., Teichtmeister, C., Diemling, M., 1996. Reproducibility and postprocessing of gradient-echo functional MRI to improve localization of brain activity in the human visual cortex. Magn Reson Imaging 14, 567-579.</p>
<p>Muller, R., Buttner, P., 1994. A critical discussion of intraclass correlation coefficients. Stat Med 13, 2465-2476.</p>
<p>Mumford, J.A., Nichols, T., 2009. Simple group fMRI modeling and inference. Neuroimage 47, 1469-1475.</p>
<p>Mumford, J.A., Nichols, T.E., 2008. Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. Neuroimage 39, 261-268.</p>
<p>Mumford, J.A., Poldrack, R.A., Nichols, T., 2007. FMRIpower: A Power Calculation Tool for 2-Stage fMRI models. Human Brain Mapping, Chicago, IL.</p>
<p>Munneke, J., Heslenfeld, D.J., Theeuwes, J., 2008. Directing attention to a location in space results in retinotopic activation in primary visual cortex. Brain Res 1222, 184-191.</p>
<p>Murphy, K., Bodurka, J., Bandettini, P.A., 2007. How long to scan? The relationship between fMRI temporal signal to noise ratio and necessary scan duration. Neuroimage 34, 565-574.</p>
<p>Neumann, J., Lohmann, G., Zysset, S., von Cramon, D.Y., 2003. Within-subject variability of BOLD response dynamics. Neuroimage 19, 784-796.</p>
<p>Nichols, T., Brett, M., Andersson, J., Wager, T., Poline, J.B., 2005. Valid conjunction inference with the minimum statistic. Neuroimage 25, 653-660.</p>
<p>Nunnally, J., 1970. Introduction to psychological measurement. McGraw Hill, New York.</p>
<p>Oakes, T.R., Johnstone, T., Ores Walsh, K.S., Greischar, L.L., Alexander, A.L., Fox, A.S., Davidson, R.J., 2005. Comparison of fMRI motion correction software tools. Neuroimage 28, 529-543.</p>
<p>Ogawa, S., Menon, R.S., Tank, D.W., Kim, S.G., Merkle, H., Ellermann, J.M., Ugurbil, K., 1993. Functional brain mapping by blood oxygenation level-dependent contrast magnetic resonance imaging. A comparison of signal characteristics with a biophysical model. Biophys J 64, 803-812.</p>
<p>Peelen, M.V., Downing, P.E., 2005. Within-subject reproducibility of category-specific visual activation with functional MRI. Hum Brain Mapp 25, 402-408.</p>
<p>Peyron, R., Garcia-Larrea, L., Gregoire, M.C., Costes, N., Convers, P., Lavenne, F., Mauguiere, F., Michel, D., Laurent, B., 1999. Haemodynamic brain responses to acute pain in humans: sensory and attentional networks. Brain 122 (Pt 9), 1765-1780.</p>
<p>Phan, K.L., Liberzon, I., Welsh, R.C., Britton, J.C., Taylor, S.F., 2003. Habituation of rostral anterior cingulate cortex to repeated emotionally salient pictures. Neuropsychopharmacology 28, 1344-1350.</p>
<p>Poldrack, R.A., Prabhakaran, V., Seger, C.A., Gabrieli, J.D., 1999. Striatal activation during acquisition of a cognitive skill. Neuropsychology 13, 564-574.</p>
<p>Poline, J.B., Strother, S.C., Dehaene-Lambertz, G., Egan, G.F., Lancaster, J.L., 2006. Motivation and synthesis of the FIAC experiment: Reproducibility of fMRI results across expert analyses. Hum Brain Mapp 27, 351-359.</p>
<p>Raemaekers, M., Vink, M., Zandbelt, B., van Wezel, R.J., Kahn, R.S., Ramsey, N.F., 2007. Test-retest reliability of fMRI activation during prosaccades and antisaccades. Neuroimage 36, 532-542.</p>
<p>Ramsey, N., Tallent, K., van Gelderen, P., Frank, J., Moonen, C., Weinberger, D., 1996. Reproducibility of Human 3D fMRI Brain Maps Acquired During a Motor Task. Human Brain Mapping 4, 113-121.</p>
<p>Rau, S., Fesl, G., Bruhns, P., Havel, P., Braun, B., Tonn, J.C., Ilmberger, J., 2007. Reproducibility of activations in Broca area with two language tasks: a functional MR imaging study. AJNR Am J Neuroradiol 28, 1346-1353.</p>
<p>Rombouts, S.A., Barkhof, F., Hoogenraad, F.G., Sprenger, M., Scheltens, P., 1998. Within-subject reproducibility of visual activation patterns with functional magnetic resonance imaging using multislice echo planar imaging. Magn Reson Imaging 16, 105-113.</p>
<p>Rombouts, S.A., Barkhof, F., Hoogenraad, F.G., Sprenger, M., Valk, J., Scheltens, P., 1997. Test-retest analysis with functional MR of the activated area in the human visual cortex. AJNR Am J Neuroradiol 18, 1317-1322.</p>
<p>Rostami, M., Hosseini, S.M., Takahashi, M., Sugiura, M., Kawashima, R., 2009. Neural bases of goal-directed implicit learning. Neuroimage 48, 303-310.</p>
<p>Rutten, G.J., Ramsey, N.F., van Rijen, P.C., van Veelen, C.W., 2002. Reproducibility of fMRI-determined language lateralization in individual subjects. Brain Lang 80, 421-437.</p>
<p>Safrit, M., 1976. Reliability theory. American Alliance for Health, Physical Education, and Recreation, Washington, DC.</p>
<p>Salli, E., Korvenoja, A., Visa, A., Katila, T., Aronen, H.J., 2001. Reproducibility of fMRI: effect of the use of contextual information. Neuroimage 13, 459-471.</p>
<p>Salthouse, T.A., Nesselroade, J.R., Berish, D.E., 2006. Short-term variability in cognitive performance and the calibration of longitudinal change. J Gerontol B Psychol Sci Soc Sci 61, P144-151.</p>
<p>Schunck, T., Erb, G., Mathis, A., Jacob, N., Gilles, C., Namer, I.J., Meier, D., Luthringer, R., 2008. Test-retest reliability of a functional MRI anticipatory anxiety paradigm in healthy volunteers. J Magn Reson Imaging 27, 459-468.</p>
<p>Shehzad, Z., Kelly, A.M., Reiss, P.T., Gee, D.G., Gotimer, K., Uddin, L.Q., Lee, S.H., Margulies, D.S., Roy, A.K., Biswal, B.B., Petkova, E., Castellanos, F.X., Milham, M.P., 2009. The resting brain: unconstrained yet reliable. Cereb Cortex 19, 2209-2229.</p>
<p>Shrout, P., Fleiss, J., 1979. Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin 86, 420-428.</p>
<p>Simmons, W.K., Reddish, M., Bellgowan, P.S., Martin, A., 2009. The Selectivity and Functional Connectivity of the Anterior Temporal Lobes. Cereb Cortex.</p>
<p>Smith, A.T., Singh, K.D., Balsters, J.H., 2007. A comment on the severity of the effects of non-white noise in fMRI time-series. Neuroimage 36, 282-288.</p>
<p>Smith, S.M., Beckmann, C.F., Ramnani, N., Woolrich, M.W., Bannister, P.R., Jenkinson, M., Matthews, P.M., McGonigle, D.J., 2005. Variability in fMRI: a re-examination of inter-session differences. Hum Brain Mapp 24, 248-257.</p>
<p>Specht, K., Willmes, K., Shah, N.J., Jancke, L., 2003. Assessment of reliability in functional imaging studies. J Magn Reson Imaging 17, 463-471.</p>
<p>Stark, R., Schienle, A., Walter, B., Kirsch, P., Blecker, C., Ott, U., Schafer, A., Sammer, G., Zimmermann, M., Vaitl, D., 2004. Hemodynamic effects of negative emotional pictures &#8211; a test-retest analysis. Neuropsychobiology 50, 108-118.</p>
<p>Sterr, A., Shen, S., Zaman, A., Roberts, N., Szameitat, A., 2007. Activation of SI is modulated by attention: a random effects fMRI study using mechanical stimuli. Neuroreport 18, 607-611.</p>
<p>Strother, S., La Conte, S., Kai Hansen, L., Anderson, J., Zhang, J., Pulapura, S., Rottenberg, D., 2004. Optimizing the fMRI data-processing pipeline using prediction and reproducibility performance metrics: I. A preliminary group analysis. Neuroimage 23 Suppl 1, S196-207.</p>
<p>Strother, S.C., Anderson, J., Hansen, L.K., Kjems, U., Kustra, R., Sidtis, J., Frutiger, S., Muley, S., LaConte, S., Rottenberg, D., 2002. The quantitative evaluation of functional neuroimaging experiments: the NPAIRS data analysis framework. Neuroimage 15, 747-771.</p>
<p>Swallow, K.M., Braver, T.S., Snyder, A.Z., Speer, N.K., Zacks, J.M., 2003. Reliability of functional localization using fMRI. Neuroimage 20, 1561-1577.</p>
<p>Symms, M.R., Allen, P.J., Woermann, F.G., Polizzi, G., Krakow, K., Barker, G.J., Fish, D.R., Duncan, J.S., 1999. Reproducible localization of interictal epileptiform discharges using EEG-triggered fMRI. Phys Med Biol 44, N161-168.</p>
<p>Tegeler, C., Strother, S.C., Anderson, J.R., Kim, S.G., 1999. Reproducibility of BOLD-based functional MRI obtained at 4 T. Hum Brain Mapp 7, 267-283.</p>
<p>Thomason, M.E., Foland, L.C., Glover, G.H., 2007. Calibration of BOLD fMRI using breath holding reduces group variance during a cognitive task. Hum Brain Mapp 28, 59-68.</p>
<p>Triantafyllou, C., Hoge, R.D., Krueger, G., Wiggins, C.J., Potthast, A., Wiggins, G.C., Wald, L.L., 2005. Comparison of physiological noise at 1.5 T, 3 T and 7 T and optimization of fMRI acquisition parameters. Neuroimage 26, 243-250.</p>
<p>Turkeltaub, P.E., Eden, G.F., Jones, K.M., Zeffiro, T.A., 2002. Meta-Analysis of the Functional Neuroanatomy of Single-Word Reading: Method and Validation. Neuroimage 16, 765-780.</p>
<p>Turner, R., Jezzard, P., Wen, H., Kwong, K.K., Le Bihan, D., Zeffiro, T., Balaban, R.S., 1993. Functional mapping of the human visual cortex at 4 and 1.5 tesla using deoxygenation contrast EPI. Magn Reson Med 29, 277-279.</p>
<p>Van Horn, J.D., Ellmore, T.M., Esposito, G., Berman, K.F., 1998. Mapping voxel-based statistical power on parametric images. Neuroimage 7, 97-107.</p>
<p>Van Horn, J.D., Toga, A.W., 2009. Multisite neuroimaging trials. Curr Opin Neurol 22, 370-378.</p>
<p>Vul, E., Harris, C., Winkielman, P., Pashler, H., 2009. Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science 4.</p>
<p>Wager, T.D., Nichols, T., 2003. Optimization of experimental design in fMRI: a general framework using a genetic algorithm. Neuroimage 18, 293-309.</p>
<p>Wagner, K., Frings, L., Quiske, A., Unterrainer, J., Schwarzwald, R., Spreer, J., Halsband, U., Schulze-Bonhage, A., 2005. The reliability of fMRI activations in the medial temporal lobes in a verbal episodic memory task. Neuroimage 28, 122-131.</p>
<p>Waites, A.B., Shaw, M.E., Briellmann, R.S., Labate, A., Abbott, D.F., Jackson, G.D., 2005. How reliable are fMRI-EEG studies of epilepsy? A nonparametric approach to analysis validation and optimization. Neuroimage 24, 192-199.</p>
<p>Waldvogel, D., van Gelderen, P., Immisch, I., Pfeiffer, C., Hallett, M., 2000. The variability of serial fMRI data: correlation between a visual and a motor task. Neuroreport 11, 3843-3847.</p>
<p>Wei, X., Yoo, S.S., Dickey, C.C., Zou, K.H., Guttmann, C.R., Panych, L.P., 2004. Functional MRI of auditory verbal working memory: long-term reproducibility analysis. Neuroimage 21, 1000-1008.</p>
<p>Whalley, H.C., Gountouna, V.E., Hall, J., McIntosh, A.M., Simonotto, E., Job, D.E., Owens, D.G., Johnstone, E.C., Lawrie, S.M., 2009. fMRI changes over time and reproducibility in unmedicated subjects at high genetic risk of schizophrenia. Psychol Med 39, 1189-1199.</p>
<p>White, T., O&#8217;Leary, D., Magnotta, V., Arndt, S., Flaum, M., Andreasen, N.C., 2001. Anatomic and functional variability: the effects of filter size in group fMRI data analysis. Neuroimage 13, 577-588.</p>
<p>Woolrich, M.W., Ripley, B.D., Brady, M., Smith, S.M., 2001. Temporal autocorrelation in univariate linear modeling of FMRI data. Neuroimage 14, 1370-1386.</p>
<p>Yetkin, F.Z., McAuliffe, T.L., Cox, R., Haughton, V.M., 1996. Test-retest precision of functional MR in sensory and motor task activation. AJNR Am J Neuroradiol 17, 95-98.</p>
<p>Yoo, S.S., O&#8217;Leary, H.M., Lee, J.H., Chen, N.K., Panych, L.P., Jolesz, F.A., 2007. Reproducibility of trial-based functional MRI on motor imagery. Int J Neurosci 117, 215-227.</p>
<p>Yoo, S.S., Wei, X., Dickey, C.C., Guttmann, C.R., Panych, L.P., 2005. Long-term reproducibility analysis of fMRI using hand motor task. Int J Neurosci 115, 55-77.</p>
<p>Zandbelt, B.B., Gladwin, T.E., Raemaekers, M., van Buuren, M., Neggers, S.F., Kahn, R.S., Ramsey, N.F., Vink, M., 2008. Within-subject variation in BOLD-fMRI signal changes across repeated measurements: quantification and implications for sample size. Neuroimage 42, 196-206.</p>
<p>Zhang, J., Anderson, J.R., Liang, L., Pulapura, S.K., Gatewood, L., Rottenberg, D.A., Strother, S.C., 2009. Evaluation and optimization of fMRI single-subject processing pipelines with NPAIRS and second-level CVA. Magn Reson Imaging 27, 264-278.</p>
<p>Zhang, J., Liang, L., Anderson, J.R., Gatewood, L., Rottenberg, D.A., Strother, S.C., 2008. A Java-based fMRI processing pipeline evaluation system for assessment of univariate general linear model and multivariate canonical variate analysis-based pipelines. Neuroinformatics 6, 123-134.</p>
<p>Zhilkin, P., Alexander, M.E., 2004. Affine registration: a comparison of several programs. Magn Reson Imaging 22, 55-66.</p>
<p>Zou, K.H., Greve, D.N., Wang, M., Pieper, S.D., Warfield, S.K., White, N.S., Manandhar, S., Brown, G.G., Vangel, M.G., Kikinis, R., Wells, W.M., 3rd, 2005. Reproducibility of functional MR imaging: preliminary results of prospective multi-institutional study performed by Biomedical Informatics Research Network. Radiology 237, 781-789.</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2010/02/paper-how-reliable-are-the-results-from-functional-magnetic-resonance-imaging/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>LOVE Conference Wrapup</title>
		<link>http://prefrontal.org/blog/2010/02/love-conference-wrapup/</link>
		<comments>http://prefrontal.org/blog/2010/02/love-conference-wrapup/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 10:39:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[Miscellany]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=991</guid>
		<description><![CDATA[The Lake Ontario Visionary Establishment (LOVE) conference just wrapped up and, I have to say, it was a genuinely fantastic experience.  I gave a lighthearted presentation on Type I error and reliability in functional imaging, which hopefully made the message a bit easier to swallow.  I also got the chance to catch up [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://prefrontal.org/blog/wp-content/uploads/2010/02/Logo-Love.png" alt="Logo-Love" title="Logo-Love" width="200" height="146" align="right">The <a href="http://brain.mcmaster.ca/love/">Lake Ontario Visionary Establishment</a> (LOVE) conference just wrapped up and, I have to say, it was a genuinely fantastic experience.  I gave a lighthearted presentation on Type I error and reliability in functional imaging, which hopefully made the message a bit easier to swallow.  I also got the chance to catch up with longtime friends while making some new aquaintances.  Thanks to the organizers <a href="http://psychology.uwo.ca/faculty/ansari_res.htm">Daniel Ansari</a> and <a href="http://www.psychology.uwaterloo.ca/people/faculty/jafugels/">Jonathan Fugelsang</a> for having me up to present.</p>
<p>For all those who are interested: you can download a copy of my presentation slides <a href="http://prefrontal.org/files/presentations/Bennett-LOVE-2010.pdf">here</a>.<br />
Send me an email if you have any questions or comments.  Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2010/02/love-conference-wrapup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spring/Summer 2010 Conference Schedule</title>
		<link>http://prefrontal.org/blog/2010/01/springsummer-2010-conference-schedule/</link>
		<comments>http://prefrontal.org/blog/2010/01/springsummer-2010-conference-schedule/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 00:38:47 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Miscellany]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=890</guid>
		<description><![CDATA[It is going to be a busy conference season again this spring. I will be at the following professional gatherings over the next few months – send me an email if you will be attending as well and would like to meet up. I’ll buy the first round and we can talk shop.
Lake Ontario Visionary [...]]]></description>
			<content:encoded><![CDATA[<p>It is going to be a busy conference season again this spring. I will be at the following professional gatherings over the next few months – send me an email if you will be attending as well and would like to meet up. I’ll buy the first round and we can talk shop.</p>
<p><a href="http://brain.mcmaster.ca/love/">Lake Ontario Visionary Establishment Conference</a> [LOVE]<br />
February 11-12, Niagara Falls, <del datetime="2010-01-18T06:09:27+00:00">NY</del> Ontario, Canada</p>
<p><a href="http://www.cogneurosociety.org/">Cognitive Neuroscience Society Conference</a> [CNS]<br />
April 17-20, Montreal, Canada</p>
<p><a href="http://www.psychologicalscience.org">Association for Psychological Science Convention</a> [APS]<br />
May 27-30, Boston, MA</p>
<p><a href="http://www.humanbrainmapping.org">Organization for Human Brain Mapping Conference</a> [HBM]<br />
June 6-10, Barcelona, Spain</p>
<p><a href="http://www.ahfe2010.org/">Applied Human Factors and Ergonomics Conference</a> [AHFE]<br />
July 17-20, Miami, FL</p>
<p>Here is some of what I will be presenting:</p>
<p>[LOVE]<br />
• Special topic talk: &#8216;Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: addressing the multiple comparisons problem in fMRI.&#8217;</p>
<p>[APS]<br />
• Invited talk: &#8216;The development of interoceptive information processing across adolescence.&#8217;</p>
<p>[CNS] [HBM]<br />
• Poster: &#8216;How reliable are the results from fMRI?&#8217;<br />
Bennett CM, Guerin SA, Donovan CL, Miller MB</p>
<p>[HBM]<br />
• Poster: &#8216;A device for simultaneous thermal and tactile stimulation in an MR environment.&#8217;<br />
Bennett CM</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2010/01/springsummer-2010-conference-schedule/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quote of the Week – Pashler</title>
		<link>http://prefrontal.org/blog/2010/01/quote-of-the-week-pashler/</link>
		<comments>http://prefrontal.org/blog/2010/01/quote-of-the-week-pashler/#comments</comments>
		<pubDate>Wed, 06 Jan 2010 02:38:48 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Quotes]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=960</guid>
		<description><![CDATA[“It’s hellishly complicated, this data analysis, and that creates great opportunity for inadvertent mischief.” &#8211; Hal Pashler (As seen in Science News)
]]></description>
			<content:encoded><![CDATA[<p>“It’s hellishly complicated, this data analysis, and that creates great opportunity for inadvertent mischief.” &#8211; <a href="http://www.pashler.com/">Hal Pashler</a> (As seen in <a href="http://www.sciencenews.org/view/feature/id/50295/title/Trawling_the_brain">Science News</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2010/01/quote-of-the-week-pashler/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>PAPER: The Principled Control of False Positives in Neuroimaging</title>
		<link>http://prefrontal.org/blog/2009/12/paper-the-principled-control-of-false-positives-in-neuroimaging/</link>
		<comments>http://prefrontal.org/blog/2009/12/paper-the-principled-control-of-false-positives-in-neuroimaging/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 13:30:19 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Psychology]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=861</guid>
		<description><![CDATA[- Current Citation:
Bennett CM, Wolford GL, Miller MB. (in press). The Principled Control of False Positives in Neuroimaging.  Social Cognitive and Affective Neuroscience.
- Abstract:
An incredible amount of data is generated in the course of a functional neuroimaging experiment.  The quantity of data gives us improved temporal and spatial resolution with which to evaluate [...]]]></description>
			<content:encoded><![CDATA[<p><strong>- Current Citation:</strong><br />
Bennett CM, Wolford GL, Miller MB. (in press). The Principled Control of False Positives in Neuroimaging.  <em>Social Cognitive and Affective Neuroscience.</em></p>
<p><strong>- Abstract:</strong><br />
An incredible amount of data is generated in the course of a functional neuroimaging experiment.  The quantity of data gives us improved temporal and spatial resolution with which to evaluate our results.  It also creates a staggering multiple testing problem.  A number of methods have been created that address the multiple testing problem in neuroimaging in a principled fashion.  These methods place limits on either the familywise error rate (FWER) or the false discovery rate (FDR) of the results.  These principled approaches are well established in the literature and are known to properly limit the number of false positives across the whole brain. However, every month a minority of papers are still published using methods that do not properly correct for the number of tests conducted.  These latter methods place limits on the voxelwise probability of a false positive and yield no information on the global rate of false positives in the results.  In this commentary we argue in favor of a principled approach to the multiple testing problem &#8211; one that places appropriate limits on the rate of false positives across the whole brain and gives the reader the information they need to properly evaluate the results.</p>
<p><strong>- Downloadable Versions:</strong><br />
[<a href="http://prefrontal.org/files/papers/Bennett-SCAN-2009.pdf">Manuscript PDF</a>]</p>
<p><span id="more-861"></span><br />
<strong>- Full Text:</strong><br />
Balancing false positives against false negatives is a fine line that every scientist must walk.  If our criteria are too conservative we will not have the power to detect meaningful results.  If our thresholds are too liberal our results will become contaminated by an excess of false positives.  Ideally, we hope to maximize the number of true positives (hits) while minimizing false reports.</p>
<p>It is a statistical necessity that we must adapt our threshold criteria to the number of statistical tests completed on the same dataset.  This multiple testing problem is not unique to neuroimaging; it affects many areas of modern science.  Ask an economist about finding market correlations between 10,000 stocks or a geneticist about testing across 100,000 SNPs and you will quickly understand the pervasiveness of the multiple testing problem throughout scientific research (Storey and Tibshirani 2003; Taleb 2004).  </p>
<p>In this paper we argue for the use of principled corrections when dealing with the large number of comparisons typical of neuroimaging data.  By principled, we mean a correction that definitively identifies for the reader the probability or the proportion of false positives that could be expected in the reported results.  Ideally, the correction would be easy for the reader to understand.  Many researchers have avoided principled correction due to the perception that such methods are too conservative.  In theory and in practice, there is no reason for a principled correction to be either liberal or conservative.  The degree of &#8216;conservativeness&#8217; can generally be adjusted by setting a parameter (such as the FWER or FDR level), and the reader still retains accurate knowledge about the prevalence of false positives.  Later in the commentary, we will outline familywise error rate (FWER) correction and false discovery rate (FDR) correction as two examples of principled correction.</p>
<p><strong>The Problem</strong></p>
<p>Many published fMRI papers use arbitrary, uncorrected statistical thresholds.  A commonly chosen threshold is p < 0.001 with a minimum cluster extent of 10 voxels.  For some datasets this threshold may strike an appropriate balance between sensitivity and specificity, and in a few cases it might be possible to specify the probability of a false positive with this threshold.  However, this uncorrected cutoff cannot be valid for the diverse array of situations in which it is used. The same threshold has been used with data comprising 10,000 voxels and with data comprising 60,000 voxels; this simply cannot be appropriate, because the two situations have very different probabilities of false positives.  The use of a principled procedure would yield the same expected probability or proportion of false positives for any number of voxels under investigation.</p>
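<p>To make the 10,000 versus 60,000 voxel contrast concrete, here is a back-of-the-envelope sketch in Python. It assumes every voxel is truly null and ignores the 10-voxel extent rule. The expected count of false positives (n times alpha) holds regardless of correlation between voxels, but the familywise formula assumes independent tests, which real fMRI voxels are not, so those numbers are illustrative only.</p>

```python
import math

def expected_false_positives(n_voxels, alpha=0.001):
    """Expected number of false-positive voxels if every voxel is null and
    each is tested at level alpha (no cluster-extent rule applied)."""
    return n_voxels * alpha

def familywise_error(n_voxels, alpha=0.001):
    """P(at least one false positive) across n independent null tests.
    Assumption: independence, which spatially smooth fMRI data violate."""
    # 1 - (1 - alpha)**n, written with log1p/expm1 for numerical stability
    return -math.expm1(n_voxels * math.log1p(-alpha))

for n in (10_000, 60_000):
    print(f"{n} voxels: expected false positives = {expected_false_positives(n):.0f}, "
          f"P(any false positive) = {familywise_error(n):.5f}")
```

<p>The same voxelwise cutoff yields six times as many expected false positives in the larger dataset, which is exactly why a fixed uncorrected threshold cannot carry a fixed meaning.</p>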
<p>In a recent survey of all articles published in six major neuroimaging journals during the year 2008 we found that between 25% and 30% of fMRI articles in each journal used uncorrected thresholds in their analysis (Bennett, Baird et al. Under Review).  This indicates that the majority of published research does use principled correction.  However, the survey also highlights that a quarter to a third of published papers do not use principled correction, and that such papers continue to be published in high-impact, specialized journals.  The proportion of studies using uncorrected thresholds is even higher within the realm of conference posters and presentations.  In a survey of posters presented at a recent neuroscience conference we found that 80% of the presentations used uncorrected thresholds.  In these unprincipled cases the reader is unlikely to have an accurate idea about the true likelihood of false positives in the results.</p>
<p>The prevalence of unprincipled correction in the literature is a serious issue.  During an examination of familywise error correction methods in neuroimaging, Nichols and Hayasaka (2003) compared techniques that included Gaussian Random Field Theory, Bonferroni, FDR, Šidák, and permutation.  They found that only 8 out of 11 fMRI and PET studies had any significant voxels after familywise correction had been completed, leaving three studies with no significant voxels at all.  Based on these data, it is quite likely that results composed wholly of false positives are present in the current literature.  Despite this, new studies reporting uncorrected statistics are published every month.</p>
<p>False positives can be costly in a number of ways.  One example of their negative consequences comes from a study completed by one of the current authors (MBM) in graduate school.  He conducted an fMRI study investigating differential activations between false memories and true memories using the Roediger and McDermott word paradigm (1995).  At the same time, Schacter and colleagues were conducting a PET study using the same approach.  Using a liberal uncorrected threshold, Schacter and colleagues found a few small regions of interest in the medial temporal lobe and superior temporal sulcus (Schacter, Reiman et al. 1996).  In their own study, Miller and colleagues found two very different small clusters in the frontal and parietal cortex.  When the Miller et al. study was presented at the Society for Neuroscience conference (Miller, Erickson et al. 1996) it was made clear that multiple testing correction was necessary.  None of the results survived correction, and the study was never released, while the uncorrected Schacter results were published in a major neuroimaging journal.  Since that time there has been a scattering of studies reporting different patterns of brain activations for false memories and for true memories.  Virtually all of them have used uncorrected thresholds and have proven difficult to replicate.  This situation raises two issues.  The first is the amount of time and resources that have been spent trying to extend results that may never have existed in the first place.  The second is that, because only reports with positive results get published, the literature presents a skewed view that brain activations reliably distinguish false memories from true memories.</p>
<p>Less rigorous control of Type I errors would not be so bad if inferences based on false positives were easily correctable.  However, this does not seem to be the case within the current model of publication.  If researchers failed to reproduce the results of a published study, it would be quite difficult for them to disseminate their null findings.  This is one of the most profound differences between Type I and Type II error: false negatives are correctable in future publications, while false positives are difficult to refute once established in the literature.</p>
<p>This imbalance in the propagation of Type I and Type II errors contributes to an issue known as the ‘File Drawer Problem’ (Rosenthal 1979).  This refers to the publication bias that ensues because the probability of a study being published is directly tied to the significance of its results.  While presentation of null results is not unheard of (see Baker, Hutchinson, 2007), such publications are generally considered the exception and not the rule.</p>
<p>Another important cautionary tale is our recent investigation of false positives during the acquisition of fMRI data from a dead Atlantic salmon (Bennett, Baird et al. 2009; Bennett, Baird et al. Under Review).  Using standard acquisition, preprocessing, and analysis techniques, we were able to show that active voxel clusters could be observed in the dead salmon’s brain when using uncorrected statistical thresholds.  When any form of correction for multiple testing was applied, these false positives were no longer present.  While the dead salmon study can only speak to the role of principled correction in a single subject, we believe it effectively illustrates the dangers of false positives in any neuroimaging analysis.</p>
<p>A bit of clarification may be important at this point.  Our goal should not be to completely eliminate false positives.  To be completely certain that all of our results are true positives would require obscenely high statistical thresholds that would eliminate all but the very strongest of our legitimate results.  Therefore we must accept that there will always be some risk of false positives in our reports.  At the same time, it is critical that we be able to specify how probable false positives are in our data in a way that is readily communicated to the reader.  </p>
<p>In this discussion of false positives it is also important that we not minimize the danger of high false negative rates.  Being over-conservative regarding the control of Type I error comes at the expense of missing true positives.  Perhaps for this reason, some voices in the imaging community have argued against principled correction due to the resulting loss of statistical power.  Again, a principled correction does not necessarily lead to a loss of power.  The researcher can set a liberal criterion in FDR or FWE, and readers can use their precise knowledge of the false positive rate to evaluate the reported results.</p>
<p><strong>Our Argument</strong></p>
<p>There is a single key argument that we wish to make regarding proper protection against Type I error in fMRI.  <strong>All researchers should use statistical methods that provide information on the Type I error rate across the whole brain.</strong>  It doesn’t matter what method you use to accomplish this.  You can report the false discovery rate (Benjamini and Hochberg 1995), or use one of several methods to control for the familywise error rate (Nichols and Hayasaka 2003).  You can even do a back-of-the-napkin calculation and use a Bonferroni-corrected threshold if you wish.  The end goal is the same: giving the reader information on the prevalence of false positives across the entire family of statistical tests.</p>
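<p>As an illustration of the back-of-the-napkin option, the Python sketch below computes Bonferroni and Šidák per-voxel thresholds. The voxel count is a hypothetical value, and the Šidák version assumes independent tests; with spatially smooth data both thresholds will be conservative.</p>

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-voxel p cutoff that bounds the FWER at alpha under any dependence."""
    return alpha / n_tests

def sidak_threshold(alpha, n_tests):
    """Per-voxel p cutoff giving an FWER of exactly alpha for independent tests."""
    # 1 - (1 - alpha)**(1 / n_tests), computed stably with log1p/expm1
    return -math.expm1(math.log1p(-alpha) / n_tests)

n_voxels = 60_000  # hypothetical number of voxels in the brain mask
print(f"Bonferroni: p < {bonferroni_threshold(0.05, n_voxels):.3g}")
print(f"Sidak:      p < {sidak_threshold(0.05, n_voxels):.3g}")
```

<p>At this scale the two cutoffs are almost indistinguishable, so the simpler Bonferroni value is a reasonable napkin figure.</p>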
<p><center><br />
<table width='500'>
<tr>
<td>
<a href="http://prefrontal.org/blog/wp-content/uploads/2009/12/Bennett-2009-SCAN-Figure1LG.jpg"><img src="http://prefrontal.org/blog/wp-content/uploads/2009/12/Bennett-2009-SCAN-Figure1.jpg" alt="Bennett-2009-SCAN-Figure1" title="Bennett-2009-SCAN-Figure1" width="500" height="127"></a>
</td>
</tr>
<tr>
<td>
Figure 1.  Example figure of a hybrid corrected/uncorrected data presentation.  Areas that are significant under an uncorrected threshold of p < 0.001 with a 10-voxel extent criterion are shaded in blue.  Areas that are significant under a corrected threshold of FDR = 0.05 are shaded in orange.
</td>
</tr>
</table>
<p></center></p>
<p>We would further argue that an investigator could still use an uncorrected threshold for their data as long as proper corrected values detailing the prevalence of false positives are also provided.  In this manner you could threshold your data at p < 0.001 with a 10 voxel extent as long as you presented what FDR or FWE threshold would be required for the results to stay significant.  One example can be seen in the above figure.  In this image voxels that survive an uncorrected threshold are depicted in cool colors, while voxels that survive FDR correction are depicted in warm colors.  This allows a researcher to ‘have their cake and eat it too’.  Again, the key to our argument is not that we need to use correction simply for correction’s sake, but that our readers are made aware of the false positive rate across the whole brain.</p>
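<p>One way to supply those corrected values alongside an uncorrected map is to report, for each voxel, the smallest FWE or FDR level at which it would still be significant. The Python sketch below does this with Bonferroni-adjusted and Benjamini-Hochberg-adjusted p-values. The five p-values are made up, a real map would have tens of thousands of tests, and Bonferroni stands in for FWE only because it is simple (Random Field Theory or permutation FWE would usually be less conservative).</p>

```python
def bonferroni_adjust(pvals):
    """Smallest FWER level at which each test survives Bonferroni correction."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values: the smallest FDR level q at
    which each test would be declared significant by the BH step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical uncorrected p-values for five voxels
p_uncorrected = [0.0001, 0.0004, 0.0009, 0.02, 0.3]
for p, p_fwe, q in zip(p_uncorrected, bonferroni_adjust(p_uncorrected), bh_adjust(p_uncorrected)):
    print(f"p = {p:.4f}  Bonferroni-adjusted = {p_fwe:.4f}  BH-adjusted = {q:.4f}")
```

<p>A voxel shown at p &lt; 0.001 can then be annotated with, say, its BH-adjusted value, telling the reader directly what FDR level would be needed for it to remain significant.</p>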
<p><strong>Techniques For Principled Correction</strong></p>
<p>There is a wide variety of methods that can be used to hold the false positive rate at specified levels across the whole brain.  One approach is to place limits on the familywise error rate (FWER).  Using this method, a criterion value of 0.05 would mean that there is a 5% chance of one or more false positives across the entire set of tests.  This yields 95% confidence that there are no false positives in your results.  There are many methods that can be used to control the FWER in neuroimaging data: the Bonferroni correction, Gaussian Random Field Theory (Worsley, Evans et al. 1992), and nonparametric permutation techniques (Nichols and Holmes 2002).  Nichols and Hayasaka (2003) have authored an excellent article reviewing these techniques.  The Bonferroni correction is typically seen as too conservative for functional neuroimaging since it does not take into account spatial correlation between voxels.  Gaussian RFT adapts to the spatial smoothness of the data, but has been shown to be quite conservative at low levels of smoothness.  Permutation-based control of the FWER emerged as an ideal choice, providing adequate correction while maintaining high sensitivity.</p>
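<p>For readers unfamiliar with permutation-based FWER control, the following is a minimal Python sketch of the max-statistic idea for a one-sample group analysis: randomly flip the sign of each subject's contrast map, record the maximum t across voxels, and use the 95th percentile of those maxima as the whole-brain threshold. It assumes subjects' contrasts are symmetric around zero under the null and uses a one-sided test; the data layout and function names are hypothetical, and validated software should be preferred for real analyses.</p>

```python
import math
import random

def t_stat(xs):
    """One-sample t statistic for the mean of xs against zero."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)

def observed_t(data):
    """t at every voxel; data[s][v] is subject s's contrast value at voxel v."""
    n_vox = len(data[0])
    return [t_stat([row[v] for row in data]) for v in range(n_vox)]

def max_t_threshold(data, alpha=0.05, n_perm=1000, seed=0):
    """FWER-controlling t cutoff from the sign-flipping null distribution
    of the maximum t across voxels (one-sided test)."""
    rng = random.Random(seed)
    max_ts = [max(observed_t(data))]  # the unpermuted labeling counts as one permutation
    for _ in range(n_perm - 1):
        # One sign per subject, applied to every voxel of that subject's map
        signs = [rng.choice((-1.0, 1.0)) for _ in data]
        flipped = [[s * x for x in row] for s, row in zip(signs, data)]
        max_ts.append(max(observed_t(flipped)))
    max_ts.sort()
    # (1 - alpha) quantile of the max-statistic distribution
    k = math.ceil((1.0 - alpha) * n_perm) - 1
    return max_ts[k]
```

<p>Because each subject's whole map is flipped with a single sign, the spatial correlation between voxels is preserved in every permutation, which is why this approach adapts to the smoothness of the data without needing an explicit smoothness estimate.</p>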
<p>Another approach to principled correction is to place limits on the false discovery rate (Benjamini and Hochberg 1995; Genovese, Lazar et al. 2002).  Using this method, a criterion value of 0.05 would mean that on average 5% of the observed results would be false positives.  The goal of this approach is not to completely eliminate familywise errors, but to control how pervasive false positives are in the results.  This is a weaker form of control over the multiple testing problem, but one that still provides a precise bound on the expected proportion of false positives.</p>
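<p>The Benjamini-Hochberg step-up procedure behind FDR control is short enough to sketch directly. This Python version, run on hypothetical p-values, sorts the m p-values, finds the largest rank k with p(k) ≤ k·q/m, and declares every test with p ≤ p(k) significant.</p>

```python
def bh_threshold(pvals, q=0.05):
    """Benjamini-Hochberg step-up cutoff: the largest sorted p-value p_(k)
    with p_(k) <= k * q / m.  Tests with p <= cutoff are declared
    significant; returns 0.0 when nothing survives."""
    m = len(pvals)
    cutoff = 0.0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * q / m:
            cutoff = p  # keep the last (largest) rank that passes
    return cutoff

# Hypothetical p-values for ten tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
for q in (0.05, 0.10):
    cutoff = bh_threshold(pvals, q)
    n_sig = sum(p <= cutoff for p in pvals)
    print(f"q = {q:.2f}: cutoff = {cutoff}, significant tests = {n_sig}")
```

<p>With q = 0.10 the third and fourth p-values are rejected even though each fails its own rank criterion, because the fifth p-value passes; this is the step-up property. Relaxing q from 0.05 to 0.10 raises the number of detections here from two to five while the reader still knows the target false discovery rate.</p>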
<p>The advantages and disadvantages of each correction approach are illustrated graphically using simulated data in Figure 2.  The simulated data are set up so that the uncorrected results have a power of 0.80.  Controlling the familywise error rate with the criterion p(FWE) = 0.05 virtually eliminates false positives while dramatically reducing the amount of detected signal.  In this example power is reduced to 0.16.  Controlling the false discovery rate with the criterion FDR = 0.05 increases the number of false positives relative to FWER techniques, but also increases the ability to detect meaningful signal.  In this example power rises to 0.54, compared with 0.16 under FWER control.</p>
<p><center><br />
<table width='500'>
<tr>
<td>
<a href="http://prefrontal.org/blog/wp-content/uploads/2009/12/Bennett-2009-SCAN-Figure2LG.jpg"><img src="http://prefrontal.org/blog/wp-content/uploads/2009/12/Bennett-2009-SCAN-Figure2.jpg" alt="Bennett-2009-SCAN-Figure2" title="Bennett-2009-SCAN-Figure2" width="500" height="508"></a>
</td>
</tr>
<tr>
<td>
Figure 2.  Demonstration of correction methods for the multiple testing problem.  a) A raw image of the simulated data used in this example.  A field of Gaussian random noise was added to a 100&#215;100 image with a 50&#215;50 square section of signal in the center.  b) Thresholded image of the simulated data using a pixelwise statistical test.  The threshold for this test was p < 0.05.  Power is high at 0.80, but a number of false positives can be observed.  c) Thresholded image of the simulated data using a Bonferroni FWER correction.  The probability of a familywise error was set to 0.05.  There are no false positives across the entire set of tests, but power is reduced to 0.16.  d) Thresholded image of the simulated data while controlling the false discovery rate.  The FDR for this example was set to 0.05.  Of the pixels declared significant, 4.9% are false positives, but power is increased to 0.54.
</td>
</tr>
</table>
<p></center></p>
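<p>A rough Python re-creation of the Figure 2 setup is sketched below. The caption does not state the test or the effect size, so this assumes one-sided z-tests per pixel and a signal mean of 2.487 standard deviations (chosen so that uncorrected power at p < 0.05 is about 0.80). The exact power values are therefore likely to differ from those in the figure, though the ordering of the three approaches should be the same.</p>

```python
import math
import random

def norm_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def simulate_image(size=100, square=50, effect=2.487, seed=1):
    """One-sided z-test p-values for a size x size noise image with a
    centered square of signal.  Assumption: effect=2.487 gives ~0.80 power
    at p < 0.05 (1.645 + 0.842 standard deviations)."""
    rng = random.Random(seed)
    lo = (size - square) // 2
    hi = lo + square
    pvals, is_signal = [], []
    for r in range(size):
        for c in range(size):
            signal = lo <= r < hi and lo <= c < hi
            pvals.append(norm_sf(rng.gauss(effect if signal else 0.0, 1.0)))
            is_signal.append(signal)
    return pvals, is_signal

def bh_threshold(pvals, q):
    """Benjamini-Hochberg step-up cutoff (0.0 if nothing survives)."""
    m = len(pvals)
    cutoff = 0.0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * q / m:
            cutoff = p
    return cutoff

def summarize(pvals, is_signal, cutoff):
    """Power (fraction of signal pixels detected) and false-positive count."""
    hits = sum(1 for p, s in zip(pvals, is_signal) if s and p <= cutoff)
    false_pos = sum(1 for p, s in zip(pvals, is_signal) if not s and p <= cutoff)
    return hits / sum(is_signal), false_pos

pvals, is_signal = simulate_image()
cutoffs = {
    "uncorrected p < 0.05": 0.05,
    "Bonferroni FWER 0.05": 0.05 / len(pvals),
    "BH FDR 0.05": bh_threshold(pvals, 0.05),
}
results = {name: summarize(pvals, is_signal, c) for name, c in cutoffs.items()}
for name, (power, fp) in results.items():
    print(f"{name:22s} power = {power:.2f}  false positives = {fp}")
```

<p>Changing the assumed effect size or switching to t-tests shifts all three power values, but the uncorrected map should always detect the most signal with the most false positives, and Bonferroni the least of both.</p>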
<p>If you are concerned about power, you can appropriately adjust the cutoff in FWE or FDR.  For instance, it isn’t strictly necessary to use 0.05 in either FWE or FDR.  It might yield a better balance of power and false positive protection to use 0.10 or even something higher.  You will be more likely to find true sources of activation and the reader will still have a precise idea about the prevalence of false positives.</p>
<p>It is important to understand the appropriate use of the correction method you select.  For instance, one commonly used approach is the small-volume correction (SVC) method in SPM (http://www.fil.ion.ucl.ac.uk/spm/).  SVC allows researchers to conduct principled correction using Gaussian Random Field Theory within a predefined region of interest.  Ideally this would be a region defined by anatomical boundaries or a region identified in a previous, independent dataset.  However, many researchers implement SVC incorrectly, first conducting a whole-brain exploratory analysis and then applying SVC to the resulting clusters (cf. Loring, Meador et al. 2002; Poldrack and Mumford 2009).  This is an inappropriate approach that does not yield a principled correction.  Another method that is often used incorrectly is the AlphaSim tool included in AFNI (http://afni.nimh.nih.gov/afni/).  For effective false positive control, AlphaSim requires an estimate of the spatial correlation across voxels, obtained from the data using the program 3dFWHM.  Many researchers simply input the amount of Gaussian smoothing that was applied during preprocessing, leading to incorrect cluster-size thresholds as output.  Errors during estimation of the spatial smoothness can also lead to incorrect values.</p>
<p>In the future we may have statistical methods that are better able to address the multiple testing problem.  Hierarchical Bayes models have been offered as one approach (Lindquist and Gelman 2009).  We may even move away from the binary decision of significance and begin to examine effect sizes in earnest (Wager 2009).  Still, we must examine the balance of Type I and Type II error in the context of where our analysis techniques are today.  At present the general linear model is by far the most prevalent method of analysis in fMRI.  Mumford and Nichols (2009) found that approximately 92% of group fMRI results were computed using an ordinary least squares (OLS) estimation of the general linear model.  This percentage is unlikely to shift dramatically in the next 12-36 months.  Our focus should remain on how to improve OLS methods in the near term as we move toward new analysis techniques in the future.</p>
<p><strong>Predetermined cluster size as a partial correction</strong></p>
<p>In neuroimaging we often rely on the fact that legitimate results tend to cluster together spatially.  The assumption is that voxel clustering provides some assurance against Type I errors.  While predefined thresholds in combination with predetermined clustering requirements may represent a sufficient approximation of a proper threshold, this is in general an unprincipled approach to the control of Type I error rates.</p>
<p>Many authors justify this approach by referring to the results of Forman et al. (1995), who examined the clustering behavior of voxels in fMRI.  The results of Forman et al. suggest that a threshold of p < 0.001 combined with a 10 voxel extent requirement should more than adequately control the prevalence of false positives.  However, the Forman et al. data were computed only across two-dimensional slices, not in 3D volumes.  The findings of Forman et al. simply do not apply to modern fMRI data.</p>
<p>It should also be noted that we are not arguing that p < 0.001 with a 10 voxel threshold is wholly inappropriate.  For example, Cooper and Knutson (2008) used the AlphaSim utility in AFNI to determine that a corrected threshold of p < 0.001 with a 10 voxel extent would be appropriate to keep the familywise error rate at 5% in their particular dataset.  The problem is that this threshold is specific to the parameters of their dataset and may be inappropriate in other datasets.  Arnott et al. (2008) used the same AFNI routine and estimated that an 81 voxel extent was required to keep familywise error below 5%.  It is possible to use the combination of a p value and a cluster size in a principled way, but doing so requires computing the proper values for each and every analysis.  The cluster size criterion can change quite substantially from dataset to dataset.  Further, the required cluster size can become so large that legitimate results with a smaller volume are missed.</p>
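<p>To illustrate why the extent threshold has to be recomputed for each analysis, here is a rough Monte Carlo sketch in the spirit of AlphaSim; it is not AFNI's implementation. It assumes a small cubic volume with wrap-around edges, stationary Gaussian smoothness given as a FWHM in voxels, a voxelwise cutoff of z > 3.09 (about one-sided p < 0.001), and face-connected (6-neighbor) clusters. The function names and default values are hypothetical, and in practice the FWHM should be estimated from the data rather than copied from the smoothing kernel.</p>

```python
import math
import random
from collections import deque

def gaussian_kernel(fwhm):
    """1-D Gaussian taps scaled so their squares sum to 1 (keeps unit variance)."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    radius = max(1, math.ceil(3.0 * sigma))
    taps = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    norm = math.sqrt(sum(t * t for t in taps))
    return [t / norm for t in taps], radius

def smooth_axis(vol, n, kernel, radius, axis):
    """Convolve a flattened n*n*n volume along one axis.
    Assumption: wrap-around edges keep the smoothed noise stationary."""
    stride = (n * n, n, 1)[axis]
    out = [0.0] * len(vol)
    for idx in range(len(vol)):
        coord = (idx // stride) % n
        base = idx - coord * stride
        out[idx] = sum(w * vol[base + ((coord + k - radius) % n) * stride]
                       for k, w in enumerate(kernel))
    return out

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def max_cluster_size(mask, n):
    """Size of the largest face-connected cluster of True voxels."""
    seen = [False] * len(mask)
    best = 0
    for start, on in enumerate(mask):
        if not on or seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:  # breadth-first flood fill of one cluster
            idx = queue.popleft()
            size += 1
            x, y, z = idx // (n * n), (idx // n) % n, idx % n
            for dx, dy, dz in NEIGHBORS:
                a, b, c = x + dx, y + dy, z + dz
                if 0 <= a < n and 0 <= b < n and 0 <= c < n:
                    j = (a * n + b) * n + c
                    if mask[j] and not seen[j]:
                        seen[j] = True
                        queue.append(j)
        best = max(best, size)
    return best

def cluster_extent_threshold(n=16, fwhm=2.0, z_thresh=3.09, alpha=0.05, n_sims=100, seed=0):
    """Smallest extent k with P(largest pure-noise cluster >= k) <= alpha."""
    rng = random.Random(seed)
    kernel, radius = gaussian_kernel(fwhm)
    maxima = []
    for _ in range(n_sims):
        vol = [rng.gauss(0.0, 1.0) for _ in range(n ** 3)]
        for axis in range(3):
            vol = smooth_axis(vol, n, kernel, radius, axis)
        maxima.append(max_cluster_size([v > z_thresh for v in vol], n))
    k = 1
    while sum(m >= k for m in maxima) > alpha * n_sims:
        k += 1
    return k

extent = {fwhm: cluster_extent_threshold(n=12, fwhm=fwhm, n_sims=60, seed=3) for fwhm in (1.0, 3.0)}
for fwhm, k in extent.items():
    print(f"smoothness FWHM {fwhm} voxels: need clusters of at least {k} voxels")
```

<p>Smoother data produce larger chance clusters, so the same p < 0.001 cutoff requires a larger extent; the volume size and the voxelwise cutoff change the answer as well.</p>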
<p><strong>Conclusions</strong></p>
<p>Proper Type I error protection is not a new topic of discussion in neuroimaging.  The need to correct for thousands of statistical tests has been recognized since the early days of PET imaging (Worsley, Evans et al. 1992).  It is unclear why uncorrected thresholds have lingered so long.  Perhaps many researchers simply regarded them as accepted, arbitrary thresholds, in the same way that p < 0.05 is an accepted, arbitrary threshold throughout other scientific fields.  This approach may have been acceptable in the past, but within the last decade we, as a field, have come under increased scrutiny from the public and from other scientists.  At a time when so many are waiting for us to slip up, we believe it is time to set a new standard of quality for our data acquisition and analysis.</p>
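<p>A quick simulation makes the scale of the multiple testing problem concrete.  It assumes independent tests in which every voxel is truly null, so each p value is uniformly distributed; the voxel count is an arbitrary illustrative figure, and the helper names are hypothetical.</p>

```python
import random


def count_uncorrected_hits(n_tests, p_thresh, seed=0):
    """Count null tests that pass an uncorrected threshold (p values are uniform under H0)."""
    rng = random.Random(seed)
    return sum(rng.random() < p_thresh for _ in range(n_tests))


def familywise_error_rate(n_tests, p_thresh):
    """Probability of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - p_thresh) ** n_tests


if __name__ == "__main__":
    n_voxels = 50_000  # hypothetical number of voxels tested
    p_thresh = 0.001   # uncorrected voxelwise threshold
    print("expected false positives:", n_voxels * p_thresh)
    print("simulated false positives:", count_uncorrected_hits(n_voxels, p_thresh))
    print("P(at least one false positive):", familywise_error_rate(n_voxels, p_thresh))
```

<p>With 50,000 null voxels tested at an uncorrected p < 0.001, about 50 false positives are expected, and under independence the chance of seeing at least one is essentially 1.</p>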
<p>The fundamental question that all researchers must face is whether their results will replicate in a new study.  The prevalence of false positives in your results directly affects the odds of that happening.  We are all aware that the multiple testing problem is a major issue in neuroimaging.  How you correct for it can be debated, but principled protection against Type I error is an absolute necessity moving forward.</p>
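<p>Two widely used principled options can be sketched on a plain list of p values: Bonferroni, which controls the familywise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate (Benjamini and Hochberg 1995; Genovese et al. 2002).  This is a minimal sketch, not any imaging package's implementation; it assumes independent (or positively dependent) tests and ignores the spatial structure that random field theory and permutation methods exploit.  The function names and example values are made up for illustration.</p>

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 wherever p <= alpha / m (controls the familywise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: find the largest rank k with p_(k) <= k * alpha / m, reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    pvals = [0.0001, 0.004, 0.015, 0.022, 0.2, 0.5, 0.7, 0.9]  # made-up example values
    print("Bonferroni rejections:", sum(bonferroni(pvals)))
    print("BH (FDR) rejections:", sum(benjamini_hochberg(pvals)))
```

<p>With these made-up values Bonferroni rejects two tests and Benjamini-Hochberg rejects four, illustrating the usual trade-off: FDR control is more permissive, while familywise control offers a stronger guarantee against any false positive.</p>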
<p><strong>References</strong></p>
<p>Arnott, S. R., J. S. Cant, et al. (2008). &#8220;Crinkling and crumpling: an auditory fMRI study of material properties.&#8221; Neuroimage 43(2): 368-78.</p>
<p>Benjamini, Y. and Y. Hochberg (1995). &#8220;Controlling the false discovery rate: A practical and powerful approach to multiple testing.&#8221; J. Roy. Statist. Soc. Ser. B 57: 289-300.</p>
<p>Bennett, C. M., A. A. Baird, et al. (2009). Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction. 15th Annual Meeting of the Organization for Human Brain Mapping. San Francisco, CA.</p>
<p>Bennett, C. M., A. A. Baird, et al. (Under Review). &#8220;Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction.&#8221;</p>
<p>Cooper, J. C. and B. Knutson (2008). &#8220;Valence and salience contribute to nucleus accumbens activation.&#8221; Neuroimage 39(1): 538-47.</p>
<p>Forman, S. D., J. D. Cohen, et al. (1995). &#8220;Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold.&#8221; Magn Reson Med 33(5): 636-47.</p>
<p>Genovese, C. R., N. A. Lazar, et al. (2002). &#8220;Thresholding of statistical maps in functional neuroimaging using the false discovery rate.&#8221; Neuroimage 15(4): 870-8.</p>
<p>Lindquist, M. A. and A. Gelman (2009). &#8220;Correlations and Multiple Comparisons in Functional Imaging: A Statistical Perspective (Commentary on Vul et al., 2009).&#8221; Perspectives on Psychological Science 4(3): 310-313.</p>
<p>Loring, D. W., K. J. Meador, et al. (2002). &#8220;Now you see it, now you don&#8217;t: statistical and methodological considerations in fMRI.&#8221; Epilepsy Behav 3(6): 539-547.</p>
<p>Miller, E. K., C. A. Erickson, et al. (1996). &#8220;Neural mechanisms of visual working memory in prefrontal cortex of the macaque.&#8221; J Neurosci 16(16): 5154-67.</p>
<p>Mumford, J. A. and T. Nichols (2009). &#8220;Simple group fMRI modeling and inference.&#8221; Neuroimage 47(4): 1469-75.</p>
<p>Nichols, T. and S. Hayasaka (2003). &#8220;Controlling the familywise error rate in functional neuroimaging: a comparative review.&#8221; Stat Methods Med Res 12(5): 419-46.</p>
<p>Nichols, T. E. and A. P. Holmes (2002). &#8220;Nonparametric permutation tests for functional neuroimaging: a primer with examples.&#8221; Hum Brain Mapp 15(1): 1-25.</p>
<p>Poldrack, R. A. and J. A. Mumford (2009). &#8220;Independence in ROI analysis: where is the voodoo?&#8221; Soc Cogn Affect Neurosci 4(2): 208-13.</p>
<p>Roediger, H. L., 3rd and K. B. McDermott (1995). &#8220;Creating false memories: remembering words not presented in lists.&#8221; Journal of Experimental Psychology: Learning, Memory, and Cognition 21: 803-814.</p>
<p>Rosenthal, R. (1979). &#8220;The file drawer problem and tolerance for null results.&#8221; Psychological Bulletin 86(3): 638-641.</p>
<p>Schacter, D. L., E. Reiman, et al. (1996). &#8220;Neuroanatomical correlates of veridical and illusory recognition memory: evidence from positron emission tomography.&#8221; Neuron 17(2): 267-74.</p>
<p>Storey, J. D. and R. Tibshirani (2003). &#8220;Statistical significance for genomewide studies.&#8221; Proc Natl Acad Sci U S A 100(16): 9440-5.</p>
<p>Taleb, N. (2004). Fooled by randomness: the hidden role of chance in life and in the markets. New York, Thompson/Texere.</p>
<p>Wager, T. D. (2009). If neuroimaging is the answer, what is the question? Estimating Effects and Correlations in Neuroimaging Data Workshop, Columbia University, New York, NY.</p>
<p>Worsley, K. J., A. C. Evans, et al. (1992). &#8220;A three-dimensional statistical analysis for CBF activation studies in human brain.&#8221; Journal of Cerebral Blood Flow &#038; Metabolism 12(6): 900-918.</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/12/paper-the-principled-control-of-false-positives-in-neuroimaging/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Holiday Presents for a Neurogeek</title>
		<link>http://prefrontal.org/blog/2009/12/holiday-presents-for-a-neurogeek/</link>
		<comments>http://prefrontal.org/blog/2009/12/holiday-presents-for-a-neurogeek/#comments</comments>
		<pubDate>Thu, 17 Dec 2009 13:15:24 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[Miscellany]]></category>
		<category><![CDATA[Psychology]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=898</guid>
		<description><![CDATA[I know this post might be a bit late in the season to make much of an impact on your shopping plans, but if your loved ones can&#8217;t get enough neuroscience then here are some thoughts for great gifts.  Some are specific to neuroscience, while others are more general and appropriate for any academic. [...]]]></description>
			<content:encoded><![CDATA[<p>I know this post might be a bit late in the season to make much of an impact on your shopping plans, but if your loved ones can&#8217;t get enough neuroscience then here are some thoughts for great gifts.  Some are specific to neuroscience, while others are more general and appropriate for any academic.  Enjoy!</p>
<hr />
<p><strong>General Neuroscience</strong></p>
<p>- Book: <a href="http://www.amazon.com/gp/product/0878932860?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0878932860">Functional Magnetic Resonance Imaging</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0878932860" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />, by Huettel, Song, and McCarthy.  ~$75<br />
I picked this up a few weeks ago since I heard it had a good section on signal and noise in fMRI.  What I found was, far and away, the best single introduction to fMRI that I have run across.  If I am ever fortunate enough to run my own lab then I will see to it that all new lab members are handed this book as soon as they step in the door.  It&#8217;s that good.</p>
<p>- Plush: <a href="http://www.thinkgeek.com/geektoys/plush/bc01/">Neuron</a> or set of <a href="http://www.thinkgeek.com/geektoys/plush/a55e/">Neurons</a>.  ~$12-$24<br />
How much cute can a few dollars buy?  Quite a bit, apparently.  I have a set of plush neurons in my office.  The best part is that they can slot into each other, forming neural networks!  I love it.</p>
<p>- T-Shirt: <a href="http://yellowibis.spreadshirt.com/yellowibis-com-medical-one-liners-men-s-unisex-heavyweight-t-i-love-brains-color-choice-A4609540">I &#x2665; Brains</a>.  ~$20<br />
Don&#8217;t hide your love; share it with the world.  While there may be other organs in the body, the brain is where it&#8217;s at.</p>
<p>- Poster: <a href="http://www.orkposters.com/brain.html">Think Hard Print</a> (Map of the brain&#8217;s surface).  ~$18<br />
The folks at Ork Posters are well-known for their city neighborhood posters.  In this case they turned their creative talent to the neighborhoods of the brain, and created a great piece of art.  It&#8217;s even anatomically correct.</p>
<p>- Book: <a href="http://www.amazon.com/gp/product/0452288525?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0452288525">This Is Your Brain on Music</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0452288525" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />. ~$11<br />
I purchased this book on a whim two years ago and was very pleasantly surprised at how good it is.  Music (and dance) are a key part of the human condition.  With this book you can learn more about what makes music so special within the brain.</p>
<p>- Book: <a href="http://www.amazon.com/gp/product/0064603067?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0064603067">The Human Brain Coloring Book</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0064603067" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />.  ~$15<br />
What coloring books do your kids have?  Disney?  Pokemon?  Upgrade them to something better &#8211; something that even med school students use to help learn neuroanatomy.  I purchased my first brain coloring book when I was an undergrad.  It was great then, and it remains great now.</p>
<p>- Tool: <a href="http://www.amazon.com/gp/product/012373603X?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=012373603X">Atlas of the Human Brain</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=012373603X" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />. ~$180+<br />
When you start getting serious about the brain then you are going to need a serious map to help guide you.  My personal standby is the Atlas of the Human Brain by Mai, Paxinos, and Assheuer.  It is a great reference book with excellent illustrations.  As a bonus the atlas comes with a DVD containing PDFs of all the book material.  Copy the DVD to your laptop and you will have your atlas with you everywhere you go.</p>
<p>- Tool: <a href="http://www.carolina.com/product/somso+human+brain+model,+8+parts.do">Somso Human Brain Model</a>. ~$LOTS<br />
One day someone will explain to me why plastic models of the human brain must cost hundreds of dollars.  For now I am a bit lost regarding their exorbitant cost.  Still, these models are incredibly handy to have around when discussing brain anatomy or function.  The link goes to one example of a human brain model, but there are many variations on the theme available.  It is not impossible to spend $1000+ on a really good version.  </p>
<hr />
<p><strong>General Academia</strong></p>
<p>- Writing Tool: <a href="http://www.amazon.com/gp/product/8883701135?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=8883701135">Moleskine</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=8883701135" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> notebooks and <a href="http://www.dickblick.com/products/copic-multiliner-sp-pens/">Copic Multiliner SP</a> pens.  ~$8-$15<br />
There are times when academics are out there, on the front line.  Lab meetings.  Department presentations.  Lunch with a collaborator.  Conferences.  In these battles you need the best weapons you can get.  Don&#8217;t get caught with your pants down &#8211; always have solid instruments along with you.  It has taken years of careful testing, but I have settled on Moleskine notebooks and the Copic Multiliner SP pen.  Get the Moleskine with graph paper, and get the 0.35 mm tip Multiliner.  Make sure to get the SP series, because you <em>deserve</em> a rugged aluminum body.</p>
<p>- Writing Tool: <a href="http://www.amazon.com/gp/product/B0000W4MYI?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B0000W4MYI">Any</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=B0000W4MYI" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> <a href="http://www.amazon.com/gp/product/B000KA4UYC?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B000KA4UYC">kitchen</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=B000KA4UYC" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> <a href="http://www.amazon.com/gp/product/B000I9LDXG?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B000I9LDXG">timer</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=B000I9LDXG" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> ~$15<br />
Sometimes I long for a typewriter when I am writing a new manuscript.  Part of the allure is the romance &#8211; feeding the paper in and hearing the click-clack of the hammers striking the page.  The biggest advantage, though?  THERE IS NO INTERNET ON A TYPEWRITER.  If you know someone who is as distractible as I am, then drop the $15 and buy them a kitchen timer.  Tell them to set it for twenty minutes and work for that entire time.  Then, when the time has elapsed, they get ten minutes to do whatever they want.  This &#8216;dash&#8217; method has saved my bacon, and it is well worth the small cost to give it a try.  Learn more <a href="http://www.43folders.com/2005/09/08/kick-procrastinations-ass-run-a-dash">here</a>.</p>
<p>- Presentation Tool: <a href="http://www.amazon.com/gp/product/B000FPIUAW?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B000FPIUAW">Kensington Wireless Clicker</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=B000FPIUAW" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />.  ~$35<br />
From the audience it can be a bit humorous when the speaker can&#8217;t seem to get their PowerPoint slides to advance.  Conversely, it is hell when forty pairs of eyes are watching you fumble around at the podium.  If you are presenting in the near future, get a clicker that you can trust.  I have found this Kensington model to be worthy.  You can get this clicker with a <a href="http://www.amazon.com/gp/product/B000FPGP4U?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B000FPGP4U">laser pointer</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=B000FPGP4U" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> built in as well, but I prefer the standard model.  Also, put new batteries in every time you give a talk &#8211; it is worth the three dollars.</p>
<p>- Book: PhD Comics, <a href="http://www.amazon.com/gp/product/0972169504?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0972169504">first</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0972169504" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />, <a href="http://www.amazon.com/gp/product/0972169520?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0972169520">second</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0972169520" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />, <a href="http://www.amazon.com/gp/product/0972169539?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0972169539">third</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0972169539" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />, or <a href="http://www.amazon.com/gp/product/0972169547?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=0972169547">fourth</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=0972169547" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /> releases.  ~$8-$14<br />
Let&#8217;s get something squared away right off the bat: Jorge Cham saves lives.  His creation, <a href="http://www.phdcomics.com/comics/aboutcomics.html">PhD Comics</a>, details the everyday insanity that every grad student must deal with.  Take a few minutes and surf over to the <a href="http://www.phdcomics.com/comics.php">website</a> and read a few panels, just to get a feel for it.  If you know anyone who has ever struggled with the soul-crushing madness of grad school, then any one of these books will be a cathartic experience.  Also check out the PhD Comics <a href="http://www.phdcomics.com/store/mojostore.php">online store</a>.</p>
<p>- Software: <a href="http://mekentosj.com/papers/">Papers</a>, the personal research library (Mac OS X). ~$42<br />
I have several thousand PDF files on my computer.  Now, suppose I need to find ONE of them.  In the bad old days I would have the PDFs organized by topic in a series of folders on my computer.  To find the right one I would have to remember what topic it might be under, or else face the time-sucking wrath of the Finder&#8217;s search tool.  Now, enter Papers, the iTunes of PDF articles.  It will properly store and organize all your academic PDF files.  Want to see all the articles for a specific author?  Done.  Want to see all articles you have from a specific journal?  Done.  Need to build a list of articles that will be useful for your next paper?  Done and done.  A simple and beautiful program.  Try it out for 30 days and decide if it works for you.  They even give an academic discount!</p>
<p>- Reading Gadget: <a href="http://www.amazon.com/gp/product/B0015TCML0?ie=UTF8&#038;tag=prefrontalorg-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=B0015TCML0">Amazon Kindle DX Reader</a><img src="http://www.assoc-amazon.com/e/ir?t=prefrontalorg-20&#038;l=as2&#038;o=1&#038;a=B0015TCML0" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />.  ~$500<br />
When the Kindle first came out I quickly dismissed it as a device with a lot of promise, but limited by various hardware and software shortcomings.  No longer.  With the Kindle DX things start getting really interesting for academics.  The device natively supports the PDF file format, which means it can open all of the journal articles we have downloaded.  Further, the screen is large enough to read those articles pretty comfortably.  The Kindle might be a unique solution if you are looking to go all-digital.</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/12/holiday-presents-for-a-neurogeek/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Quote of the Week – Logothetis</title>
		<link>http://prefrontal.org/blog/2009/12/quote-of-the-week-logothetis/</link>
		<comments>http://prefrontal.org/blog/2009/12/quote-of-the-week-logothetis/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 22:03:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[MRI]]></category>
		<category><![CDATA[Quotes]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=855</guid>
		<description><![CDATA[“fMRI is a measure of mass action.  You almost have to be a professional moron to think you’re saying something profound about the neural mechanisms. You’re nowhere close to explaining what’s happening, but you have a nice framework, an excellent starting point.&#8221;  ~ Nikos Logothetis (As seen in Science News)
]]></description>
			<content:encoded><![CDATA[<p>“fMRI is a measure of mass action.  You almost have to be a professional moron to think you’re saying something profound about the neural mechanisms. You’re nowhere close to explaining what’s happening, but you have a nice framework, an excellent starting point.&#8221;  ~ <a href="http://www.kyb.mpg.de/~nikos">Nikos Logothetis</a> (As seen in <a href="http://www.sciencenews.org/view/feature/id/50295/title/Trawling_the_brain">Science News</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/12/quote-of-the-week-logothetis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Live Sectioning of HM’s Brain</title>
		<link>http://prefrontal.org/blog/2009/12/live-sectioning-of-hms-brain/</link>
		<comments>http://prefrontal.org/blog/2009/12/live-sectioning-of-hms-brain/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 22:07:32 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[Psychology]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=846</guid>
		<description><![CDATA[The Brain Observatory at UCSD is doing a live feed of the histological sectioning of patient HM&#8217;s brain today.  The feed will continue for the next two days while they slice through HM&#8217;s brain by fractions of a millimeter at a time.  You can view the feed yourself at the following link:
http://thebrainobservatory.ucsd.edu/hm_live.php.
The studies [...]]]></description>
			<content:encoded><![CDATA[<p>The Brain Observatory at UCSD is doing a live feed of the histological sectioning of patient HM&#8217;s brain today.  The feed will continue for the next two days while they slice through HM&#8217;s brain by fractions of a millimeter at a time.  You can view the feed yourself at the following link:<br />
<a href="http://thebrainobservatory.ucsd.edu/hm_live.php">http://thebrainobservatory.ucsd.edu/hm_live.php</a>.</p>
<p>The studies done with HM revolutionized our understanding of human memory.  His case remains one of the most important in the history of psychology and cognitive science.  If you aren&#8217;t familiar with patient HM then take a few minutes and go through the Wikipedia article on him:<br />
<a href="http://en.wikipedia.org/wiki/HM_(patient)">http://en.wikipedia.org/wiki/HM_(patient)</a></p>
<p><img src="http://prefrontal.org/blog/wp-content/uploads/2009/12/MicrotomeHM.jpg" alt="MicrotomeHM" title="MicrotomeHM" width="600" height="251" class="alignnone size-full wp-image-847" /></p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/12/live-sectioning-of-hms-brain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The War on Fish: False Positive Horror Stories</title>
		<link>http://prefrontal.org/blog/2009/10/the-war-on-fish-false-positive-horror-stories/</link>
		<comments>http://prefrontal.org/blog/2009/10/the-war-on-fish-false-positive-horror-stories/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 04:03:31 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=816</guid>
		<description><![CDATA[Citizens of the Interwebs &#8211; we are in need of your assistance!
My advisor Mike Miller and I have been asked to write a commentary in a major neuroimaging journal that discusses the importance of protecting against false positives (Type I error) in fMRI.  This is essentially an extension of the arguments that we made [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://prefrontal.org/blog/wp-content/uploads/2009/10/WarOnFish.jpg" alt="WarOnFish" title="WarOnFish" width="200" height="205" align="right">Citizens of the Interwebs &#8211; we are in need of your assistance!</p>
<p>My advisor <a href="http://www.psych.ucsb.edu/people/faculty/miller/index.php">Mike Miller</a> and I have been asked to write a commentary in a major neuroimaging journal that discusses the importance of protecting against false positives (Type I error) in fMRI.  This is essentially an extension of the arguments that we made in the Atlantic Salmon paper.  The commentary will be published alongside a similar piece from a separate group of authors that discusses the relative merits of avoiding false negatives (Type II error).</p>
<p>As part of our commentary we are collecting horror stories of what can happen when high rates of false positives are allowed to be present in imaging results.  Have you ever looked at an article that immediately sounded your BS alarm?  Perhaps you have spent time hopelessly trying to replicate a study that refused to cooperate.  There are a thousand ways in which false positives can negatively impact cognitive neuroscience.  If you know of a good anecdote I would love to hear about it.</p>
<p>If you would like to share your story online, you can post it in the comments below.  Alternatively, you can simply email it to me directly.  Please leave out names, places, and dates; they are not necessary.  We&#8217;re looking forward to hearing from you!</p>
<p>Best ~ Craig [Prefrontal]</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/10/the-war-on-fish-false-positive-horror-stories/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Internet Found the Atlantic Salmon</title>
		<link>http://prefrontal.org/blog/2009/09/the-internet-found-the-atlantic-salmon/</link>
		<comments>http://prefrontal.org/blog/2009/09/the-internet-found-the-atlantic-salmon/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 02:49:52 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Miscellany]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=758</guid>
		<description><![CDATA[The last 72 hours have seen an incredible increase in traffic here at prefrontal.org.  To sum it up in a single sentence: the site has received as many hits in the last three days as it has during the past two years.  Yeah, really.  My activity graph on the Wordpress Dashboard looks [...]]]></description>
			<content:encoded><![CDATA[<p>The last 72 hours have seen an incredible increase in traffic here at prefrontal.org.  To sum it up in a single sentence: the site has received as many hits in the last three days as it has during the past two years.  Yeah, really.  My activity graph on the Wordpress Dashboard looks like this:</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2009/09/SalmonTraffic.jpg" alt="SalmonTraffic" title="SalmonTraffic" width="564" height="174"></center></p>
<p>It seems that late last week a few major neuroscience weblogs discovered the Salmon poster and decided to write up summaries.  Those readers then posted to their weblogs, whose readers posted to their weblogs, and so on.  By 10am Friday morning the prefrontal.org activity meter was pegged and my inbox was full.</p>
<p><strong>A few important bits of info:</strong><br />
* The current status of the Salmon is that we are trying to publish it as an editorial in a major neuroimaging journal.  We are very close to resubmitting, only needing to complete a survey on the prevalence of multiple comparisons correction in the previous neuroimaging literature.  We hope that it will be released in the near future.<br />
* If you would like to be sent a copy of the commentary if/when it becomes published just send me an email and I will put you on the list.<br />
* Some sites have played up how difficult it has been for us to get the Salmon published.  We have received some, well, interesting feedback from a few editors during the submission process.  Still, it has not been more difficult than average to get the Salmon commentary published (so far).<br />
* The goal of the Salmon poster was to encourage the minority of researchers who report uncorrected statistics to move forward and begin using basic multiple comparisons correction in their research.  The Salmon doesn&#8217;t add anything to the technical discussion of how multiple comparisons correction is performed, it is simply a salient reminder of why <em>proper</em> correction is always necessary.<br />
* None of the authors intended for the Salmon to go public in such a big way, especially before the commentary was reviewed and published.  We were actually quite content to publish our editorial in a neuroimaging journal and be done with it.  We feel that, fundamentally, this is an internal debate within the field of neuroimaging.</p>
<p><strong>Some of the best/notable writeups that I have found:</strong><br />
* <a href="http://neuroskeptic.blogspot.com/2009/09/fmri-gets-slap-in-face-with-dead-fish.html">http://neuroskeptic.blogspot.com</a><br />
* <a href="http://languagelog.ldc.upenn.edu/nll/?p=1746">http://languagelog.ldc.upenn.edu</a><br />
* <a href="http://lawandbiosciences.wordpress.com/2009/09/18/what-a-dead-salmon-reminds-us-about-fmri-analysis/">http://lawandbiosciences.wordpress.com</a><br />
* <a href="http://www.mindhacks.com/blog/2009/09/scientists_find_area.html">http://www.mindhacks.com</a><br />
* <a href="http://www.wired.com/wiredscience/2009/09/fmrisalmon/">http://www.wired.com</a><br />
* <a href="http://www.newscientist.com/blogs/shortsharpscience/2009/09/dead-salmon-responds-to-portra.html">http://www.newscientist.com/blogs/shortsharpscience</a><br />
* <a href="http://blogs.nature.com/news/thegreatbeyond/2009/09/study_warns_of_red_herrings_in.html">http://blogs.nature.com/news/thegreatbeyond</a><br />
* <a href="http://blogs.discovermagazine.com/discoblog/2009/09/21/can-a-dead-fish-prove-that-modern-brain-studies-are-bunk/">http://blogs.discovermagazine.com/discoblog/</a><br />
* <a href="http://chronicle.com/blogPost/Dead-Fish-Lights-Up-When-Shown/8130/?sid=pm&#038;utm_source=pm&#038;utm_medium=en">http://chronicle.com/</a><br />
* <a href="http://science.slashdot.org/story/09/09/20/1948208/Dead-Salmons-Brain-Activity-Cautions-fMRI-Researchers">http://slashdot.org</a></p>
<p><strong>Some of the best comments that I have run across:</strong></p>
<p>&#8220;The recorded signal is changing due to noise. The point of the experiment is that if you look at enough signals, the noise in one will match the timing of your experimental stimulus, purely out of chance. Another way of looking at it is this: if you choose a statistical threshold of p < 0.05 then, statistically, you expect a result that is significant at that level purely out of chance once in every twenty experiments. When you&#8217;re analyzing images, or worse volumes, pixel by pixel, you&#8217;re doing a LOT of comparisons. If you don&#8217;t correct for that you WILL get false positives, no matter what you&#8217;re looking at.&#8221; &#8211; ceoyoyo</p>
<p>&#8220;But not everyone uses multiple comparisons correction. This is where the fish comes in &#8211; Bennett et al show that if you don&#8217;t use it, you can find &#8220;neural activation&#8221; even in the tiny brain of dead fish. Of course, with the appropriate correction, you don&#8217;t. There&#8217;s nothing original about this, except the colourful nature of the example &#8211; but many fMRI publications still report &#8220;uncorrected&#8221; results&#8221; &#8211; Neuroskeptic</p>
<p>&#8220;&#8230; it seems to me like that their point wasn&#8217;t that the fMRI wasn&#8217;t sensitive enough, or particular enough. Instead the problem seems to be a problem of statistically expected random noise. Their point seems to be that users of an fMRI should bear in mind that their marvelous magical machine can generate &#8220;real&#8221; errors, and that basic, common-sense multiple comparison habits should be developed, instead of a take a picture, slap a stat against it approach.&#8221; &#8211; Nemus</p>
<p>&#8220;The entire point the write up was to warn about the danger of false positives. Your attributing of brain activity to random, natural noise is exactly the danger they want to avoid.&#8221; &#8211; Anonymous</p>
<p>&#8220;The trouble is, most scientists are not mathematicians, and have no good theoretical understanding of statistics. Most people pushing buttons in SPSS or SAS (or what have you) are just doing &#8220;cargo cult&#8221; mathematics. Ask them to justify why their &#8220;very conservative&#8221; confidence interval for a given test is appropriate when dealing with eleventy billion variables, or why a particular post-hoc test is the proper one to use, and they&#8217;ll look at you like you just asked them to prove that the sky is blue.&#8221; &#8211; Anonymous</p>
<p>&#8220;Actually, the voodoo correlations paper is actually talking about performing correlations between the signals we get from fMRI scans (you can read the actual paper instead of the somewhat misleading article here [edvul.com]), and other measurements or scores. This doesn&#8217;t do that at all. This is about the danger of false positives in fMRI imaging, because of the large number of statistical tests that are done across the brain. The majority of peer reviewed published fMRI papers do some type of multiple comparisons correction to attempt to adjust for this problem.&#8221; &#8211; daenris</p>
<p><strong>Some of the more terrible writeups that I have found:</strong><br />
By and large the comments have been quite good.  However, there have been a few people arguing that the dead fish is actually still thinking or that we have observed evidence of the ethereal soul.  I am not going to quote the comments here, but it has been a bit amusing to see this play out&#8230;</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2009/09/SalmonLOL2.jpg" alt="SalmonLOL2" title="SalmonLOL2" width="380" height="253"></center></p>
<p><strong>The funniest comments so far:</strong></p>
<p>&#8220;Of course, Bennett&#8217;s group don&#8217;t mean to suggest that a post-mortem salmon is capable of perspective-taking. Cod forbid.&#8221;<br />
- Kerri</p>
<p>No sir. What it proves is the existence of the sole.<br />
-Jeremi</p>
<p>Yeah, the measurements were right off the scales.<br />
- grcumb</p>
<p>Thank you&#8230; he&#8217;s here all week. Try the fish.<br />
- theshowmecanuck</p>
<p>I&#8217;m wondering if the issue could be resolved if the salmon was smoked and served with cream cheese&#8230;<br />
- G</p>
<p>No, what it proves is that while you can tune an fMRI, you can&#8217;t tuna fish.<br />
- limekiller4</p>
<p>I would think that a salmon in an MRI would be thinking more along the lines of &#8220;HOLY FUCK! I CAN&#8217;T BREATHE!&#8221;<br />
- geminidomino</p>
<p>Does the scientific method for biologists exclude barbeques?<br />
- value_added</p>
<p>And I, for one, welcome our new zombie salmon overlords!!<br />
- DarkOx</p>
<p>Why wasn&#8217;t this published?  Maybe the reviewers considered the experiment a bit fishy &#8230;<br />
- maxwell_demon</p>
<p>First, the fish wasn&#8217;t dead, it was just tenured.<br />
- jesor</p>
<p>I demand that fMRI techniques get a fair herring!<br />
- Bob O&#8217;H</p>
<p>Scream if you love the multiple comparisons problem! AAAAAAAAAAAAAAAAHHHHHHHHHHH!<br />
- Jess</p>
<p>&#8230; compared with how Vul et al. handled a similar topic, this is a party with clowns and flowers<br />
- powrogers</p>
<p>The joke possibilities are endless but I won&#8217;t bother. It&#8217;s like shooting fish in a barrel.<br />
- Anonymous</p>
<p>A common mistake made in discussions of taxonomy is overlooking the issue of whether closely related species taste the same. In this case, you omitted the fact that all of them are great when grilled. With a slice of lemon on the side.<br />
- value_added</p>
<p>HOWEVER a fish that has been caught, killed/gutted, frozen, shipped, sold by auction, shipped again, sold again, taken to a hospital and put in an MRI machine is a dead fish. He ain&#8217;t pining for human faces, he has passed on. This fish is no more. He has ceased to be. He&#8217;s expired and gone to meet his maker. He&#8217;s a stiff. Bereft of life, he rests in filets! If you hadn&#8217;t glued him to his tank he&#8217;d be pushing up the seaweed. Its brain activity is now history. He&#8217;s out of the pond. He&#8217;s kicked the tank, he&#8217;s shuffled off his mortal coil, run down the river and joined the bleeding choir invisible. This is an EX-SALMON!<br />
- Anonymous</p>
<p><strong>Conclusion</strong><br />
I just want to say that it has been great to see the discussion the Salmon has generated in the last few days.  Our hope for this work was that it would call new attention to the multiple comparisons problem.  I think that we can safely say that it has.  Thanks.</p>
<p>~ Craig [Prefrontal].</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/09/the-internet-found-the-atlantic-salmon/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The Story Behind the Atlantic Salmon</title>
		<link>http://prefrontal.org/blog/2009/09/the-story-behind-the-atlantic-salmon/</link>
		<comments>http://prefrontal.org/blog/2009/09/the-story-behind-the-atlantic-salmon/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 16:13:42 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Miscellany]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=617</guid>
		<description><![CDATA[The Atlantic Salmon fMRI poster has garnered a fair amount of attention since its presentation at the Human Brain Mapping conference last June in San Francisco.  So far the reaction from other researchers has been almost unanimously positive.  A sizable number of people stopped by the poster while it was displayed and Rainer [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://prefrontal.org/blog/wp-content/uploads/2009/09/HBM-Salmon.jpg" alt="HBM-Salmon" title="HBM-Salmon" width="220" height="147" align="right">The Atlantic Salmon fMRI poster has garnered a fair amount of attention since its presentation at the <a href="http://llmsi.humanbrainmapping.org/sanfrancisco2009">Human Brain Mapping conference</a> last June in San Francisco.  So far the reaction from other researchers has been almost unanimously positive.  A sizable number of people stopped by the poster while it was displayed and Rainer Goebel (of <a href="http://www.brainvoyager.com/">BrainVoyager</a> fame) was kind enough to give the fish a shout-out during the closing ceremonies (see photo).  </p>
<p>All in all I am quite pleased that the Salmon seems to be generating a fresh discussion of multiple comparisons correction in neuroimaging.  But how did it all begin?  I mean, really, why would anybody want to scan a fish?  This was one of the top five questions I was asked during the HBM poster session.  It is a story that deserves to be told, and a weblog post is perhaps the ideal medium to tell it.  So, for all readers who are curious, I have written up the story of the Salmon.</p>
<p>The story begins during my first year in graduate school at Dartmouth College.  I was working with <a href="http://psychology.vassar.edu/baird.html">Abigail Baird</a> on fMRI studies investigating the maturation of decision-making and we were developing a large number of new MRI protocols to use with adolescents and adults.  Not wanting to waste valuable magnet time imaging and reimaging an MRI phantom, we instead challenged ourselves to scan the most curious objects we could find at the local grocery store.  </p>
<p>For our first attempt we scanned a pumpkin.  One result of this endeavor can be seen <a href="http://prefrontal.org/wiki/index.php/CogNeuro_Art">here</a>.  This is a pretty standard <a href="http://en.wikipedia.org/wiki/Pumpkin">fruit</a> to scan, as just about every imaging center around the country obtains a T1-weighted image of them in late October.  Still, it was exciting to us.  During the next pilot testing session Abby brought in a <a href="http://en.wikipedia.org/wiki/Cornish_game_hen">Cornish game hen</a> to be scanned.  This really upped the ante, as we had now put a dead bird into the head coil.  When pondering our next step the comment was made: &#8220;we should scan a whole fish&#8221;.  </p>
<p>I picked up the salmon from our local supermarket early on a Saturday morning in spring of 2005.  The clerk behind the counter was a little shocked to be selling a full-length Atlantic salmon at 6:30 AM, especially when I told her what was about to happen to it.  About an hour later we were in the imaging center with the fish wrapped in plastic and securely placed within the head coil.  We proceeded to test our entire protocol with the salmon in the magnet.  In total, we did an anatomical localizer scan, four functional runs, a T1-weighted anatomical scan, and a diffusion tensor imaging (DTI) scan.</p>
<p>After transferring the data off of the scanner we first took a look at the high resolution anatomical image.  It was simply incredible.  Slice the fish along the sagittal plane and you could see the fish split right down the middle.  Slice the fish coronally and you could see what looked like salmon steaks on the viewer.  By far it was our crowning achievement in terms of ridiculous objects to scan.  Then, our curiosity satisfied, I socked the salmon data away for the next three years.</p>
<p>In early 2008 I was working with my co-adviser George Wolford on a presentation he was giving regarding the multiple comparisons problem in fMRI.  We were discussing false positives in MRI phantom data and I brought up the idea of processing the salmon fMRI data to look for some &#8216;active&#8217; voxels.  I ran the fish data through my SPM processing pipelines and couldn&#8217;t believe what I saw.  Sure, there were some false positives.  Just about any volume with 65,000 voxels is going to have some false positives with uncorrected statistics.  What really floored me was where the false positives occurred.  A cluster of three significant voxels was arranged right along the midline of the salmon&#8217;s brain.  If they had been anywhere else, the salmon would have been just a curious anecdote, but now we had a <i>story</i>.</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2009/09/Bennett-Salmon-Figure1.jpg" alt="Bennett-Salmon-Figure1" title="Bennett-Salmon-Figure1" width="300" height="163"></center></p>
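<p>For readers who want to see why &#8216;some false positives&#8217; is all but guaranteed, here is a back-of-the-envelope simulation.  It is a hypothetical sketch, not the SPM pipeline used on the salmon data: it assumes an uncorrected threshold of p &lt; 0.001 (a common choice, but not one stated in this post) and treats every voxel as independent, whereas real fMRI voxels are spatially correlated.  Under the null hypothesis a valid test&#8217;s p-values are uniformly distributed, so each voxel&#8217;s p-value is simply drawn with Python&#8217;s <code>random</code> module.  The function name is illustrative.</p>

```python
import random


def count_uncorrected_hits(n_voxels: int, alpha: float, seed: int = 0) -> int:
    """Count voxels crossing an uncorrected threshold when there is no signal at all.

    Null p-values are uniform on [0, 1), so each voxel is one uniform draw.
    Voxels are treated as independent (an assumption; real fMRI data are smooth).
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_voxels) if rng.random() < alpha)


if __name__ == "__main__":
    n_voxels = 65_000  # volume size mentioned in the post
    alpha = 0.001      # assumed uncorrected threshold, p < 0.001
    print("expected false positives:", n_voxels * alpha)
    print("simulated false positives:", count_uncorrected_hits(n_voxels, alpha))
```

<p>With 65,000 voxels and a 0.001 threshold, roughly 65 voxels are expected to cross threshold by chance alone, and the simulated count should land nearby.  The surprise in the salmon data was not that false positives appeared, but that three of them happened to cluster in a plausible-looking spot.</p>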
<p>George presented the salmon data at our local fMRI methods group, but nothing much happened for a while after that.  George was convinced that we could/should publish the data and that it was an excellent example of the multiple comparisons problem.  I was less convinced, remarking about how silly that would be and how terrible it would be for a young postdoc to become known as &#8216;the fish guy&#8217;.  For the next year we went back and forth about the issue, until one day in January 2009.  George was out in Los Angeles and came up to UCSB to visit.  Over lunch he said that it was time to &#8216;get the fish out&#8217;.  I relented, and agreed to start writing the paper.</p>
<p>About a week later the HBM conference poster deadline came around and we decided to submit the salmon as an abstract.  We genuinely wanted it to be a part of the conference, but we really doubted that it would be approved.  How right we were.  Through some sources close to the matter I have learned that the salmon poster was indeed rejected by every reviewer who saw the abstract.  Just about everyone thought it was a joke &#8211; some rogue student who was playing a prank on the <a href="http://www.humanbrainmapping.org">OHBM</a>.  It was only when the rejected abstract went before the OHBM Program Committee that it was given approval to stay as part of the conference.  I hear that even that vote was contentious.</p>
<p>While the abstract reviewers were busy rejecting the salmon poster my co-authors and I were diligently writing a full-on salmon manuscript.  The overall outline of the paper had been in our heads for some time and the writing went rather quickly.  By April we had a polished manuscript ready for review and we sent it off to a major neuroimaging journal.  Within a week we heard back that it was being rejected on an editorial basis.  We heard that there were several major discussions within the journal staff regarding whether to even review the piece.  In the end they decided to pass the responsibility, and the trouble, on to another journal.</p>
<p>That brings us to today.  The &#8216;Post-Mortem Atlantic Salmon&#8217; was a strong success at the OHBM conference.  It is also under review at a second major neuroimaging journal.  The more I think about the affair the more I believe that the fish has the chance to impact the field of neuroimaging in a very positive way.  Predefined significance thresholds with a specified cluster extent are a weak control for the problem of false positives in imaging data.  Statisticians and methods researchers have argued about the need for multiple comparisons correction for some time.  In just one figure the salmon data illustrates exactly why we need stronger controls for the false positive problem in fMRI.  I hope it finds a good home in an open-minded journal.</p>
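<p>To put a rough number on &#8216;weak control&#8217;, the sketch below computes the familywise error rate (the chance of at least one false positive anywhere in the volume) for a fixed uncorrected threshold versus a Bonferroni-adjusted one.  It assumes 65,000 independent voxels and ignores cluster-extent thresholds and spatial smoothness, so it is only an illustration.  Bonferroni is used here simply as the most familiar familywise correction, not as the specific method argued for in the salmon paper, and the function names are illustrative.</p>

```python
def familywise_error_rate(alpha_per_test: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent null tests."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests


def bonferroni_threshold(alpha_family: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise error rate at or below alpha_family."""
    return alpha_family / n_tests


if __name__ == "__main__":
    n_voxels = 65_000     # volume size mentioned in the salmon story
    uncorrected = 0.001   # assumed uncorrected threshold
    bonf = bonferroni_threshold(0.05, n_voxels)
    print(f"uncorrected p < {uncorrected}: FWER = "
          f"{familywise_error_rate(uncorrected, n_voxels):.4f}")
    print(f"Bonferroni p < {bonf:.2e}: FWER = "
          f"{familywise_error_rate(bonf, n_voxels):.4f}")
```

<p>Under these assumptions the uncorrected threshold yields at least one false positive almost surely, while the Bonferroni threshold holds that probability just under 5% (about 0.049).  With smooth, correlated fMRI data Bonferroni tends to be overly conservative, which is one reason approaches such as FDR and random field theory are used in practice.</p>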
<p>You can find a copy of the &#8216;Post-Mortem Atlantic Salmon&#8217; poster at this link:</p>
<p><a href="http://prefrontal.org/blog/2009/06/human-brain-mapping-2009-presentations/">http://prefrontal.org/blog/2009/06/human-brain-mapping-2009-presentations/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/09/the-story-behind-the-atlantic-salmon/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Upcoming Talk: Bay Area Memory Meeting</title>
		<link>http://prefrontal.org/blog/2009/08/upcoming-talk-bay-area-memory-meeting/</link>
		<comments>http://prefrontal.org/blog/2009/08/upcoming-talk-bay-area-memory-meeting/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 08:39:38 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Miscellany]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=733</guid>
		<description><![CDATA[I&#8217;ll be giving a short presentation on individual differences and fMRI experimental design at the upcoming Bay Area Memory Meeting (BAMM) on Monday, August 24th.  If you are around Genentech Hall at the UCSF Mission Bay campus and have some time available in the late afternoon then you should definitely swing by!
]]></description>
			<content:encoded><![CDATA[<p><img src="http://prefrontal.org/blog/wp-content/uploads/2009/08/BAMM_logo.png" alt="BAMM_logo" title="BAMM_logo" width="130" height="125" align='left'>I&#8217;ll be giving a short presentation on individual differences and fMRI experimental design at the upcoming Bay Area Memory Meeting (BAMM) on Monday, August 24th.  If you are around Genentech Hall at the UCSF Mission Bay campus and have some time available in the late afternoon then you should definitely swing by!</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/08/upcoming-talk-bay-area-memory-meeting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Middle Ground in Multiple Comparisons Correction</title>
		<link>http://prefrontal.org/blog/2009/08/the-middle-ground-in-multiple-comparisons-correction/</link>
		<comments>http://prefrontal.org/blog/2009/08/the-middle-ground-in-multiple-comparisons-correction/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 08:51:58 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[MRI]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=676</guid>
		<description><![CDATA[I got a note last week from a longtime colleague seeking advice on some reviewer comments of their latest paper.  In their remarks the reviewer requested that the authors revert the corrected statistical threshold back to an uncorrected level of p < 0.001.  The authors were left scratching their heads, wondering how they [...]]]></description>
			<content:encoded><![CDATA[<p>I got a note last week from a longtime colleague seeking advice on some reviewer comments on their latest paper.  In their remarks the reviewer requested that the authors replace the corrected statistical threshold with an uncorrected level of p < 0.001.  The authors were left scratching their heads, wondering how they were going to justify their use of a standard statistical technique.  I was a bit shocked to hear that the reviewer was so opposed to multiple comparisons correction, but not entirely surprised.</p>
<p>As multiple comparisons correction has become increasingly standard there seems to be a subtle push against it from some in the neuroimaging field.  We experienced the push ourselves when we got our first rejection on the <a href="http://prefrontal.org/blog/2009/06/human-brain-mapping-2009-presentations/">fish paper</a>.  The reviewer said that there needed to be a more in-depth discussion of avoiding both <a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors">Type I</a> and <a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors">Type II</a> error in the paper.  They added that always correcting for multiple comparisons could potentially lead one to ignore legitimate results that fall short of the corrected statistic value.</p>
<p>So, as a researcher, what do you value more?  Do you want to ensure that false positives are controlled, or do you want to minimize false negatives?  Do you side with roughly 80% of the field and correct for multiple comparisons, guarding against Type I error, or do you side with the other 20% of the field and use uncorrected stats, minimizing the chance of Type II error? [1]</p>
<p>My position: <strong>why not do both</strong>?</p>
<p>For the last two years I have been an advocate of reporting both uncorrected and corrected statistics in results tables and figures.  It just makes sense.  An example of this approach is posted below, shown as corrected results overlaid on top of uncorrected results on an inflated cortical surface.  Regions surviving an uncorrected threshold of p < 0.001 with an 8-voxel extent threshold are shown in blue and the subset of regions surviving an FDR-corrected threshold of p(FDR) = 0.05 with an 8-voxel extent threshold are shown in yellow/orange.</p>
<p><center><img src="http://prefrontal.org/blog/wp-content/uploads/2009/08/Bennett-HybridStats.jpg" alt="Bennett-HybridStats" title="Bennett-HybridStats"></center></p>
<p>At a glance you can see which regions survive multiple comparisons correction and which only survive an uncorrected threshold.  Most interesting to me is that, for this task, there <em>are</em> some regions that survive an uncorrected threshold but do not survive the FDR threshold.  For example, the occipital pole in both hemispheres is only significant at the uncorrected threshold.  </p>
<p>In writing the paper for the above data we have refrained from discussing the regions that do not survive multiple comparisons correction.  Still, what I consider elegant is that I can give the reader additional information about the data from which they can make their own judgements.  Perhaps a vision researcher would be interested in the uncorrected result and could run a more powerful study to investigate this effect, who knows.  The point is that our readers, by and large, are very smart people.  I don&#8217;t mind giving them all the information I can for them to draw their own conclusions and insights from.  I also think that it represents a very good balance between those who support multiple comparisons correction and those who, as my colleague encountered, insist on uncorrected thresholds.</p>
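<p>The hybrid reporting described above is straightforward to reproduce once voxelwise p-values are in hand.  The following is a minimal pure-Python sketch, not the software used for the figure: it assumes p-values stored in a dictionary keyed by (x, y, z) voxel coordinates, uses the Benjamini-Hochberg procedure for FDR, and defines clusters by face (6-)connectivity.  The function names are hypothetical, and the post does not say which FDR variant or connectivity rule was actually used.</p>

```python
import random
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]

# Face-adjacent neighbours of a voxel (6-connectivity, an assumption).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def bh_cutoff(p_values: List[float], q: float = 0.05) -> Optional[float]:
    """Benjamini-Hochberg: largest sorted p(k) with p(k) <= (k / m) * q, else None."""
    ranked = sorted(p_values)
    m = len(ranked)
    cutoff = None
    for k, p in enumerate(ranked, start=1):
        if p <= k / m * q:
            cutoff = p
    return cutoff


def extent_filter(mask: Set[Voxel], min_size: int) -> Set[Voxel]:
    """Keep only connected clusters of supra-threshold voxels with >= min_size voxels."""
    kept: Set[Voxel] = set()
    seen: Set[Voxel] = set()
    for start in mask:
        if start in seen:
            continue
        # Breadth-first search collects the whole cluster containing `start`.
        cluster = [start]
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in mask and nb not in seen:
                    seen.add(nb)
                    cluster.append(nb)
                    queue.append(nb)
        if len(cluster) >= min_size:
            kept.update(cluster)
    return kept


def hybrid_maps(
    p_map: Dict[Voxel, float],
    p_uncorrected: float = 0.001,
    q: float = 0.05,
    min_extent: int = 8,
) -> Tuple[Set[Voxel], Set[Voxel]]:
    """Return (uncorrected map, FDR map), both with the same cluster-extent threshold."""
    uncorrected = extent_filter(
        {v for v, p in p_map.items() if p < p_uncorrected}, min_extent
    )
    cutoff = bh_cutoff(list(p_map.values()), q)
    if cutoff is None:
        fdr: Set[Voxel] = set()
    else:
        fdr = extent_filter({v for v, p in p_map.items() if p <= cutoff}, min_extent)
    return uncorrected, fdr


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 10x10x10 volume of pure-noise p-values...
    p_map = {(x, y, z): rng.random() for x in range(10) for y in range(10) for z in range(10)}
    # ...with a 3x3x3 block of strong "activation" planted in it.
    for x in range(2, 5):
        for y in range(2, 5):
            for z in range(2, 5):
                p_map[(x, y, z)] = 1e-6
    unc, fdr = hybrid_maps(p_map)
    print(len(unc), "voxels uncorrected,", len(fdr), "voxels FDR-corrected")
```

<p>The two returned sets correspond to the blue and yellow/orange layers.  Note that the FDR set is usually, but not necessarily, a subset of the uncorrected set: when an effect is strong and widespread, the Benjamini-Hochberg cutoff can be more lenient than p &lt; 0.001.</p>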
<p>[1] &#8211; Statistics gathered for the literature review in the forthcoming salmon paper.</p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/08/the-middle-ground-in-multiple-comparisons-correction/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Neuroimaging Statistics Workshop Videos</title>
		<link>http://prefrontal.org/blog/2009/08/neuroimaging-statistics-workshop-videos/</link>
		<comments>http://prefrontal.org/blog/2009/08/neuroimaging-statistics-workshop-videos/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 08:06:10 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[CogNeuro]]></category>
		<category><![CDATA[MRI]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=697</guid>
		<description><![CDATA[The Columbia University Department of Statistics hosted a workshop last month titled &#8220;Estimating Effects and Correlations in Neuroimaging Data&#8221;.  Some great folks stopped by to give talks, including Ed Vul, Nikolas Kriegeskorte, Tor Wager, and Andrew Gelman.  They recorded everything into Quicktime movies for those of us who couldn&#8217;t stop by &#8211; click [...]]]></description>
			<content:encoded><![CDATA[<p>The Columbia University Department of Statistics hosted a workshop last month titled &#8220;Estimating Effects and Correlations in Neuroimaging Data&#8221;.  Some great folks stopped by to give talks, including Ed Vul, Nikolas Kriegeskorte, Tor Wager, and Andrew Gelman.  They recorded everything into QuickTime movies for those of us who couldn&#8217;t stop by &#8211; click the link below and check it out:</p>
<p><a href="http://www.stat.columbia.edu/~martin/Workshop/ECWorkshop.html">http://www.stat.columbia.edu/~martin/Workshop/ECWorkshop.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/08/neuroimaging-statistics-workshop-videos/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quote of the Week – Coggan</title>
		<link>http://prefrontal.org/blog/2009/07/quote-of-the-week-frederick-donald-coggan/</link>
		<comments>http://prefrontal.org/blog/2009/07/quote-of-the-week-frederick-donald-coggan/#comments</comments>
		<pubDate>Thu, 30 Jul 2009 01:41:24 +0000</pubDate>
		<dc:creator>prefrontal</dc:creator>
				<category><![CDATA[Quotes]]></category>

		<guid isPermaLink="false">http://prefrontal.org/blog/?p=598</guid>
		<description><![CDATA[&#8220;My ignorance of science is such that if anyone mentioned copper nitrate I should think he was talking about policemen&#8217;s overtime.&#8221; &#8211; Frederick Donald Coggan
]]></description>
			<content:encoded><![CDATA[<p>&#8220;My ignorance of science is such that if anyone mentioned copper nitrate I should think he was talking about policemen&#8217;s overtime.&#8221; &#8211; <a href="http://en.wikipedia.org/wiki/Donald_Coggan">Frederick Donald Coggan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://prefrontal.org/blog/2009/07/quote-of-the-week-frederick-donald-coggan/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
