<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>The Replication Network</title>
	<atom:link href="https://replicationnetwork.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://replicationnetwork.com</link>
	<description>Furthering the Practice of Replication in Economics</description>
	<lastBuildDate>Thu, 19 Feb 2026 20:32:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<site xmlns="com-wordpress:feed-additions:1">82485922</site><cloud domain='replicationnetwork.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>https://secure.gravatar.com/blavatar/a171c57818c1d2751778995eb3a911b63206ef9c8dbb4eaf96f6bc32e5112bc1?s=96&#038;d=https%3A%2F%2Fs0.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>The Replication Network</title>
		<link>https://replicationnetwork.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="https://replicationnetwork.com/osd.xml" title="The Replication Network" />
	<atom:link rel='hub' href='https://replicationnetwork.com/?pushpress=hub'/>
	<item>
		<title>AoI*: “Computational Reproducibility and Robustness of Empirical Economics and Political Science Research&#8221;</title>
		<link>https://replicationnetwork.com/2026/02/20/aoi-computational-reproducibility-and-robustness-of-empirical-economics-and-political-science-research/</link>
					<comments>https://replicationnetwork.com/2026/02/20/aoi-computational-reproducibility-and-robustness-of-empirical-economics-and-political-science-research/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Thu, 19 Feb 2026 20:32:24 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Abel Brodeur]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[Nature]]></category>
		<category><![CDATA[political science]]></category>
		<category><![CDATA[Reproducibility]]></category>
		<category><![CDATA[robustness]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7965</guid>

					<description><![CDATA[[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.] ABSTRACT (taken from the article) &#8220;This systematic and large-scale reproduction effort tests the reproducibility and robustness of economics...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-47b756873aa88b4198acacfcb6670665 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-acf91bf775e1a4217c35d8e5864922ed wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>ABSTRACT (taken from <em><a href="https://research.birmingham.ac.uk/en/publications/computational-reproducibility-and-robustness-of-empirical-economi/" target="_blank" rel="noreferrer noopener">the article</a></em>)</strong></p>



<h4 class="wp-block-heading has-black-color has-text-color has-link-color wp-elements-c09acd083cb2d7a1efbfffc442fb45f2" style="font-size:21px;line-height:1.4">&#8220;This systematic and large-scale reproduction effort tests the reproducibility and robustness of economics and political science. We reproduced original analyses and conducted robustness checks of 110 articles recently published in leading economics and political science journals (all of which have mandatory data and code sharing policies). We found that over 85% of published claims were computationally reproducible. In robustness checks, our re-analyses lead to 72% of statistically significant estimates to remain significant and in the same direction, and the median reproduced effect size is (nearly) the same as the originally published effect size (that is, 99% of the published effect size). Additionally, six independent research teams examined 12 prespecified hypotheses about determinants of robustness. Research teams with more experience found lower levels of robustness, but robustness correlated with neither author characteristics nor data availability.&#8221;</h4>



<p class="has-black-color has-text-color has-link-color wp-elements-329c2721a4a54b14dc198c81f290db9f wp-block-paragraph" style="font-size:21px;line-height:1.4"><strong>REFERENCE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-3c6082cbcebd904f10ff2f8dc9e373ff wp-block-paragraph" style="font-size:21px;line-height:1.4"><a href="https://research.birmingham.ac.uk/en/publications/computational-reproducibility-and-robustness-of-empirical-economi/" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">Brodeur, A., Cook, N., Mikola, D., Fiala, L., &amp; Heyes, A. (2026). Computational Reproducibility and Robustness of Empirical Economics and Political Science Research. <em>Nature</em></span></a><a href="https://doi.org/10.1038/s41562-025-02129-1" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">.</span></a></p>



<p class="wp-block-paragraph"></p>
]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2026/02/20/aoi-computational-reproducibility-and-robustness-of-empirical-economics-and-political-science-research/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7965</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>
	</item>
		<item>
		<title>REED: Another Reason to Prefer Random Effects Over Fixed Effects/UWLS Meta-Analysis</title>
		<link>https://replicationnetwork.com/2026/01/20/another-reason-to-prefer-random-effects-over-fixed-effects-uwls-meta-analysis/</link>
					<comments>https://replicationnetwork.com/2026/01/20/another-reason-to-prefer-random-effects-over-fixed-effects-uwls-meta-analysis/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Mon, 19 Jan 2026 20:10:19 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA["Few studies" problem]]></category>
		<category><![CDATA[Fixed Effects/UWLS]]></category>
		<category><![CDATA[Heterogeneity]]></category>
		<category><![CDATA[Meta-analysis]]></category>
		<category><![CDATA[Random Effects]]></category>
		<category><![CDATA[Shiny app]]></category>
		<category><![CDATA[Stanley]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7936</guid>

					<description><![CDATA[NOTE: This blog is a repost of a blog that was previously published at the MAER-Net blogsite (see here) Introduction Random Effects (RE) versus Fixed Effects (FE) has a long and active debate history. More recently, a “Knapp–Hartung–like” version of...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-905364967ac77987bbe455ae8243d91e wp-block-paragraph" id="viewer-5x1dm1037" style="font-size:21px;line-height:1.4"><em>NOTE: This blog is a repost of a blog that was previously published at the MAER-Net blogsite </em>(<a href="https://www.maer-net.org/post/another-reason-to-prefer-random-effects-over-fixed-effects-uwls" target="_blank" rel="noreferrer noopener"><strong><em><span style="text-decoration: underline">see here</span></em></strong></a>)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1009b08eb63cfaebd691c8216c57bfc1 wp-block-paragraph" id="viewer-5x1dm1037" style="font-size:21px;line-height:1.4"><strong>Introduction</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-a76c6cbe4cfb88856a55c264138e9e92 wp-block-paragraph" id="viewer-dts8b1039" style="font-size:21px;line-height:1.4">Random Effects (RE) versus Fixed Effects (FE) has a long and active debate history. More recently, a “Knapp–Hartung–like” version of FE—Unrestricted Weighted Least Squares (UWLS)—has entered the fray (Stanley &amp; Doucouliagos, <a href="https://onlinelibrary.wiley.com/doi/10.1002/sim.6481" target="_blank" rel="noreferrer noopener"><u><strong><em>2015</em></strong></u></a>, <a href="https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1211" target="_blank" rel="noreferrer noopener"><u><strong><em>2016</em></strong></u></a>). UWLS is simply conventional weighted least squares using inverse sampling variance weights. It produces identical coefficient estimates to FE, albeit with different standard errors.</p>
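
<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">A minimal Python sketch of that relationship, not taken from the cited papers' code, is given below. It computes the inverse-variance pooled estimate once and then the two standard errors. The function name <code>fe_and_uwls</code> and the four effect/standard-error pairs are hypothetical, and the UWLS standard error is assumed to be the FE standard error scaled by the square root of the weighted residual mean square, Q/(k - 1).</p>

```python
import math


def fe_and_uwls(effects, ses):
    """Return (pooled estimate, FE standard error, UWLS standard error)."""
    weights = [1.0 / se ** 2 for se in ses]          # inverse sampling variance weights
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se_fe = math.sqrt(1.0 / total_w)                 # FE treats sampling variances as known
    # Weighted residual sum of squares (Cochran's Q) around the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    k = len(effects)
    se_uwls = se_fe * math.sqrt(q / (k - 1))         # UWLS rescales by the residual mean square
    return pooled, se_fe, se_uwls


# Hypothetical data: four estimated effects and their standard errors.
effects = [0.10, 0.30, 0.50, 0.90]
ses = [0.05, 0.10, 0.20, 0.40]
pooled, se_fe, se_uwls = fe_and_uwls(effects, ses)
print(f"pooled = {pooled:.4f}, SE(FE) = {se_fe:.4f}, SE(UWLS) = {se_uwls:.4f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The point estimate is shared by both estimators; only the standard error changes, and in this made-up sample the UWLS standard error is the larger of the two because the estimates are more dispersed than their sampling errors alone would imply.</p>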



<p class="has-black-color has-text-color has-link-color wp-elements-f9239e3033697979180d9d069c68b94f wp-block-paragraph" id="viewer-5c77k1045" style="font-size:21px;line-height:1.4">Among these approaches, RE is by far the most widely used. TABLE 1 is taken from a recently published study in <em>Research Synthesis Methods </em>that examined 1000 meta-analyses across 10 disciplines (<a href="https://www.cambridge.org/core/journals/research-synthesis-methods/article/what-can-we-learn-from-1000-metaanalyses-across-10-different-disciplines/538E64A39F4F151B387B9DEDF4531840" target="_blank" rel="noreferrer noopener"><u><strong><em>Wu et al., 2025</em></strong></u></a>). In each and every discipline, RE was more widely employed than FE/UWLS. Moreover, when meta-analysts relied on only one estimator, that estimator was overwhelmingly RE.</p>



<figure class="wp-block-image"><img src="https://replicationnetwork.com/wp-content/uploads/2026/01/a5e68-f1165a_f76063a2129d459d98275800ca7a86f5mv2.png" alt="" /></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-d8d993fc50d3c6d82c48f30efd35aea1 wp-block-paragraph" id="viewer-yv6bl1052" style="font-size:21px;line-height:1.4">Despite this widespread preference, the debate over RE versus FE/UWLS remains active, especially in contexts involving publication selection bias. In this blog, I highlight an additional reason to favor RE—one that has received little attention. The key idea is that a property of FE/UWLS that is often viewed as an advantage can, under realistic conditions, become a disadvantage.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6e880f10355f8ee0d5922fba9f7ec5fb wp-block-paragraph" id="viewer-4kgo91054" style="font-size:21px;line-height:1.4"><strong>The Context</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-bc07c4762e772669de93c44f7047b120 wp-block-paragraph" id="viewer-n2fc41056" style="font-size:21px;line-height:1.4">There are multiple arguments that come into play in the RE versus FE/UWLS debate. Two are listed below.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-fc37eaf7a933cba15a620fc3d10cc8b1 wp-block-paragraph" id="viewer-px2ql1058" style="font-size:21px;line-height:1.4">RE is generally a more realistic framework. In economics and the social sciences, it is typically more plausible to assume that true effects vary across studies, as the RE model allows, than to assume a single common effect, as in FE. This likely explains why RE is the estimator of choice in most applied meta-analyses. However, realism does not guarantee better performance. As <a href="https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1631" target="_blank" rel="noreferrer noopener"><u><strong><em>Stanley and Doucouliagos (2023)</em></strong></u></a> illustrate, an empirically incorrect meta-analytic model can sometimes yield superior statistical results. Thus, even if the RE model more accurately reflects the data-generating process, that is not a decisive argument in its favor.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ee81affdbd93c18bae2daa682f62681b wp-block-paragraph" id="viewer-52w781064" style="font-size:21px;line-height:1.4">Recent evidence favors UWLS on goodness-of-fit grounds. <a href="https://www.jclinepi.com/article/S0895-4356(23)00047-1/abstract" target="_blank" rel="noreferrer noopener"><u><strong><em>Stanley et al. (2023)</em></strong></u></a> report that UWLS provides a better fit than RE for 67,308 meta-analyses from the Cochrane Database of Systematic Reviews (CDSR). However, in a <a href="https://econpapers.repec.org/paper/cbteconwp/25_2f13.htm" target="_blank" rel="noreferrer noopener"><u><strong><em>separate paper</em></strong></u></a>, Sanghyun Hong and I challenge that conclusion.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6a9aab0d51c9b6080705c688f53f1df6 wp-block-paragraph" id="viewer-8rz9j1071" style="font-size:21px;line-height:1.4">These two points frame the broader discussion but are not my focus here. Instead, I examine a different issue—one that arises from the way FE/UWLS assigns weights.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-f6d3410b13f150d0815b931aea2e51c0 wp-block-paragraph" id="viewer-okac31073" style="font-size:21px;line-height:1.4"><strong>Why FE/UWLS is thought to be better in the presence of publication selection bias</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-b23cdfb175a85ccee175e67d29926486 wp-block-paragraph" id="viewer-2qzjm1075" style="font-size:21px;line-height:1.4">A frequently asserted argument in favor of FE/UWLS is that it performs better than RE when estimates are distorted by publication selection bias.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-2489394e62ea6bd9d0670dea570d60f8 wp-block-paragraph" id="viewer-0cxtu1077" style="font-size:21px;line-height:1.4">The logic is simple. Studies with large standard errors are most vulnerable to publication bias because they must report large estimated effects to obtain statistical significance. More precise studies, with small standard errors, can be published even when their estimated effects are modest. As a result, small-SE studies are less distorted by selection and more representative of the underlying population of true effects. Because FE/UWLS assigns most of its weight to these highly precise, and thus less biased, studies, it is expected to produce pooled estimates that are both less biased and more efficient than RE.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a68923d954e9f519e2a347602824fb3a wp-block-paragraph" id="viewer-nxazp1079" style="font-size:21px;line-height:1.4">Previous simulation work, including my own, supports this conclusion. In <a href="https://onlinelibrary.wiley.com/doi/10.1002/jrsm.1467" target="_blank" rel="noreferrer noopener"><u><strong><em>Hong &amp; Reed (2020)</em></strong></u></a>, we found that WAAP (a method that, like FE/UWLS, prioritizes estimates with small standard errors) consistently outperformed RE on both bias and RMSE. Although that study did not directly compare FE/UWLS against RE, the underlying logic is the same: giving more weight to precise studies can mitigate the positive bias induced by publication selection.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-75b7afe27105ac9de3a2a0d021b553ba wp-block-paragraph" id="viewer-nfz3n1083" style="font-size:21px;line-height:1.4"><strong>Is it possible that previous simulations have it wrong?</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-5eca0abb8e7f22fb8297f070b5756830 wp-block-paragraph" id="viewer-7k6cg1085" style="font-size:21px;line-height:1.4">Previous simulations may all be making a mistake. When generating primary study estimates, simulations typically assume that true effects are uncorrelated with the size of the standard errors. While that assumption may be valid for the population, it may not be warranted for a given sample.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-b1f039adacbaf523ee764ba4046469ae wp-block-paragraph" id="viewer-bm8ut1087" style="font-size:21px;line-height:1.4">As simulations are typically constructed, heterogeneity is modeled by having each replication draw a new set of true effects from a population distribution. Every simulated meta-analysis therefore reflects a different random sample of true effects. Any chance correlation between true effects and standard errors in one replication will tend to get cancelled out when results are averaged over 1,000 or 10,000 simulations. In other words, the simulation design structurally forces the correlation between true effects and standard errors to be zero on average.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-c70f9e913824cd42a3ff73babd4d76eb wp-block-paragraph" id="viewer-l5nqa1089" style="font-size:21px;line-height:1.4">But in any given meta-analysis, we observe only a single realization of true effects—one draw, not thousands. In that realized sample, the correlation between true effects and standard errors need not be zero. When such correlations occur, estimators that place substantial weight on a small number of highly precise studies can yield realized estimates that differ markedly from what unconditional simulation averages would suggest.</p>
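
<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The contrast between the average over replications and a single realization can be sketched in a few lines of Python. The numbers here are assumptions chosen for illustration (2,000 replications, 20 studies each, true effects drawn from a normal distribution, standard errors drawn independently from a uniform distribution); none of them comes from an actual meta-analysis.</p>

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2026)          # fixed seed so the run is repeatable
n_reps, n_studies = 2000, 20       # hypothetical simulation size
mu, tau = 0.5, 1.0                 # hypothetical mean and SD of true effects

corrs = []
for _ in range(n_reps):
    true_effects = [rng.gauss(mu, tau) for _ in range(n_studies)]
    ses = [rng.uniform(0.1, 1.0) for _ in range(n_studies)]   # drawn independently
    corrs.append(pearson(true_effects, ses))

mean_corr = statistics.fmean(corrs)
largest_abs_corr = max(abs(c) for c in corrs)
print(f"average correlation across replications: {mean_corr:.3f}")
print(f"largest absolute correlation in one replication: {largest_abs_corr:.3f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Under these assumptions the average correlation sits close to zero, while individual replications can show correlations that are far from zero. A real meta-analysis corresponds to one of those individual replications, not to the average.</p>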



<p class="has-black-color has-text-color has-link-color wp-elements-bff4e14405f0d022a79bf7a03e22c86a wp-block-paragraph" id="viewer-ytalr1091" style="font-size:21px;line-height:1.4"><strong>The problem with giving large weights to a few studies</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-0c7180594c3b04b01ab5d0e6e32d573a wp-block-paragraph" id="viewer-e39c41093" style="font-size:21px;line-height:1.4">FIGURE 1 illustrates how estimators that place heavy weight on a small number of highly precise studies can yield misleading results. The population of true effects is centered on β₀, but the three studies with the smallest standard errors (indicated by the vertical lines) happen, in this particular sample, to lie well above β₀. If an estimator such as FE/UWLS heavily weights these three studies, they will exert disproportionate influence, pulling the pooled estimate upward.</p>



<p class="has-text-align-center has-black-color has-text-color has-link-color wp-elements-7b86d4f4b93c4f33ac90262c2ae01eba wp-block-paragraph" id="viewer-s43mm1095" style="font-size:21px;line-height:1.4"><strong>FIGURE 1: Distribution of true effects</strong></p>



<figure class="wp-block-image"><img src="https://replicationnetwork.com/wp-content/uploads/2026/01/73d31-f1165a_42372b132584402a97af66cc80d9c423mv2.png" alt="" /></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-df084a0ec83484b94e4a9ed8236d8cd7 wp-block-paragraph" id="viewer-l9vca47809" style="font-size:21px;line-height:1.4">How often does a small number of studies dominate the overall estimate? In my experience of working with economics and social science data, quite often. TABLE 2 summarizes five recent meta-analyses in which I have been involved. Column 2 reports the number of studies; column 3 the percentage of total FE/UWLS weight assigned to the top three studies; and column 4 the corresponding I² values.</p>



<p class="has-text-align-center has-black-color has-text-color has-link-color wp-elements-32687c9363f93116205480471770f0b1 wp-block-paragraph" id="viewer-mlhz41100" style="font-size:21px;line-height:1.4"><strong>TABLE 2: Weights given to top 3 studies in selected meta-analyses</strong></p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><a href="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png"><img width="961" height="448" data-attachment-id="7941" data-permalink="https://replicationnetwork.com/2026/01/20/another-reason-to-prefer-random-effects-over-fixed-effects-uwls-meta-analysis/image-121/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png" data-orig-size="961,448" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=300" data-large-file="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=604" src="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=961" alt="" class="wp-image-7941" style="aspect-ratio:2.1451442180471023;width:727px;height:auto" srcset="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png 961w, https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=150 150w, https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=300 300w, https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=768 768w" sizes="(max-width: 961px) 100vw, 961px" /></a></figure>
</div>


<p class="has-black-color has-text-color has-link-color wp-elements-1ab9a82d88f7094e33c0ca2e2e8e747c wp-block-paragraph" id="viewer-3iqr41191" style="font-size:21px;line-height:1.4">Across these meta-analyses, the top three studies receive anywhere from about 50% to over 90% of the total weight. <a href="https://www.sciencedirect.com/science/article/pii/S0167629619300141?via%3Dihub" target="_blank" rel="noreferrer noopener"><u><strong><em>Xu et al. (2020)</em></strong></u></a>, for instance, contains 470 studies, yet just three of them account for more than two-thirds of the FE/UWLS weight. That concentration alone should raise concern: those few studies may not be representative of the broader distribution of true effects.</p>
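
<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">For readers who want to run the same check on their own data, a short Python sketch follows. The helper name <code>top_weight_share</code> and the list of standard errors are hypothetical; the only ingredient taken from the text is that FE/UWLS weights are proportional to the inverse of the squared standard error.</p>

```python
def top_weight_share(ses, top=3):
    """Share of total inverse-variance weight held by the `top` most precise studies."""
    weights = sorted((1.0 / se ** 2 for se in ses), reverse=True)
    return sum(weights[:top]) / sum(weights)


# Hypothetical meta-analysis: 3 very precise studies and 47 much noisier ones.
ses = [0.01] * 3 + [0.10] * 47
share = top_weight_share(ses)
print(f"top-3 share of FE/UWLS weight: {share:.1%}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">In this made-up example, three of fifty studies carry roughly 86% of the weight, simply because their standard errors are ten times smaller than the rest.</p>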



<p class="has-black-color has-text-color has-link-color wp-elements-74ea7bfe8d0ee9c274046bb97dc87877 wp-block-paragraph" id="viewer-auwpr1197" style="font-size:21px;line-height:1.4">The problem becomes more serious when heterogeneity is high. The I² values for these studies range from 89.9% to 98.8%, which is typical in economics and the social sciences. Such values indicate that the true-effect distribution is extremely wide relative to sampling error. Under these conditions, it is entirely plausible that three randomly chosen studies (those with the smallest standard errors) could tilt the pooled estimate away from the population mean as in FIGURE 1. The concern with inverse sampling variance weighting in the presence of strong heterogeneity has long been noted (<a href="https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19980430)17:8%3C841::AID-SIM781%3E3.0.CO;2-D" target="_blank" rel="noreferrer noopener"><u><strong><em>Hardy &amp; Thompson, 1998</em></strong></u></a>; <a href="https://journals.sagepub.com/doi/10.1177/016327870102400203" target="_blank" rel="noreferrer noopener"><u><strong><em>Song et al., 2001</em></strong></u></a>; <a href="https://onlinelibrary.wiley.com/doi/10.1002/sim.4488" target="_blank" rel="noreferrer noopener"><u><strong><em>Moreno et al., 2012</em></strong></u></a>).</p>
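
<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">As a reminder of what I² measures, the sketch below computes it in Python from Cochran's Q using the standard Higgins-Thompson formula, I² = max(0, (Q - df) / Q). The function name <code>i_squared</code> and the three data points are illustrative assumptions, not values from TABLE 2.</p>

```python
def i_squared(effects, ses):
    """Higgins-Thompson I-squared, computed from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q == 0:
        return 0.0                      # no dispersion at all
    return max(0.0, (q - df) / q)       # truncated at zero by convention


# Hypothetical data: estimates spread far more widely than their standard errors.
effects = [0.0, 0.5, 1.0]
ses = [0.1, 0.1, 0.1]
i2 = i_squared(effects, ses)
print(f"I-squared = {i2:.1%}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Here Q is 50 on 2 degrees of freedom, so I² is 96%: almost all of the observed dispersion is attributed to differences in true effects, not to sampling error.</p>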



<p class="has-black-color has-text-color has-link-color wp-elements-b1dcb07bb382ca865ccc5e9c99197656 wp-block-paragraph" id="viewer-uxjl41205" style="font-size:21px;line-height:1.4"><strong>Is this really a problem? Hard to say</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-3abe87b68ab8fce393e5e3c7a00b0a43 wp-block-paragraph" id="viewer-xsrww1207" style="font-size:21px;line-height:1.4">The combination of (i) a small number of studies receiving a large share of the total weight and (ii) a highly dispersed distribution of true effects does not, by itself, guarantee a problem. These conditions only matter if the heavily weighted studies are unrepresentative of the underlying population of true effects.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6f01d59fbb6c19b9409e09e1af60948e wp-block-paragraph" id="viewer-dsfxe1209" style="font-size:21px;line-height:1.4">The challenge is that this is extremely difficult to verify. We never observe the pre–publication-selection distribution of true effects. If we could, we could check directly whether the studies with the smallest standard errors differ meaningfully from the population. But in a real meta-analysis, we only observe the post-selection distribution, that is, the subset of results that survive publication filters. Consequently, we cannot distinguish whether unusual patterns arise because a few estimates are unrepresentatively drawn or because publication selection has distorted the observed distribution.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1d0014bd5eb72e8ff85a7e5282758028 wp-block-paragraph" id="viewer-racrt1211" style="font-size:21px;line-height:1.4">A simple example illustrates the problem. Suppose a researcher runs an Egger regression and obtains a positive coefficient on the standard error. Is this evidence of positive publication bias? Possibly. But it could equally reflect a situation where the most precise studies happen, by chance, to lie below the population mean, inducing a positive relationship between estimated effects and standard errors. The two explanations—publication bias versus sample unrepresentativeness—are observationally equivalent.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-e85a1e4c818b22c137efed5d9be217ff wp-block-paragraph" id="viewer-okng61213" style="font-size:21px;line-height:1.4">There is one case where the two explanations could be distinguished. If a researcher were confident of the sign of publication selection bias (positive/negative), then the coefficient on the standard error variable in an Egger regression should have the same sign. If the actual estimated coefficient has the opposite sign, this would be consistent with the “small standard error estimates are unrepresentative” hypothesis. Even then, such a finding could only establish existence of the problem, not extent. The problem could be lurking in many meta-analyses, but not identifiable either because the bias was in the same direction as publication selection, or because publication selection masked its existence.</p>
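
<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The observational-equivalence point can be made concrete with a small Python sketch. It fits an Egger-type regression (estimated effect on standard error, weighted by inverse sampling variance) to hypothetical data in which no selection step is applied at all. The function name <code>egger_regression</code> and the data are assumptions for illustration; the estimates are constructed so that the most precise studies have the smallest effects.</p>

```python
def egger_regression(effects, ses):
    """WLS fit of effect = intercept + slope * SE, with weights 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    x_bar = sum(w * se for w, se in zip(weights, ses)) / total_w    # weighted mean of SE
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total_w  # weighted mean of effects
    sxy = sum(w * (se - x_bar) * (y - y_bar) for w, se, y in zip(weights, ses, effects))
    sxx = sum(w * (se - x_bar) ** 2 for w, se in zip(weights, ses))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical estimates with NO selection step: every study is kept, but the
# most precise studies happen to have the smallest effects.
ses = [0.05, 0.10, 0.20, 0.40]
effects = [0.2 + 1.5 * se for se in ses]
intercept, slope = egger_regression(effects, ses)
print(f"intercept = {intercept:.3f}, slope on SE = {slope:.3f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The slope on the standard error comes out positive even though nothing was filtered out, which is the pattern a researcher would ordinarily read as evidence of positive publication bias.</p>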



<p class="has-black-color has-text-color has-link-color wp-elements-84b7cdfcd047850ccef3cda9dbad7f56 wp-block-paragraph" id="viewer-93w8c1215" style="font-size:21px;line-height:1.4"><strong>How to show this in a simulation?</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-e19b0c307aadfc28c81c843a66cfe801 wp-block-paragraph" id="viewer-6br1n1217" style="font-size:21px;line-height:1.4">The challenge of illustrating this with a simulation is that this is a sample problem, arising from a one-off, random draw of observations taken from a distribution of true effects. A simulation of repeated, random draws would have the three most heavily weighted studies sometimes having true effects below the mean, and sometimes having true effects above the mean. Averaged over 1000 or more simulations, any sample correlations between standard errors and true effects would get washed out.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-e698a2c2804ebe693d7ce4a908f46671 wp-block-paragraph" id="viewer-50a1u1219" style="font-size:21px;line-height:1.4">To address this problem, I impose a correlation between true effects and standard errors at the population level. This ensures that the simulated samples reflect the correlation between true effects and standard errors that forms the core of the argument above. The setup is therefore an imperfect but useful analogy for illustrating the dependency issue.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-2d112ce7f1a9bcf130ea872fadfd7fda wp-block-paragraph" id="viewer-fhk0i1221" style="font-size:21px;line-height:1.4">My simulation generates 100 meta-analyses, each containing 100 primary studies before any publication selection occurs. For every primary study, I draw a unique true effect from a normal distribution and assign a random error term with its own standard deviation. This ensures that some studies produce highly precise estimates, while others are much noisier.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-08fc8c3f9191aaf35c96d7f7ce66a927 wp-block-paragraph" id="viewer-9ehw81223" style="font-size:21px;line-height:1.4">I examine two versions of this setup.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-2c156d098426f8731127c8e08ecf5805 wp-block-paragraph" id="viewer-65ncy1225" style="font-size:21px;line-height:1.4">1) Without publication selection bias: all study estimates enter the meta-analysis.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-27b36f7178845f14958bcbcdb44900dc wp-block-paragraph" id="viewer-wwxyx1229" style="font-size:21px;line-height:1.4">2) With publication selection bias: only statistically significant estimates are “published” and included.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-24e68dc329c8155b8e62c61d86b9e12f wp-block-paragraph" id="viewer-li6l11233" style="font-size:21px;line-height:1.4">This simple design captures the key features of applied meta-analysis—heterogeneity, varying precision, and selective reporting—and allows us to see clearly when inverse sampling variance weighting makes things better, and when it makes things worse.</p>
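
<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">A stripped-down Python sketch of a design of this kind is shown below. It is not the simulation program described in this post: it keeps the 100 meta-analyses of 100 studies, but, as simplifying assumptions, it draws each estimate directly from a normal distribution around its true effect (instead of simulating the observations inside each primary study), uses made-up values for the mean, heterogeneity, and standard-error range, imposes no correlation between true effects and standard errors, and treats |t| &gt; 1.96 as "statistically significant".</p>

```python
import random
import statistics


def fe_pooled(estimates, ses):
    """Inverse-variance weighted mean (the FE/UWLS point estimate)."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


rng = random.Random(7)                  # fixed seed so the run is repeatable
mu, tau = 0.2, 0.2                      # hypothetical mean and SD of true effects
n_meta, n_studies = 100, 100            # sizes taken from the text

pooled_all, pooled_selected = [], []
for _ in range(n_meta):
    ses = [rng.uniform(0.05, 0.5) for _ in range(n_studies)]      # assumed SE range
    true_effects = [rng.gauss(mu, tau) for _ in range(n_studies)]
    estimates = [rng.gauss(b, se) for b, se in zip(true_effects, ses)]

    # Version 1: no selection, every estimate enters the meta-analysis.
    pooled_all.append(fe_pooled(estimates, ses))

    # Version 2: only estimates with |t| > 1.96 are "published".
    published = [(y, se) for y, se in zip(estimates, ses) if abs(y / se) > 1.96]
    if published:
        ys, ss = zip(*published)
        pooled_selected.append(fe_pooled(ys, ss))

mean_all = statistics.fmean(pooled_all)
mean_selected = statistics.fmean(pooled_selected)
print(f"mean pooled estimate, no selection:   {mean_all:.3f}")
print(f"mean pooled estimate, with selection: {mean_selected:.3f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">With these assumed parameters the unselected pooled estimate averages close to the true mean, while the selected version is pushed upward. Swapping in other estimators, or other parameter values, only requires changing <code>fe_pooled</code> or the constants at the top.</p>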



<p class="has-black-color has-text-color has-link-color wp-elements-95644e1eb38c8f4118d8af79c16c14cb wp-block-paragraph" id="viewer-u4g0l1235" style="font-size:21px;line-height:1.4"><strong>Simulating correlation between effects and standard errors</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-ad9765052fbd7ec8ec7e159e61563a87 wp-block-paragraph" id="viewer-7t3ts1237" style="font-size:21px;line-height:1.4">The key line of code in my simulation program that generates correlation between effects and standard errors is given in the box below:</p>



<figure class="wp-block-image size-large"><a href="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png"><img width="966" height="77" data-attachment-id="7945" data-permalink="https://replicationnetwork.com/2026/01/20/another-reason-to-prefer-random-effects-over-fixed-effects-uwls-meta-analysis/image-122/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png" data-orig-size="966,77" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=300" data-large-file="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=604" src="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=966" alt="" class="wp-image-7945" srcset="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png 966w, https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=150 150w, https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=300 300w, https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=768 768w" sizes="(max-width: 966px) 100vw, 966px" /></a></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-2e7626dc641895b97c7b700fd500f3c9 wp-block-paragraph" id="viewer-3zye81244" style="font-size:21px;line-height:1.4">where</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a17c386d39d79597dc46d2dfd80f99e3 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211;  “true_effect” is the true effect for a given primary study</p>



<p class="has-black-color has-text-color has-link-color wp-elements-924a2bd0a4edc0e7123f394736a9d4de wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; “mu” is the overall mean of the true-effect distribution</p>



<p class="has-black-color has-text-color has-link-color wp-elements-8172ccc30923713cf4d0c1d1cf72de90 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; “error_sd” is the standard deviation of errors in the DGP that produces individual observations in a given primary study. The larger the “error_sd”, the larger the standard error of the estimated effect for that primary study. The error SDs are assumed to be uniformly distributed.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-88403d87a3b6ccc3d940f8b34932bd71 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; “tau” is the standard deviation of residual heterogeneity in true effects (after standardizing, it becomes the standard deviation of true effects)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-12c4b58a9ca47243fbf265560fa48354 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; “corr” controls the extent to which true effects are related to the studies’ standard errors.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ff2f27efdfec537e92e6ba563f572ccb wp-block-paragraph" id="viewer-rzkzr1258" style="font-size:21px;line-height:1.4">The idea is that true effects have a mean (“mu”) and a variance (set by “tau”), but they are related to the standard errors of the estimated effects to a degree determined by “corr”. (In the actual program, true_effect is transformed to keep its variance constant for different values of “corr”.)</p>
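<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">For readers who cannot view the boxed line, here is a minimal Python sketch of one way to build this kind of correlation. It is not the program's actual code: the function name, the uniform bounds (sd_low, sd_high), and the mixing formula are assumptions chosen so that the variance of true_effect stays close to tau squared whatever the value of “corr”.</p>

```python
import numpy as np

def simulate_true_effects(n_studies, mu, tau, corr, sd_low=0.5, sd_high=2.0, seed=0):
    """Draw error SDs and true effects whose correlation is roughly `corr` (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Error SDs are uniform, as described in the post; the bounds are assumed values.
    error_sd = rng.uniform(sd_low, sd_high, size=n_studies)
    # Standardize the error SDs so they have mean 0 and SD 1 in this sample.
    z_sd = (error_sd - error_sd.mean()) / error_sd.std()
    # Independent standard normal component of the true effects.
    noise = rng.standard_normal(n_studies)
    # Mix the two components; these weights keep the variance near tau**2
    # for any corr between -1 and 1.
    true_effect = mu + tau * (corr * z_sd + np.sqrt(1.0 - corr**2) * noise)
    return true_effect, error_sd

true_effect, error_sd = simulate_true_effects(n_studies=50_000, mu=1.0, tau=1.0, corr=0.5)
sample_corr = np.corrcoef(true_effect, error_sd)[0, 1]
sample_sd = true_effect.std()
print(f"sample correlation: {sample_corr:.3f}, SD of true effects: {sample_sd:.3f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Setting corr = 0 in this sketch recovers the conventional design in which true effects are unrelated to precision.</p>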



<p class="has-black-color has-text-color has-link-color wp-elements-6f9bbbcf0fb5056accf024537ce0a7dd wp-block-paragraph" id="viewer-9lf911262" style="font-size:21px;line-height:1.4">When corr = 0, the simulation reflects the conventional assumption used in most published Monte Carlo studies: studies differ in their true effects, but those effects are unrelated to the studies’ precisions. This is the world where FE/UWLS typically performs well under publication bias.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-d99a7853048b9195f53e9fcef8741689 wp-block-paragraph" id="viewer-kkq231266" style="font-size:21px;line-height:1.4">When corr ≠ 0, the story changes. Now the true effects and the standard errors move together, so the most precise studies may lie systematically above or below the population mean. In such samples, the estimates with the smallest standard errors are not representative of the full distribution of true effects. Yet FE/UWLS gives those same studies most of the weight, which is exactly the situation in which inverse-variance weighting can produce biased results.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-afa85186b0f3db3065176c6d15451228 wp-block-paragraph" id="viewer-y0v251270" style="font-size:21px;line-height:1.4">This simple modification is designed to capture the real-world possibility that, in any single meta-analytic dataset, sampling variability and study design differences can jointly produce a correlation between effect sizes and their precisions.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-8745ee1f689bc48ee3d0dbd9fcc8d893 wp-block-paragraph" id="viewer-ysgwo1274" style="font-size:21px;line-height:1.4"><strong>Example where FE/UWLS is better than RE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-1724978363e28bdc86c7e5dbae463eac wp-block-paragraph" id="viewer-5zrhz1278" style="font-size:21px;line-height:1.4">I begin by examining a baseline scenario: there is no publication selection bias and corr = 0, meaning true effects are unrelated to standard errors. This setup reflects the assumptions used in most existing Monte Carlo studies. As shown in the left panel of FIGURE 2, all three estimators—OLS, FE/UWLS, and RE—are unbiased. Their efficiencies, however, differ in the expected way: RE is most efficient, FE/UWLS somewhat less so, and OLS least precise.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-665d10118ec811b0ba1fd98fdaab78ae wp-block-paragraph" id="viewer-up0nl1282" style="font-size:21px;line-height:1.4"><strong>FIGURE 2</strong>: <strong>Distributions of Estimated Effects Without and With Publication Selection Bias (corr = 0)</strong></p>



<figure class="wp-block-image"><img src="https://replicationnetwork.com/wp-content/uploads/2026/01/7f5f9-f1165a_4719bec609fd40a6b5437c3173e2b70dmv2.png" alt="" /></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-48229110aa474e98da3a1d1357c626f1 wp-block-paragraph" id="viewer-o7vfq1288" style="font-size:21px;line-height:1.4">The right panel of FIGURE 2 introduces publication selection bias while keeping corr = 0. Here, FE/UWLS performs best, exhibiting both lower bias and lower root mean square error than either RE or OLS. This result reproduces the standard argument for preferring FE/UWLS under publication bias.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ac8f0b2e249fb6b1bd650460f490e4c6 wp-block-paragraph" id="viewer-5lzqc1295" style="font-size:21px;line-height:1.4">Why does FE/UWLS perform best in this case? When publication selection is present, studies with large standard errors must have large estimated effects to be statistically significant and thus included in the published literature. Consequently, the published estimates from these imprecise studies are systematically biased upward. More precise studies, by contrast, require only modest effects to reach significance, so their published results remain closer to the underlying true-effect distribution.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-2d0a9f6994ed45d388ff531a0885edc4 wp-block-paragraph" id="viewer-pvdzv1299" style="font-size:21px;line-height:1.4">FE/UWLS succeeds in this setting precisely because it gives most of the weight to these highly precise, less biased studies. In a world where true effects and study precisions are uncorrelated, and where publication bias primarily distorts the noisy studies, inverse-sampling variance weighting helps correct that distortion.</p>
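<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The selection mechanism described above can be sketched in a few lines of Python. This is an illustration under assumed settings, not the simulation behind FIGURE 2: the selection rule (publish only positive estimates with a t-statistic above 1.96), the function name, and the parameter values are all hypothetical.</p>

```python
import numpy as np

def mean_published_estimate(true_effect, se, n_studies=200_000, seed=0):
    """Average estimate among studies that are positive and significant at the 5% level."""
    rng = np.random.default_rng(seed)
    # Each study's estimate is the true effect plus sampling error.
    estimates = rng.normal(true_effect, se, size=n_studies)
    # Keep only estimates whose t-statistic exceeds 1.96 (the assumed selection rule).
    published = estimates[estimates / se > 1.96]
    return published.mean()

# Same true effect in both groups; only the standard error differs.
precise = mean_published_estimate(true_effect=1.0, se=0.2)
noisy = mean_published_estimate(true_effect=1.0, se=1.0)
print(f"precise studies: {precise:.2f}, noisy studies: {noisy:.2f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Although both groups share the same true effect, the published average for the noisy group comes out well above it, while the precise group stays close to it. That is the distortion inverse-variance weighting helps to offset when corr = 0.</p>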



<p class="has-black-color has-text-color has-link-color wp-elements-17a729b6641c19e41a8cacb48cf3e8fa wp-block-paragraph" id="viewer-vgmkk1301" style="font-size:21px;line-height:1.4"><strong>Example where RE is better than FE/UWLS</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-4023170df90055230faffa7e27f0d84b wp-block-paragraph" id="viewer-jbh4w1303" style="font-size:21px;line-height:1.4">However, the strength of FE/UWLS can also be its weakness. When only a few studies receive most of the weight—and when those studies are not representative—FE/UWLS’s main advantage becomes a liability.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-972b267a775203632aa3bb1d8c40d73d wp-block-paragraph" id="viewer-r2kn81305" style="font-size:21px;line-height:1.4">FIGURE 3 illustrates this situation. Here, estimated effects and standard errors are positively correlated in the pre-selection sample. As a result, the most precise studies tend to have smaller true effects, while the less precise studies tend to have larger ones. Under these conditions, weighting by inverse sampling variance pulls the pooled estimate downward, producing estimates that consistently lie below the population mean.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-613b4b5eb641c55c2aa59e8404f963e4 wp-block-paragraph" id="viewer-7gujs1309" style="font-size:21px;line-height:1.4"><strong>FIGURE 3</strong>: <strong>Distributions of Estimated Effects Without and With Publication Selection Bias (corr &gt; 0)</strong></p>



<figure class="wp-block-image"><img src="https://replicationnetwork.com/wp-content/uploads/2026/01/b4720-f1165a_35deb556149d4909a348f55bb30969d8mv2.png" alt="" /></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-afe988c0fc0775b699da0aa371dd3fc8 wp-block-paragraph" id="viewer-0uqn01315" style="font-size:21px;line-height:1.4">As demonstrated in the left panel of FIGURE 3, without publication selection bias,</p>



<p class="has-black-color has-text-color has-link-color wp-elements-477788d78f8fcaeab9a2c9e91664b48a wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8212; OLS performs best, because it does not overweight the non-representative precise studies;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-4432e71d938b5d7ab4b1d0ba6a1544f0 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8212; RE performs reasonably well by partially shrinking the overly influential studies;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-c3e4a286184e31d6bdcc36f7ca5ece8e wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8212; FE/UWLS performs worst, because inverse-variance weighting amplifies the bias introduced by the correlation structure.</p>
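<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">To make the ranking concrete, the three point estimates can be computed as in the sketch below. The four-study dataset is invented for illustration, and the RE estimate uses the DerSimonian-Laird estimator of between-study variance, which is an assumption here (the post does not say which estimator the simulation program uses).</p>

```python
import numpy as np

def pooled_estimates(y, se):
    """Return the OLS, FE/UWLS and RE point estimates of the mean effect."""
    y = np.asarray(y, dtype=float)
    se = np.asarray(se, dtype=float)
    # OLS: every study gets the same weight.
    ols = y.mean()
    # FE/UWLS: weights are the inverse sampling variances.
    w = 1.0 / se**2
    fe_uwls = np.sum(w * y) / np.sum(w)
    # DerSimonian-Laird estimate of the between-study variance (assumed choice).
    q = np.sum(w * (y - fe_uwls) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    # RE: adding tau2 to every variance flattens the weights.
    w_re = 1.0 / (se**2 + tau2)
    re = np.sum(w_re * y) / np.sum(w_re)
    return {"OLS": ols, "FE/UWLS": fe_uwls, "RE": re}

# Hypothetical data: the two most precise studies also have the smallest effects.
est = pooled_estimates(y=[0.2, 0.3, 1.0, 1.2], se=[0.05, 0.05, 0.5, 0.5])
for name, value in est.items():
    print(f"{name}: {value:.3f}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">In this made-up example the precise studies carry the smaller effects, so FE/UWLS lands lowest, OLS highest, and RE in between, which mirrors the ordering of pooled estimates described above.</p>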



<p class="has-black-color has-text-color has-link-color wp-elements-d307fd8c06b75688c8a60f3ac6f82c22 wp-block-paragraph" id="viewer-9ile2223910" style="font-size:21px;line-height:1.4">The right panel introduces publication selection bias, and the ranking shifts. Now two forces operate simultaneously:</p>



<p class="has-black-color has-text-color has-link-color wp-elements-14e9eedab84a30d64080b05cefa559e7 wp-block-paragraph" id="viewer-3xy5j1330" style="font-size:21px;line-height:1.4">1.&nbsp;&nbsp;&nbsp;&nbsp; Publication bias, which pushes noisy estimates upward; and</p>



<p class="has-black-color has-text-color has-link-color wp-elements-8934618dbcaae7e71a3494cfaad93bf1 wp-block-paragraph" id="viewer-h376s1332" style="font-size:21px;line-height:1.4">2.&nbsp;&nbsp;&nbsp;&nbsp;Correlation between effects and standard errors, which pushes inverse-variance weighted estimates downward.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-b14d768c558a234f91a2473307bca3c5 wp-block-paragraph" id="viewer-ahafi225706" style="font-size:21px;line-height:1.4">In this tug-of-war, RE performs best because it partially adjusts for both influences. It moderates the publication-selection bias (which inflates the noisy studies) and avoids the extreme overweighting that would magnify the downward bias from the correlated true effects. FE/UWLS, by contrast, is pulled too far in one direction, and OLS too far in the other.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-7aeb204ac633f44e54250ecfadd5462a wp-block-paragraph" id="viewer-w5g3h1336" style="font-size:21px;line-height:1.4">This example shows that once true effects and standard errors are correlated—a plausible situation in real meta-analytic datasets—the assumptions underpinning FE/UWLS’s superiority no longer hold. Under such conditions, RE can produce more reliable estimates.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-8e06926667603270014a6898cdb81df7 wp-block-paragraph" id="viewer-m5e071338" style="font-size:21px;line-height:1.4"><strong>A Shiny app to simulate more examples</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-db949e6bc2eb0d85c46843177fa85908 wp-block-paragraph" id="viewer-kh4zd1340" style="font-size:21px;line-height:1.4">Did I cherry pick this example? Absolutely! I chose parameter values that allowed me to illustrate a scenario where the correlation between true effects and standard errors clearly works against FE/UWLS. However, this outcome is confirmed using a wide range of plausible parameter settings.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6f5c757bb08d861398ec887c1a31eb46 wp-block-paragraph" id="viewer-antmi1342" style="font-size:21px;line-height:1.4">Rather than taking my word for it, you can explore these patterns directly using the Shiny app developed for this blog (<a href="https://w87avq-bob-reed.shinyapps.io/re_vs_fe_publication_bias/" target="_blank" rel="noreferrer noopener"><u><strong><em>see here</em></strong></u></a>). The app allows you to vary key parameters—mu, tau, primary study sample sizes, and corr—by entering them in the simulation settings box (see below). You’ll find that RE is not universally better; in some settings FE/UWLS dominates. But once correlations between effects and standard errors are allowed, RE often provides the more reliable estimate.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-bd4f29a4e5b40c794ec92142ff087bd6 wp-block-paragraph" id="viewer-knfeq1347" style="font-size:21px;line-height:1.4"><strong>Concluding thoughts</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-8704d453a33255230075dd5c2ee9309f wp-block-paragraph" id="viewer-pkcqu1349" style="font-size:21px;line-height:1.4">The following points summarize the main lessons of this blog.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-12f87dc143ded70e3038de398c67b2d1 wp-block-paragraph" id="viewer-mj3iy1351" style="font-size:21px;line-height:1.4">The “few-studies problem” is fundamentally a sample-level problem. When a small number of highly precise studies dominate the weighting, FE/UWLS effectively becomes a “few-studies estimator.” If those few studies are not representative of the population distribution of true effects, the pooled estimate will be misleading. Standard simulation designs examine a data-generating process where there is no population correlation between true effects and standard errors (or sampling errors), which helps explain why earlier simulation studies often found FE/UWLS to outperform RE under publication selection bias.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-d01af9254f2dbb1bc92bf7e0599c0c86 wp-block-paragraph" id="viewer-v7qbd1355" style="font-size:21px;line-height:1.4">Meta-analysts should routinely report weight concentration and heterogeneity. Simple diagnostics (such as the share of total weight carried by the top few studies and heterogeneity measures like I²) provide readers with a clearer sense of how vulnerable a meta-analysis is to the representativeness of a small subset of estimates. TABLE 2 illustrates the type of information that would be especially useful to include in applied work.</p>
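<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Both diagnostics are easy to compute from the estimates and their standard errors. The sketch below is one possible implementation, not the code behind TABLE 2: the function names, the default of three top studies, and the example data are hypothetical, and I² is computed from Cochran's Q using inverse-variance weights.</p>

```python
import numpy as np

def top_weight_share(se, top_k=3):
    """Share of total inverse-variance weight carried by the top_k most precise studies."""
    w = 1.0 / np.asarray(se, dtype=float) ** 2
    # Sort weights from largest to smallest and sum the first top_k.
    return np.sort(w)[::-1][:top_k].sum() / w.sum()

def i_squared(y, se):
    """I-squared: the share of variation in estimates attributed to heterogeneity."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(se, dtype=float) ** 2
    pooled = np.sum(w * y) / np.sum(w)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = np.sum(w * (y - pooled) ** 2)
    df = len(y) - 1
    # Truncate at zero, as is conventional when Q falls below its degrees of freedom.
    return max(0.0, (q - df) / q) if q > 0 else 0.0

# Hypothetical data: two very precise studies and two noisy ones.
y = [0.2, 0.3, 1.0, 1.2]
se = [0.05, 0.05, 0.5, 0.5]
share = top_weight_share(se, top_k=2)
i2 = i_squared(y, se)
print(f"top-2 weight share: {share:.1%}, I-squared: {i2:.1%}")
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">A weight share close to one, as in this invented example, signals that the pooled FE/UWLS estimate rests almost entirely on a handful of studies.</p>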



<p class="has-black-color has-text-color has-link-color wp-elements-16ff19f0ba9348746822c2f4ad05f86a wp-block-paragraph" id="viewer-hkznc1359" style="font-size:21px;line-height:1.4">The same concerns raised here may also apply to multi-level models. In a recent paper, <a href="https://psycnet.apa.org/doiLanding?doi=10.1037%2Fmet0000773" target="_blank" rel="noreferrer noopener"><u><strong><em>Chen &amp; Pustejovsky (2025)</em></strong></u></a> investigate methods to correct publication selection bias in the context of multi-level models. They conclude that a variant of the CHE estimator, which they call CHE-ISCW (Correlated and Hierarchical Effects model with Inverse Sampling Covariance Weights), outperforms CHE. In a sense, CHE versus CHE-ISCW is a multi-level analog of the RE versus FE/UWLS comparison discussed here. As such, the “few-studies problem” may apply to their comparison as well.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1da455b0a0efe949f772876f5df90997 wp-block-paragraph" id="viewer-jnofr1365" style="font-size:21px;line-height:1.4">In conclusion, whether FE/UWLS or RE performs better depends critically on the representativeness of the small set of studies receiving most of the weight. The claim that FE/UWLS outperforms RE under publication selection bias implicitly assumes that, in the pre-publication-selection world, estimated effects are uncorrelated with their standard errors. This assumption may not hold in real samples. The presence of a few heavily weighted studies does not automatically mean RE is preferable—but it does support that possibility.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-b9e06660f10f47b3c5fc0c74176df62e wp-block-paragraph" id="viewer-rc7fx1367" style="font-size:21px;line-height:1.4">If nothing else, I hope this blog encourages greater attention to the risks that arise when a large amount of inferential weight is placed on a small number of studies.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-c81c753bf13251cf3ccb2f3710a4dcd4 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>NOTE: Bob Reed is Professor of Economics and the Director of </em><a href="https://www.canterbury.ac.nz/business-and-law/research/ucmeta/" target="_blank" rel="noreferrer noopener"><strong><em>UCMeta</em></strong></a><em> at the University of Canterbury. He can be reached at </em><a href="mailto:bob.reed@canterbury.ac.nz" target="_blank" rel="noreferrer noopener"><em>bob.reed@canterbury.ac.nz</em></a><em>.</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-bae60cefa8f13e1c31557c8d601720c1 wp-block-paragraph" id="viewer-w7gnm1369" style="font-size:21px;line-height:1.4"><strong>REFERENCES</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-cfa2a33764f9408df4ce742639dee735 wp-block-paragraph" id="viewer-r5bvw1373" style="font-size:21px;line-height:1.4">Alinaghi, N., &amp; Reed, W. R. (2020). Taxes and Economic Growth in OECD Countries: A Meta-analysis. <em>Public Finance Review</em>, <em>49</em>(1), 3. <a target="_blank" href="https://doi.org/10.1177/1091142120961775" rel="noreferrer noopener"><u>https://doi.org/10.1177/1091142120961775</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-2f17a561dd5f044700817c4f2d397c96 wp-block-paragraph" id="viewer-w7zzs1380" style="font-size:21px;line-height:1.4">Chen, M., &amp; Pustejovsky, J. E. (2025). Adapting methods for correcting selective reporting bias in meta-analysis of dependent effect sizes. <em>Psychological Methods</em>. <a target="_blank" href="https://doi.org/10.1037/met0000773" rel="noreferrer noopener"><u>https://doi.org/10.1037/met0000773</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-85aa2214c80df92720aba7998da1c8cd wp-block-paragraph" id="viewer-ayi1n1385" style="font-size:21px;line-height:1.4">Hardy, R., &amp; Thompson, S. G. (1998). Detecting and describing heterogeneity in meta-analysis. <em>Statistics in Medicine</em>, <em>17</em>(8), 841. <a target="_blank" href="https://doi.org/10.1002/(sici)1097-0258(19980430)17:8%3C841::aid-sim781%3E3.0.co;2-d" rel="noreferrer noopener"><u>https://doi.org/10.1002/(sici)1097-0258(19980430)17:8&lt;841::aid-sim781&gt;3.0.co;2-d</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-003b5b77ee86a930f92378d2cfd6eede wp-block-paragraph" id="viewer-balic1392" style="font-size:21px;line-height:1.4">Hong, S., &amp; Reed, W. R. (2020). Using Monte Carlo experiments to select meta‐analytic estimators. <em>Research Synthesis Methods</em>, <em>12</em>(2), 192. <a target="_blank" href="https://doi.org/10.1002/jrsm.1467" rel="noreferrer noopener"><u>https://doi.org/10.1002/jrsm.1467</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-3153128d17658d5e340cc7266883e0a6 wp-block-paragraph" id="viewer-3i9md1401" style="font-size:21px;line-height:1.4">Ma, W., Hong, S., Reed, W. R., Duan, J., &amp; Luu, P. Q. (2023). Yield effects of agricultural cooperative membership in developing countries: A meta‐analysis. <em>Annals of Public and Cooperative Economics</em>, <em>94</em>(3), 761. <a target="_blank" href="https://doi.org/10.1111/apce.12411" rel="noreferrer noopener"><u>https://doi.org/10.1111/apce.12411</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-fb0d347856fdb843cdeb622f632ea1ad wp-block-paragraph" id="viewer-i5tmb1408" style="font-size:21px;line-height:1.4">Moreno, S. G., Sutton, A. J., Thompson, J. R., Ades, A. E., Abrams, K. R., &amp; Cooper, N. J. (2012). A generalized weighting regression‐derived meta‐analysis estimator robust to small‐study effects and heterogeneity. <em>Statistics in Medicine</em>, <em>31</em>(14), 1407. <a target="_blank" href="https://doi.org/10.1002/sim.4488" rel="noreferrer noopener"><u>https://doi.org/10.1002/sim.4488</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-8194ac4d218e30169a80d2cf636984c8 wp-block-paragraph" id="viewer-74pbw1415" style="font-size:21px;line-height:1.4">Shi, B. (2025). Unpublished research. University of Canterbury.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-102c0cc81d2f53f91d6dcb8a92e2df23 wp-block-paragraph" id="viewer-c9c101417" style="font-size:21px;line-height:1.4">Song, F., Sheldon, T., Sutton, A. J., Abrams, K. R., &amp; Jones, D. R. (2001). Methods for Exploring Heterogeneity in Meta-Analysis. <em>Evaluation &amp; the Health Professions</em>, <em>24</em>(2), 126. <a target="_blank" href="https://doi.org/10.1177/016327870102400203" rel="noreferrer noopener"><u>https://doi.org/10.1177/016327870102400203</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-0189d8dd5acee6282fe4660265eaf4d2 wp-block-paragraph" id="viewer-3bh4o1424" style="font-size:21px;line-height:1.4">Stanley, T. D., &amp; Doucouliagos, H. (2015). Neither fixed nor random: weighted least squares meta‐analysis. <em>Statistics in Medicine</em>, <em>34</em>(13), 2116. <a target="_blank" href="https://doi.org/10.1002/sim.6481" rel="noreferrer noopener"><u>https://doi.org/10.1002/sim.6481</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-f8bd7da7c01222d19315dd859913cf00 wp-block-paragraph" id="viewer-f44o81431" style="font-size:21px;line-height:1.4">Stanley, T. D., &amp; Doucouliagos, H. (2016). Neither fixed nor random: weighted least squares meta‐regression. <em>Research Synthesis Methods</em>, <em>8</em>(1), 19. <a target="_blank" href="https://doi.org/10.1002/jrsm.1211" rel="noreferrer noopener"><u>https://doi.org/10.1002/jrsm.1211</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-a77d753707bfdcdd209e2c7ad50b2430 wp-block-paragraph" id="viewer-gpeku1438" style="font-size:21px;line-height:1.4">Stanley, T. D., &amp; Doucouliagos, H. (2023). Correct standard errors can bias meta‐analysis. <em>Research Synthesis Methods</em>, <em>14</em>(3), 515. <a target="_blank" href="https://doi.org/10.1002/jrsm.1631" rel="noreferrer noopener"><u>https://doi.org/10.1002/jrsm.1631</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-8c74b0246761e7b305d7785a226267a3 wp-block-paragraph" id="viewer-x4web1445" style="font-size:21px;line-height:1.4">Stanley, T. D., Ioannidis, J. P. A., Maier, M., Doucouliagos, H., Otte, W. M., &amp; Bartoš, F. (2023). Unrestricted weighted least squares represent medical research better than random effects in 67,308 Cochrane meta-analyses. <em>Journal of Clinical Epidemiology</em>, <em>157</em>, 53. <a target="_blank" href="https://doi.org/10.1016/j.jclinepi.2023.03.004" rel="noreferrer noopener"><u>https://doi.org/10.1016/j.jclinepi.2023.03.004</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-b3607fa76694b43cf2ee305938e89658 wp-block-paragraph" id="viewer-vuhle1454" style="font-size:21px;line-height:1.4">Wu, W., Duan, J., Reed, W. R., &amp; Tipton, E. (2025). What can we learn from 1,000 meta-analyses across 10 different disciplines? <em>Research Synthesis Methods</em>, 1.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-037c6c884e7da223d21d2699036fc4b8 wp-block-paragraph" id="viewer-ka7zi1459" style="font-size:21px;line-height:1.4">Xue, X., Reed, W. R., &amp; van Aert, R. C. M. (2024). Social capital and economic growth: A meta‐analysis. <em>Journal of Economic Surveys</em>, <em>39</em>(4), 1395. <a target="_blank" href="https://doi.org/10.1111/joes.12660" rel="noreferrer noopener"><u>https://doi.org/10.1111/joes.12660</u></a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-f9c535dfda6857a05301406032dd5d5f wp-block-paragraph" id="viewer-jf48e1466" style="font-size:21px;line-height:1.4">Xue, X., Reed, W. R., &amp; Menclova, A. (2020). Social capital and health: a meta-analysis. <em>Journal of Health Economics</em>, <em>72</em>, 102317. <a target="_blank" href="https://doi.org/10.1016/j.jhealeco.2020.102317" rel="noreferrer noopener"><u>https://doi.org/10.1016/j.jhealeco.2020.102317</u></a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2026/01/20/another-reason-to-prefer-random-effects-over-fixed-effects-uwls-meta-analysis/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7936</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2026/01/a5e68-f1165a_f76063a2129d459d98275800ca7a86f5mv2.png" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2026/01/73d31-f1165a_42372b132584402a97af66cc80d9c423mv2.png" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2026/01/image.png?w=961" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2026/01/image-1.png?w=966" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2026/01/7f5f9-f1165a_4719bec609fd40a6b5437c3173e2b70dmv2.png" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2026/01/b4720-f1165a_35deb556149d4909a348f55bb30969d8mv2.png" medium="image" />
	</item>
		<item>
		<title>REED: You Can Calculate Power Retrospectively — Just Don’t Use Observed Power</title>
		<link>https://replicationnetwork.com/2025/08/29/you-can-calculate-power-retrospectively-just-dont-use-observed-power/</link>
					<comments>https://replicationnetwork.com/2025/08/29/you-can-calculate-power-retrospectively-just-dont-use-observed-power/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Fri, 29 Aug 2025 03:05:18 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Observed Power]]></category>
		<category><![CDATA[Post-hoc Power]]></category>
		<category><![CDATA[Retrospective Power]]></category>
		<category><![CDATA[SE-ES]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7896</guid>

					<description><![CDATA[In this blog, I highlight a valid approach for calculating power after estimation—often called retrospective power. I provide a Shiny App that lets readers explore how the method works and how it avoids the pitfalls of “observed power” — try...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-51f1b7d2e43c76bad96a3eb584ace17b wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>In this blog, I highlight a valid approach for calculating power after estimation—often called retrospective power. I provide a Shiny App that lets readers explore how the method works and how it avoids the pitfalls of “observed power” — try it out for yourself! I also link to a webpage where readers can enter any estimate, along with its standard error and degrees of freedom, to calculate the corresponding power.</em></p>



<h2 class="wp-block-heading has-black-color has-text-color has-link-color wp-elements-64e80c437c7b2386370a87b3f916ab7e" style="font-size:24px;line-height:1.4"><strong>A. Why retrospective power can be useful</strong></h2>



<p class="has-black-color has-text-color has-link-color wp-elements-60a651a7e4f6c52a9a8342224f2683ac wp-block-paragraph" style="font-size:21px;line-height:1.4">Most researchers calculate power before estimation, generally to plan sample sizes: given a hypothesized effect, a significance level, and degrees of freedom, power analysis asks how large a study must be to achieve a desired probability of detection. </p>



<p class="has-black-color has-text-color has-link-color wp-elements-34e40bd77e109bcf8e3f9b28688db6f5 wp-block-paragraph" style="font-size:21px;line-height:1.4">That’s good practice, but key inputs—variance, number of clusters, intraclass correlation coefficient (ICC), attrition, covariate performance—are guessed before the data exist, so realized (ex post) values often differ from what was planned. As <strong><span style="text-decoration: underline"><em><a href="https://www.povertyactionlab.org/resource/quick-guide-power-calculations" target="_blank" rel="noreferrer noopener">Doyle &amp; Feeney (2021)</a></em></span></strong> note in their guide to power calculations, “the exact ex post value of inputs to power will necessarily vary from ex ante estimates.” This is why it can be useful—even preferable—to also calculate power after estimation.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-e18b7106b4e582e99b130bc3667075ec wp-block-paragraph" style="font-size:21px;line-height:1.4">Ex-post power can be helpful in at least three situations. </p>



<p class="has-black-color has-text-color has-link-color wp-elements-6ee1fa32fe4eaf106d3df9fb8ea8ceee wp-block-paragraph" style="font-size:21px;line-height:1.4">1) <strong>It can provide a check on whether ex-ante power assessments were realized.</strong> Because actual implementation rarely matches the original plan—fewer participants recruited, geographic constraints on clusters, or greater dependency within clusters than anticipated—realized power often departs from planned power. Calculating ex-post power highlights these gaps and helps diagnose why they occurred.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-e979adabcc3c1ab56140b1e91250982a wp-block-paragraph" style="font-size:21px;line-height:1.4">2) <strong>It can help distinguish whether a statistically insignificant estimate reflects a negligible effect size or an imprecise estimate.</strong> In other words, it can separate “insignificant because small” from “insignificant because underpowered.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-9b85de539b74d0c8c7cd343f70a2c113 wp-block-paragraph" style="font-size:21px;line-height:1.4">3) <strong>It can flag potential Type M (magnitude) risk when results are significant but measured power is low.</strong> In this way, it can warn of possible overestimation and prompt more cautious interpretation (<a href="https://sites.stat.columbia.edu/gelman/research/published/retropower_final.pdf" target="_blank" rel="noreferrer noopener"><strong><em><span style="text-decoration: underline">Gelman &amp; Carlin, 2014</span></em></strong></a>).</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a0173dd8730e79c4e17e162b8bab12e2 wp-block-paragraph" style="font-size:21px;line-height:1.4">In short, while ex-ante power is essential for planning, ex-post power is a practical complement for evaluation and interpretation. It connects power claims with realized outcomes, enables the diagnosis of deviations from plan, and provides additional insights when interpreting both null and significant findings.</p>



<h2 class="wp-block-heading has-black-color has-text-color has-link-color wp-elements-d6df2a5fa4d0298e9a1f5708e5abd3c5" style="font-size:24px;line-height:1.4"><strong>B. Why the usual way (“Observed Power”) is a bad idea</strong></h2>



<p class="has-black-color has-text-color has-link-color wp-elements-6713d93df24789ac218e41249352cb48 wp-block-paragraph" style="font-size:21px;line-height:1.4">Most statisticians advise against computing observed power, which plugs the observed effect and its estimated standard error into a power formula (<a href="https://blogs.worldbank.org/en/impactevaluations/why-ex-post-power-using-estimated-effect-sizes-bad-ex-post-mde-not" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>McKenzie &amp; Ozier, 2019</em></span></strong></a>). Because observed power is a one-to-one (monotone) transformation of the test statistic—and hence of the <em>p</em>-value—it adds no information and encourages tautological explanations (e.g., “the result was non-significant because power was low”).</p>



<p class="has-black-color has-text-color has-link-color wp-elements-113903dd0a7030e1d0b07074b900940e wp-block-paragraph" style="font-size:21px;line-height:1.4">Worse, as an estimator of a study’s design power, observed power is both biased and highly variable, precisely because it treats a noisy point estimate as the true effect. These problems are well documented (<a href="https://doi.org/10.1198/000313001300339897" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Hoenig &amp; Heisey, 2001</em></span></strong></a>; <a href="https://doi.org/10.7326/0003-4819-121-3-199408010-00008" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Goodman &amp; Berlin, 1994</em></span></strong></a>; <a href="https://doi.org/10.1177/0956797613504966" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Cumming, 2014</em></span></strong></a>; <a href="https://doi.org/10.1146/annurev.psych.59.103006.093735" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Maxwell, Kelley, &amp; Rausch, 2008</em></span></strong></a>). These concerns are not just theoretical: I demonstrate below how minor sampling variation translates into dramatic changes in observed power.</p>
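


<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The one-to-one link between observed power and the <em>p</em>-value described in this section can be sketched in a few lines. This is an illustration under a stated assumption: it uses a normal (large-sample) approximation to the test statistic rather than the <em>t</em> distribution, and the function names are hypothetical. Given only a two-sided <em>p</em>-value, it recovers the absolute test statistic and plugs it back into the power formula, which is all that observed power does.</p>

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_power(noncentrality, alpha=0.05):
    """Power of a two-sided z-test when the statistic is Normal(noncentrality, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - noncentrality)
    lower_tail = STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


def observed_power_from_p(p_value, alpha=0.05):
    """Observed power implied by a two-sided p-value (normal approximation).

    p_value must be strictly greater than 0 and at most 1.
    """
    # Recover |z| from the p-value, then treat it as if it were the true effect.
    z_observed = STD_NORMAL.inv_cdf(1 - p_value / 2)
    return two_sided_power(z_observed, alpha)


for p in (0.50, 0.05, 0.01):
    print(p, observed_power_from_p(p))
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Because the function takes nothing but the <em>p</em>-value and <em>α</em>, it cannot tell you anything the <em>p</em>-value did not. In particular, a result that sits exactly at the significance threshold maps to observed power of about 50%, and smaller <em>p</em>-values always map to higher observed power.</p>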



<h2 class="wp-block-heading has-black-color has-text-color has-link-color wp-elements-09896ee57ed0a7097d4403e7cf32963a" style="font-size:24px;line-height:1.4"><strong>C. A better retrospective approach: SE–ES</strong></h2>



<p class="has-black-color has-text-color has-link-color wp-elements-4773f399a072b675e701542ed6c41204 wp-block-paragraph" style="font-size:21px;line-height:1.4">In a recent paper (<a href="https://doi.org/10.1111/rode.13130" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Tian et al., 2025</em></span></strong></a>), my coauthors and I propose a practical alternative that we call SE–ES (Standard Error–Effect Size). The idea is simple. The researcher specifies a hypothesized effect size (what would be substantively important), uses the estimated standard error from the fitted regression, and combines those with the relevant degrees of freedom to compute power for a two‑sided t‑test.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-79c7d286e8f5038edacd8ac32fcc4356 wp-block-paragraph" style="font-size:21px;line-height:1.4">Because SE–ES fixes the effect size externally—rather than using the noisy point estimate—it yields a serviceable retrospective power number: approximately unbiased for the true design power with a reasonably tight 95% estimation interval, provided samples are not too small.</p>
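


<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The SE–ES calculation described above can be sketched as follows. Note the simplification: the method in the paper uses the <em>t</em> distribution with the regression’s degrees of freedom, whereas this sketch swaps in a normal (large-sample) approximation so that it runs with the Python standard library alone. It therefore omits the degrees-of-freedom input and will be slightly optimistic in small samples. The function name <code>se_es_power</code> and its parameters are hypothetical.</p>

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def se_es_power(hypothesized_effect, estimated_se, alpha=0.05):
    """Retrospective power for a two-sided test (normal approximation).

    hypothesized_effect: effect size chosen by the researcher, NOT the estimate.
    estimated_se: standard error of the coefficient from the fitted regression.
    """
    if not estimated_se > 0:
        raise ValueError("estimated_se must be positive")
    # Noncentrality: how many standard errors the hypothesized effect spans.
    noncentrality = abs(hypothesized_effect) / estimated_se
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - noncentrality)
    lower_tail = STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper_tail + lower_tail


# A hypothesized effect of 2.8 standard errors gives power of roughly 80%.
print(se_es_power(hypothesized_effect=2.8, estimated_se=1.0))
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">The key design choice is visible in the signature: the point estimate never enters the calculation. Only the standard error comes from the data, which is why the result is far less noisy than observed power.</p>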



<p class="has-black-color has-text-color has-link-color wp-elements-d1a4ca0a12d4411c100890a3888a893a wp-block-paragraph" style="font-size:21px;line-height:1.4">To make this concrete, suppose the data-generating process is <em>Y = a + bX + ε</em>, with <em>ε</em> a classical error term and <em>b</em> estimated by OLS. If the true design power is 80%, simulations at sample sizes <em>n</em> = 30, 50, 100 show that the SE–ES estimator is approximately unbiased, with 95% estimation intervals that tighten as <em>n</em> grows: (i) <em>n</em> = 30 yields (60%, 96%); (ii) <em>n</em> = 50 yields (65%, 94%); and (iii) <em>n</em> = 100 yields (70%, 90%).</p>



<p class="has-black-color has-text-color has-link-color wp-elements-8020869ec0f06955c6aff2ad0cd7c208 wp-block-paragraph" style="font-size:21px;line-height:1.4"><strong>D. Try it yourself: A Shiny app that compares SE–ES with Observed Power</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-a5bb1804e925857ba2de9f01ac0506d8 wp-block-paragraph" style="font-size:21px;line-height:1.4">To visualize the contrast, I have created a companion Shiny app. It lets you vary sample size (<em>n</em>), target/true power, and <em>α</em>, then: (1) runs Monte Carlo replications of <em>Y ~ 1 + βX</em>; (2) plots side‑by‑side histograms of retrospective power for SE–ES and Observed Power; and (3) reports the Mean and the 95% simulation interval (the central 2.5%–97.5% range of simulated power values) for each method. Power is calculated under two‑tailed testing.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a9643a3efe345216558d9025387659f7 wp-block-paragraph" style="font-size:21px;line-height:1.4">What you should see: the Observed Power histogram tracks the significance test (mass near 0 when results are null, near 1 when they are significant) because it is just a re‑expression of the t statistic. Moreover, the wide spread of its estimates would make it unusable even if its bias did not. The SE–ES histogram, in contrast, concentrates near the design’s target power and tightens as sample size grows.</p>
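


<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">For readers who prefer code to a web app, the comparison can be approximated with the sketch below. It is not the Shiny app’s code. It makes several assumptions that are labeled in the comments: <em>X</em> and the error term are standard normal, power is computed with a normal approximation instead of the <em>t</em> distribution, the true slope is set so that design power is only roughly equal to the target, and the hypothesized effect used for SE–ES equals the true slope. All function names are hypothetical.</p>

```python
import random
from statistics import NormalDist, mean, quantiles

STD_NORMAL = NormalDist()


def two_sided_power(noncentrality, alpha=0.05):
    """Power of a two-sided z-test (normal approximation, an assumption here)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - noncentrality)) + STD_NORMAL.cdf(-z_crit - noncentrality)


def ols_slope_and_se(x, y):
    """OLS slope and its classical standard error for y = a + b*x + e."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, (rss / (n - 2) / sxx) ** 0.5


def simulate(n=100, target_power=0.80, alpha=0.05, reps=1000, seed=123):
    """Return lists of SE-ES and observed-power estimates across replications."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Assumption: with standard normal X and errors, SE(b) is about 1/sqrt(n),
    # so this slope gives design power of roughly target_power.
    true_b = (z_crit + STD_NORMAL.inv_cdf(target_power)) / n ** 0.5
    se_es, observed = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1 + true_b * xi + rng.gauss(0, 1) for xi in x]
        b_hat, se_hat = ols_slope_and_se(x, y)
        se_es.append(two_sided_power(true_b / se_hat, alpha))         # fixed effect size
        observed.append(two_sided_power(abs(b_hat) / se_hat, alpha))  # noisy estimate
    return se_es, observed


def summarize(values):
    """Mean and central 95% range (2.5th and 97.5th percentiles)."""
    cuts = quantiles(values, n=40)  # 39 cut points; first is 2.5%, last is 97.5%
    return mean(values), cuts[0], cuts[-1]


se_es_draws, observed_draws = simulate()
print("SE-ES   ", summarize(se_es_draws))
print("Observed", summarize(observed_draws))
```

<p class="has-black-color has-text-color has-link-color wp-block-paragraph" style="font-size:21px;line-height:1.4">Because of the simplifications above, the numbers will not match the app exactly, but the qualitative pattern should be the same: the SE–ES estimates cluster near the target, while the observed-power estimates spread across most of the unit interval.</p>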



<p class="has-black-color has-text-color has-link-color wp-elements-57271f443755f788fc20c673a6da4f81 wp-block-paragraph" style="font-size:21px;line-height:1.4">To use the app, <strong><em><u><a href="https://w87avq-bob-reed.shinyapps.io/retrospective_power_app/" target="_blank" rel="noreferrer noopener">click here</a></u></em></strong>. Enter the input values in the app’s sidebar panel. The panel below shows an example with the sample size set to 100, true power set to 80% (for a two-sided test), alpha set to 5%, the number of simulations set to 1,000, and the random seed set to 123.</p>





<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><a href="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png"><img width="488" height="751" data-attachment-id="7907" data-permalink="https://replicationnetwork.com/2025/08/29/you-can-calculate-power-retrospectively-just-dont-use-observed-power/image-118/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png" data-orig-size="488,751" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png?w=195" data-large-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png?w=488" src="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png?w=488" alt="" class="wp-image-7907" style="aspect-ratio:0.649802099466529;width:392px;height:auto" srcset="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png 488w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png?w=97 97w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png?w=195 195w" sizes="(max-width: 488px) 100vw, 488px" /></a></figure>
</div>


<p class="has-black-color has-text-color has-link-color wp-elements-154327f1bf5947f72b3769aa963977e9 wp-block-paragraph" style="font-size:21px;line-height:1.4">Once you have entered your input values, click “Run simulation”. Two histograms will appear. The histogram on the left shows the distribution of estimated power values using the SE–ES method. The histogram on the right shows the same for Observed Power. The vertical dotted line indicates true power.</p>






<figure class="wp-block-image size-large"><a href="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png"><img loading="lazy" width="1024" height="493" data-attachment-id="7910" data-permalink="https://replicationnetwork.com/2025/08/29/you-can-calculate-power-retrospectively-just-dont-use-observed-power/image-119/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png" data-orig-size="1309,631" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=300" data-large-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=604" src="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=1024" alt="" class="wp-image-7910" srcset="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=1024 1024w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=150 150w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=300 300w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=768 768w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png 1309w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-3d2c815fdc78eb1ad8fea29ecff34291 wp-block-paragraph" style="font-size:21px;line-height:1.4">Immediately below this figure, the Shiny app produces a table that reports the mean and 95% estimation interval of estimated powers for the SE–ES and Observed Power methods. For this example, with true power = 80%, the Observed Power distribution is left-skewed and biased downwards (mean = 73.4%), with a 95% estimation interval of (14.5%, 99.8%). In contrast, the SE–ES distribution is approximately symmetric and approximately centered on the true value of 80%, with a 95% estimation interval of (68.5%, 89.9%).</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><a href="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png"><img loading="lazy" width="928" height="329" data-attachment-id="7912" data-permalink="https://replicationnetwork.com/2025/08/29/you-can-calculate-power-retrospectively-just-dont-use-observed-power/image-120/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png" data-orig-size="928,329" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=300" data-large-file="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=604" src="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=928" alt="" class="wp-image-7912" style="aspect-ratio:2.8206912549292507;width:513px;height:auto" srcset="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png 928w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=150 150w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=300 300w, https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=768 768w" sizes="(max-width: 928px) 100vw, 928px" /></a></figure>
</div>


<p class="has-black-color has-text-color has-link-color wp-elements-a12aebeacf008451860e3a93cd71cf90 wp-block-paragraph" style="font-size:21px;line-height:1.4">Readers are encouraged to try different target power values and, most importantly, different sample sizes. You should see that the SE–ES method works well at every true power value but, in this context, becomes less serviceable for sample sizes below 30.</p>



<h2 class="wp-block-heading has-black-color has-text-color has-link-color wp-elements-aefb614759fbd6d863cefb35623fd221" style="font-size:24px;line-height:1.4"><strong>E. Bottom line—and an easy calculator you can use now</strong></h2>



<p class="has-black-color has-text-color has-link-color wp-elements-0c99aa04d87c676efeddf542cf7f7b0d wp-block-paragraph" style="font-size:21px;line-height:1.4">Power analysis is useful before estimation, for planning. But it is also useful after estimation, as an interpretive tool. Furthermore, it is easy to calculate. For readers interested in calculating retrospective power for their own research, Thomas Logchies and I have created an online calculator that is easy to use: <strong><em><u><a href="https://replicationnetwork.com/2024/08/15/reed-logchies-calculating-power-after-estimation-no-programming-required/" target="_blank" rel="noreferrer noopener">click here</a></u></em></strong>. There you can enter α, degrees of freedom, an estimated standard error, and a hypothesized effect size to obtain SE–ES retrospective power for your estimate. Give it a go!</p>



<p class="has-black-color has-text-color has-link-color wp-elements-2b97162dc02471f45d34bc0b3f068539 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>NOTE: Bob Reed is Professor of Economics and&nbsp;the Director of&nbsp;</em><a href="https://www.canterbury.ac.nz/business-and-law/research/ucmeta/" target="_blank" rel="noreferrer noopener"><strong><em>UCMeta</em></strong></a><em>&nbsp;at the University of Canterbury.&nbsp;He can be reached at&nbsp;</em><a href="mailto:bob.reed@canterbury.ac.nz" target="_blank" rel="noreferrer noopener"><em>bob.reed@canterbury.ac.nz</em></a><em>.</em></p>



<h2 class="wp-block-heading has-black-color has-text-color has-link-color wp-elements-976e1016197ab2485148c451918af152" style="font-size:24px;line-height:1.4"><strong>References</strong></h2>



<p class="has-black-color has-text-color has-link-color wp-elements-d7f2681ee3ce30c923083aebec4f331a wp-block-paragraph" style="font-size:21px;line-height:1.4">Cumming, G. (2014). The new statistics: Why and how. <em>Psychological Science</em>, 25(1), 7–29. <a href="https://doi.org/10.1177/0956797613504966" target="_blank" rel="noreferrer noopener">https://doi.org/10.1177/0956797613504966</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-d6b5baa7e0e9bf2170728a0d912a4c7a wp-block-paragraph" style="font-size:21px;line-height:1.4">Doyle, M.-A., &amp; Feeney, L. (2021). Quick guide to power calculations. <a href="https://www.povertyactionlab.org/resource/quick-guide-power-calculations">https://www.povertyactionlab.org/resource/quick-guide-power-calculations</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-d44f996ab842ae4ac747bba236034a14 wp-block-paragraph" style="font-size:21px;line-height:1.4">Gelman, A., &amp; Carlin, J. (2014). Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. <em>Perspectives on Psychological Science,</em> 9(6), 641–651. <a href="https://sites.stat.columbia.edu/gelman/research/published/retropower_final.pdf" target="_blank" rel="noreferrer noopener">https://sites.stat.columbia.edu/gelman/research/published/retropower_final.pdf</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-1c3d9cca328c34df67c7afba3e36fd7b wp-block-paragraph" style="font-size:21px;line-height:1.4">Goodman, S. N., &amp; Berlin, J. A. (1994). The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. <em>Annals of Internal Medicine</em>, 121(3), 200–206. <a href="https://doi.org/10.7326/0003-4819-121-3-199408010-00008" target="_blank" rel="noreferrer noopener">https://doi.org/10.7326/0003-4819-121-3-199408010-00008</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-51ed2c32e9f29ae668356931889c3a05 wp-block-paragraph" style="font-size:21px;line-height:1.4">Hoenig, J. M., &amp; Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. <em>The American Statistician</em>, 55(1), 19–24. <a href="https://doi.org/10.1198/000313001300339897" target="_blank" rel="noreferrer noopener">https://doi.org/10.1198/000313001300339897</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-b8d822fc0de67711e147e3fd1366730d wp-block-paragraph" style="font-size:21px;line-height:1.4">Maxwell, S. E., Kelley, K., &amp; Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. <em>Annual Review of Psychology</em>, 59(1), 537–563. <a href="https://doi.org/10.1146/annurev.psych.59.103006.093735" target="_blank" rel="noreferrer noopener">https://doi.org/10.1146/annurev.psych.59.103006.093735</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-eeb9bfc45a493eec21a13c5c457aea1d wp-block-paragraph" style="font-size:21px;line-height:1.4">McKenzie, D., &amp; Ozier, O. (2019, May 16). <em>Why ex-post power using estimated effect sizes is bad, but an ex-post MDE is not</em>. <em>Development Impact</em> (World Bank Blog). <a href="https://blogs.worldbank.org/en/impactevaluations/why-ex-post-power-using-estimated-effect-sizes-bad-ex-post-mde-not">https://blogs.worldbank.org/en/impactevaluations/why-ex-post-power-using-estimated-effect-sizes-bad-ex-post-mde-not</a></p>



<p class="has-black-color has-text-color has-link-color wp-elements-a69e823b8abe7a864114f69e29731783 wp-block-paragraph" style="font-size:21px;line-height:1.4">Tian, J., Coupé, T., Khatua, S., Reed, W. R., &amp; Wood, B. D. (2025). Power to the researchers: Calculating power after estimation. <em>Review of Development Economics</em>, 29(1), 324–358. <a href="https://doi.org/10.1111/rode.13130" target="_blank" rel="noreferrer noopener">https://doi.org/10.1111/rode.13130</a></p>



]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/08/29/you-can-calculate-power-retrospectively-just-dont-use-observed-power/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7896</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2025/08/image-1.png?w=488" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2025/08/image-2.png?w=1024" medium="image" />

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2025/08/image-3.png?w=928" medium="image" />
	</item>
		<item>
		<title>ROODMAN: Appeal to Me &#8211; First Trial of a “Replication Opinion”</title>
		<link>https://replicationnetwork.com/2025/05/31/roodman-appeal-to-me-first-trial-of-a-replication-opinion/</link>
					<comments>https://replicationnetwork.com/2025/05/31/roodman-appeal-to-me-first-trial-of-a-replication-opinion/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Sat, 31 May 2025 05:16:00 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Academic incentives]]></category>
		<category><![CDATA[Comments]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[Evidence-based policy]]></category>
		<category><![CDATA[Journal policy]]></category>
		<category><![CDATA[Meta-Science]]></category>
		<category><![CDATA[Open Philanthropy]]></category>
		<category><![CDATA[peer review]]></category>
		<category><![CDATA[replications]]></category>
		<category><![CDATA[Truth-seeking]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7862</guid>

					<description><![CDATA[[This blog is a repost of a blog that first appeared at davidroodman.com. It is republished here with permission from the author.] My employer, Open Philanthropy, strives to make grants in light of evidence. Of course, many uncertainties in our...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-85263cccc1ad89d8ffcc3d7a681fd387 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[This blog is a repost of a blog that first appeared at <a href="https://davidroodman.com/blog/2025/05/09/appeal-to-me-first-trial-of-a-replication-opinion/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline">davidroodman.com</span></strong></a>. It is republished here with permission from the author.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-bc7e0f2b9c7d717638651b9af755bd81 wp-block-paragraph" style="font-size:21px;line-height:1.4">My employer, Open Philanthropy, strives to make grants in light of evidence. Of course, many uncertainties in our decision-making are irreducible. No amount of thumbing through peer-reviewed journals will tell us how great a threat AI will pose decades hence, or whether a group we fund will get a vaccine to market or a bill to the governor’s desk. But we have checked journals for insight into many topics, such as the&nbsp;<a href="https://www.openphilanthropy.org/research/geomagnetic-storms-an-introduction-to-the-risk/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>odds of a grid-destabilizing geomagnetic storm</em></span></strong></a>, and how much&nbsp;<a href="https://www.openphilanthropy.org/research/does-putting-kids-in-school-now-put-money-in-their-pockets-later-revisiting-a-natural-experiment-in-indonesia/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>building new schools boosts what kids earn when they grow up</em></span></strong></a>.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-3897095c0afb9fe0b478378bfca8b691 wp-block-paragraph" style="font-size:21px;line-height:1.4">When we draw on research, we vet it in rare depth (as does GiveWell, from which we spun off). I have sometimes spent months replicating and reanalyzing a key study—checking for bugs in the computer code, thinking about how I would run the numbers differently and how I would interpret the results. This interface between research and practice might seem like a picture of harmony, since researchers want their work to guide decision-making for the public good and decision-makers like Open Philanthropy want to receive such guidance.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-55b22f827ff95b35fe6b314b412e6df9 wp-block-paragraph" style="font-size:21px;line-height:1.4">Yet I have come to see how cultural misunderstandings prevail at this interface. From my side, there are two problems. First, about half the time I reanalyze a study, I find that there are important bugs in the code, or that adding more data makes the mathematical finding go away, or that there’s a compelling alternative explanation for the results. (Caveat: most of my experience is with non-randomized studies.)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-557a9a4687ffaba435f3040aba963b60 wp-block-paragraph" style="font-size:21px;line-height:1.4">Second, when I send my critical findings to the journal that peer-reviewed and published the original research, the editors usually don’t seem interested (<a href="https://www.journals.uchicago.edu/doi/10.1086/732254" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>recent exception</em></span></strong></a>).</p>



<p class="has-black-color has-text-color has-link-color wp-elements-cb85b631049a3e6d53928cdc8b5955a7 wp-block-paragraph" style="font-size:21px;line-height:1.4">Seeing the ivory tower as a bastion of truth-seeking, I used to be surprised. I understand now that, because of how the academy works, in particular, because of how the individuals within academia respond to incentives beyond their control, we consumers of research are sometimes more truth-seeking than the producers.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-72945ed3fba9cd0f36bd1acc8a61fbdf wp-block-paragraph" style="font-size:21px;line-height:1.4">Last fall I read a tiny illustration of the second problem, and it inspired me to try something new. Dartmouth economist Paul Novosad tweeted his pique with economics journals over how they handle challenges to published papers:</p>



<figure class="wp-block-image size-large is-resized"><a href="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png"><img loading="lazy" width="689" height="865" data-attachment-id="7865" data-permalink="https://replicationnetwork.com/2025/05/31/roodman-appeal-to-me-first-trial-of-a-replication-opinion/image-116/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png" data-orig-size="689,865" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png?w=239" data-large-file="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png?w=604" src="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png?w=689" alt="" class="wp-image-7865" style="width:619px;height:auto" srcset="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png 689w, https://replicationnetwork.com/wp-content/uploads/2025/05/image.png?w=119 119w, https://replicationnetwork.com/wp-content/uploads/2025/05/image.png?w=239 239w" sizes="(max-width: 689px) 100vw, 689px" /></a></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-7f75346e2f1988e057cef12a8d99f124 wp-block-paragraph" style="font-size:21px;line-height:1.4">As you might glean from the truncated screenshots, the starting point for debate is a&nbsp;<a href="https://doi.org/10.1257/app.20170223" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>paper published in 2019</em></span></strong></a>. It finds that U.S. immigration judges were less likely to grant asylum on warmer days. For each 10°F the temperature went up, the chance of winning asylum went down 1 percentage point.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-9269d54c074a5742df9ab02daf6c672b wp-block-paragraph" style="font-size:21px;line-height:1.4">The&nbsp;<a href="https://doi.org/10.1257/app.20200118" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>critique</em></span></strong></a>&nbsp;was written by another academic. It fixes errors in the original paper, expands the data set, and finds no such link from heat to grace. In the&nbsp;<a href="https://doi.org/10.1257/app.20200068" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>rejoinder</em></span></strong></a>, the original authors acknowledge errors but say their conclusion stands. “AEJ” (<em>American Economic Journal: Applied Economics</em>) published all three articles in the debate. As you can see, the dueling abstracts confused even an expert.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-f423946b04098ebf893438916ecb2734 wp-block-paragraph" style="font-size:21px;line-height:1.4">So I appointed myself&nbsp;<em>judge</em>&nbsp;in the case. Which I’ve never seen anyone do before, at least not so formally. I did my best to hear out both sides (though the “hearing” was reading), then identify and probe key points of disagreement. I figured my take would be more independent and credible than anything either party to the debate could write. I hoped to demonstrate and think about how academia sometimes struggles to serve the cause of truth-seeking. And I could experiment with this new form as one way to improve matters.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-dac6f5aac111be21fa2222ab35bd7a3a wp-block-paragraph" style="font-size:21px;line-height:1.4">I just filed my opinion, which is to say, the Institute for Replication has&nbsp;<a href="https://www.econstor.eu/handle/10419/316399" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>posted it</em></span></strong></a>. (Open Philanthropy&nbsp;<a href="https://www.openphilanthropy.org/grants/university-of-ottawa-institute-for-replication/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>partly funds</em></span></strong></a>&nbsp;them.) My colleague&nbsp;<a href="https://www.openphilanthropy.org/about/team/matt-clancy/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Matt Clancy</em></span></strong></a>&nbsp;has pioneered&nbsp;<a href="https://www.newthingsunderthesun.com/about" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>living literature reviews</em></span></strong></a>; he suggested that I make this opinion a living document as well. If either party to the debate, or anyone else, changes my mind about anything in the opinion, I will&nbsp;<a href="https://github.com/droodman/RO-Heyes-Saberian-2019/releases" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>revise it</em></span></strong></a>&nbsp;while preserving the history.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-56cde97dc88003baf145bcdc30287763 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>Verdict</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-ca2a98073c640d3006d24c0f03a39a4e wp-block-paragraph" style="font-size:21px;line-height:1.4">My conclusion was more one-sided than I had expected. I came down in favor of the commenter. The authors of the original paper defend their finding by arguing that in retrospect they should have excluded the quarter of their sample consisting of asylum applications filed by people from&nbsp;<em>China</em>. Yes, they concede, correcting the errors mostly erases their original finding. But it reappears after Chinese are excluded.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-bdfdc05303f564e2ea1a6407a5fe3fd3 wp-block-paragraph" style="font-size:21px;line-height:1.4">This argument did not persuade me. True, during the period of this study, 2000–04, most Chinese asylum-seekers applied under a&nbsp;<a href="https://www.govinfo.gov/content/pkg/PLAW-104publ208/pdf/PLAW-104publ208.pdf#page=690" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>special U.S. law</em></span></strong></a>&nbsp;meant to give safe harbor to women fearing forced sterilization and abortion in their home country.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-15c38da38043101f7a23ed02b3855bc2 wp-block-paragraph" style="font-size:21px;line-height:1.4">The authors seem to argue that because grounds for asylum were more demonstrable in these cases—anyone&nbsp;<a href="https://www.theguardian.com/books/2013/may/06/chinas-barbaric-one-child-policy" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>could read</em></span></strong></a>&nbsp;about the draconian enforcement of China’s one-child policy—immigration judges effectively lacked much discretion. And if outdoor temperature couldn’t meaningfully affect their decisions, the cases were best dropped from a study of precisely that connection.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1cc8569329d5cb593344576b5721319a wp-block-paragraph" style="font-size:21px;line-height:1.4">But this premise is flatly contradicted by a&nbsp;<a href="https://stanfordlawreview.org/wp-content/uploads/sites/3/2010/04/RefugeeRoulette.pdf" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>study the authors cite</em></span></strong></a>&nbsp;called “Refugee Roulette.” In the study, Figure 6 shows that judges differed widely in how often they granted asylum to Chinese applicants. One did so less than 5% of the time, another more than 90%, and the rest were spread evenly between. (For a more thorough discussion, read sections 4.4 and 6.1 of my&nbsp;<a href="https://www.econstor.eu/handle/10419/316399" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>opinion</em></span></strong></a>.)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-f86e02d1a6a83b704156dfe20941c1a5 wp-block-paragraph" style="font-size:21px;line-height:1.4">Thus while I do not dispute that there is a correlation between temperature and asylum grants in a particular subset of the data, I think it is best explained by&nbsp;<a href="https://doi.org/10.1037/a0033242" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>p-hacking</em></span></strong></a>&nbsp;or some other form of “filtration,” in which,&nbsp;<a href="https://doi.org/10.1511/2014.111.460" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>consciously or not</em></span></strong></a>, researchers gravitate toward results that happen to look statistically significant. (In fairness, they know that peer reviewers, editors, and readers gravitate to the same sorts of results, and getting a paper into a good journal can make a career.)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-559fcce85a563c9f85678aae8c09042f wp-block-paragraph" style="font-size:21px;line-height:1.4">The nature of the defense raises a question about how the journal handled the dispute. It published the original authors’ rejoinder&nbsp;<a href="https://www.aeaweb.org/articles?id=10.1257/app.20200068" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>as a Correction</em></span></strong></a>.&nbsp;Yet, while one might agree that it is&nbsp;<em>better</em>&nbsp;to exclude Chinese from the analysis, I think their inclusion in the original was not an&nbsp;<em>error</em>, and therefore their exclusion is not a&nbsp;<em>correction</em>. Thus, one way the journal might have headed off Novosad’s befuddlement would have been by insisting that Corrections only make corrections.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-5db06868ca6491a368ce16c2640d16e9 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>What’s wrong with this picture?</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-5be284053205ce41821f0d0d2ac4361d wp-block-paragraph" style="font-size:21px;line-height:1.4">To recap:</p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p class="has-black-color has-text-color has-link-color wp-elements-2b470f2dafabd1b6fd350e62d1b518db wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>Two economists performed a quantitative analysis of a clever, novel question.</em></p>



<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p class="has-black-color has-text-color has-link-color wp-elements-92a4faee1b4211cf8c5ba386b73abe9e wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>It underwent peer review.</em></p>
</div></div>



<p class="has-black-color has-text-color has-link-color wp-elements-b9e563e7f749e62a5ef18c03a5098a33 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>It was published in one of the&nbsp;</em><a href="https://www.pjip.org/Economics-journal-rankings.html" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>top journals in economics</em></span></strong></a><em>. Its data and computer code were&nbsp;<a href="https://www.openicpsr.org/openicpsr/project/113722/version/V1/view" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline">posted online</span></strong></a>, per the journal’s&nbsp;<a href="https://www.aeaweb.org/journals/data" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline">policy</span></strong></a>.</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-b63da1f4744f09a229bbb771401a8dea wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>Another researcher&nbsp;<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3645463" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline">promptly responded</span></strong></a>&nbsp;that the analysis contains errors (such as computing average daytime temperature with respect to Greenwich time rather than local time), and that it could have been done on a much larger data set (for 1990 to ~2019 instead of 2000–04). These changes make the headline findings go away.</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-c0a7cbfd45d222fc6a76c2bdacbd27a2 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>After behind-the-scenes back and forth among the disputants and editors, the journal published the comment and rejoinder.</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-75f11bae56036c087cf9adec9dc858dc wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>These new articles confused even an expert.</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-2a6edd229d13b23f51051676d37e7a70 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8211; <em>An outsider (me) delved into the debate and found that it’s actually a pretty easy call.</em></p>
</div></div>
</div></div>



<p class="has-black-color has-text-color has-link-color wp-elements-ef0b68f7c19ffe97ec33b14176f970f9 wp-block-paragraph" style="font-size:21px;line-height:1.4">If you score the journal on whether it successfully illuminated its readership as to the truth, then I think it is kind of 0 for 2.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-4b9ffad620417c99842264679092b0e8 wp-block-paragraph" style="font-size:21px;line-height:1.4">[Update: I submitted the opinion to the journal, which promptly rejected it. I understand that the submission was an odd duck. But if I&#8217;m being harsh I can raise the count to 0 for 3.]</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6415662e9736ceb3e60eac6303918d4c wp-block-paragraph" style="font-size:21px;line-height:1.4">That said,&nbsp;<em>AEJ Applied</em>&nbsp;did support dialogue between economists that eventually brought the truth out. In particular, by requiring public posting of data and code (an area where this journal and its siblings have been pioneers), it facilitated rapid scrutiny.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-57d5288f38054a573a7172d4db497ca8 wp-block-paragraph" style="font-size:21px;line-height:1.4">Still, it bears emphasizing:<em>&nbsp;For quality assurance, the data sharing was much more valuable than the peer review</em>. And, whether for lack of time or reluctance to take sides, the journal’s handling of the dispute obscured the truth.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a9e54fde3131764b40314b065d1eb3a3 wp-block-paragraph" style="font-size:21px;line-height:1.4">My purpose in examining this example is not to call down a thunderbolt on anyone, from the Olympian heights of a funding body. It is rather to use a concrete story to illustrate the larger patterns I mentioned earlier.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-0188d522c009840834109d781ab883f0 wp-block-paragraph" style="font-size:21px;line-height:1.4">Despite having undergone peer review, many published studies in the social sciences and epidemiology do not withstand close scrutiny. When they are challenged, journal editors have a hard time managing the debate in a way that produces more light than heat.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-687d64d053d31a51f48844df3e355dc4 wp-block-paragraph" style="font-size:21px;line-height:1.4">I have critiqued papers about the impact of&nbsp;<a href="https://www.jstor.org/stable/3592954" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>foreign aid</em></span></strong></a>,&nbsp;<a href="https://www.tandfonline.com/doi/abs/10.1080/00220388.2013.858122" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>microcredit</em></span></strong></a>,&nbsp;<a href="https://retractionwatch.com/2012/06/29/authors-retract-plos-medicine-foreign-health-aid-paper-that-had-criticized-earlier-lancet-study/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>foreign aid</em></span></strong></a>,&nbsp;<a href="http://blog.givewell.org/2017/12/07/questioning-evidence-hookworm-eradication-american-south/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>deworming</em></span></strong></a>,&nbsp;<a href="https://blog.givewell.org/2017/12/29/revisiting-evidence-malaria-eradication-americas/" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline"><strong><em>malaria eradication</em></strong></span></a>,&nbsp;<a href="https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)61529-3/fulltext" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>foreign aid</em></span></strong></a>,&nbsp;<a href="https://www.openphilanthropy.org/research/geomagnetic-storms-historys-surprising-if-tentative-reassurance/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>geomagnetic storm risk</em></span></strong></a>,&nbsp;<a href="https://www.openphilanthropy.org/research/reasonable-doubt-a-new-look-at-whether-prison-growth-cuts-crime/" 
target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>incarceration</em></span></strong></a>,&nbsp;<a href="https://www.openphilanthropy.org/research/does-putting-kids-in-school-now-put-money-in-their-pockets-later-revisiting-a-natural-experiment-in-indonesia/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>schooling</em></span></strong></a>,&nbsp;<a href="https://arxiv.org/abs/2303.11956" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>more schooling</em></span></strong></a>,&nbsp;<a href="https://arxiv.org/abs/2401.13694" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>broadband</em></span></strong></a>,&nbsp;<a href="https://doi.org/10.1177/1091142114537895" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>foreign aid</em></span></strong></a>,&nbsp;<a href="https://papers.ssrn.com/abstract=4294284" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>malnutrition</em></span></strong></a>, ….</p>



<p class="has-black-color has-text-color has-link-color wp-elements-91a750a3031604fcc4fa80006819d6c3 wp-block-paragraph" style="font-size:21px;line-height:1.4">Many of those critiques I have submitted to journals, typically only to receive polite rejections. I obviously lack objectivity. But it has struck me as strange that, in these instances, we on the outside of academia seem more concerned about getting to the truth than those on the inside. Sometimes I’ve wished I could appeal to an independent authority to review a case and either validate my take or put me in my place.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-0c01b83ed2b476e2429d1ae9d117f320 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>That</em>&nbsp;yearning is what primed me to respond to Novosad’s tweet by donning the robe of a judge myself. (I passed on the wig.)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-c14d9e32e4ea64d682924e37cf99e37f wp-block-paragraph" style="font-size:21px;line-height:1.4">I’ve never edited a journal, but I’ve talked to people who have, and I have some idea of what is going on. Editors juggle many considerations besides squeezing maximum truth juice out of any particular study. Fully grasping a replication debate takes work—imagine the parties lobbing 25-page missives at each other, dense with equations, tables, and graphs—and editors are busy.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-306840ccf3c2ec3f1e3d42d39a6aee42 wp-block-paragraph" style="font-size:21px;line-height:1.4">Published comments don’t get&nbsp;<a href="https://econjwatch.org/articles/decline-in-critical-commentary-1963-2004" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>cited</em></span></strong></a>&nbsp;<a href="https://doi.org/10.1111/ecin.13222" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>much</em></span></strong></a>&nbsp;anyway, and editors keep an eye on&nbsp;<a href="https://www.pjip.org/Economics-journal-rankings.html" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>how much their journals get cited</em></span></strong></a>. They may also weigh the personal costs for the people whose reputations are at stake. Many journals, especially those published by professional associations, want to be open to all comers—to be the moderator, not the panelist, the platform, not the content provider.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-bbc3de346ccae87f6d6f155e1d0a4a5c wp-block-paragraph" style="font-size:21px;line-height:1.4">The job they set for themselves is not quite to assess the reliability of any given study (a tall order) but to certify that each article meets a minimum standard, to support the collective dialogue through which humanity seeks scientific truth.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-b4edfbd5fe91177f54a57313b6e21490 wp-block-paragraph" style="font-size:21px;line-height:1.4">Then, too, I think journal editors often care a lot about whether a paper makes a “contribution” such as a novel question, data source, or analytical method. Closer to home, junior editors may think twice before welcoming criticism that could harm the reputation of their journal or ruffle the feathers of more powerful members of their flock. Senior editors may have gotten where they are by thinking in the same, savvy way.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1a3161a465b2170ee16b4f6e90b0e00a wp-block-paragraph" style="font-size:21px;line-height:1.4">Modern science is the best system ever developed for pursuing truth. But it is still populated by human beings (<a href="https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>for how much longer?</em></span></strong></a>) whose cognitive wiring makes the process uncomfortable and imperfect. Humans are tribal creatures—not as wired for selflessness as your average ant, but more apt to go to war than an eel or an eagle.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ddd90283bd75f9a89d1133b511431220 wp-block-paragraph" style="font-size:21px;line-height:1.4">Among the bits of psychological glue that bind us are shared ideas about “is” and “ought.” Imperialists and evangelists have long influenced shared ideas in order to expand and solidify the groups over which they hold sway. The links between belief, belonging, and power are why the notion that evidence trumps belief was so revolutionary when the Roman church condemned Galileo, and why the idea, despite making modernity possible, remains discomfiting to this day.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-5ea7e3b4c7f78fbd1773e7c9ff144f24 wp-block-paragraph" style="font-size:21px;line-height:1.4">The inefficiency in pursuing truth has real costs for society. Some social science research influences decisions by private philanthropies and public agencies, decisions whose stakes can be measured in human lives, or in the millions, billions, even trillions of dollars. Yet individual studies receive perhaps hundreds of dollars’ worth of time in peer review, and that within a system in which getting each paper as close as possible to the truth is one of several competing priorities.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-74b51c2a6da86a0f088d93acbb57dca4 wp-block-paragraph" style="font-size:21px;line-height:1.4">Making science work better is the stuff of&nbsp;<em>metascience</em>, an area in which Open Philanthropy&nbsp;<a href="https://www.openphilanthropy.org/grants/?focus-area=innovation-policy" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>makes grants</em></span></strong></a>. It’s a big topic. Here, I’ll merely toss out the idea that if these new-fangled replication opinions were regularly produced, they could somewhat mitigate the structural deprecation of truth-seeking.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-29c72de42b7451adedfcea0a4bc04217 wp-block-paragraph" style="font-size:21px;line-height:1.4">On the demand side—among decision-makers using research—replication opinions could improve the vetting of disputed studies, while efficiently targeting the ones that matter most. (Related idea&nbsp;<a href="https://www.chronicle.com/article/social-science-is-broken-heres-how-to-fix-it" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>here</em></span></strong></a>.)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ef9eb077b83be62fc60429412a0b6575 wp-block-paragraph" style="font-size:21px;line-height:1.4">On the supply side, a heightened awareness that an “appeals court” could upstage journals in a role laypeople and policymakers expect them to fill—performing quality assurance on what they publish—could stimulate the journals to handle replication debates in a way that better serves their readers and society.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-5a047ee94b56c1236a867b9a9ea4ba4c wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>Reflections on writing the replication opinion</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-2fc7e8141569f8c92aa4599b285a3fef wp-block-paragraph" style="font-size:21px;line-height:1.4">Writing a novel piece led me to novel questions. To prepare for writing my opinion, I read about how judges&nbsp;<a href="https://www.fjc.gov/content/judicial-writing-manual-pocket-guide-judges-second-edition" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>write</em></span></strong></a>&nbsp;<a href="https://scholarship.law.umn.edu/mlr/1677" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>theirs</em></span></strong></a>. Judicial opinions usually have a few standard sections. They review the history of the case (what happened to bring it about, what motions were filed); list agreed facts; frame the question to be decided; enunciate some standard that a party has to meet, perhaps handed down by the Supreme Court; and then bring the facts to the standard to reach a decision.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-37648b32850d6fb4ebbb290ae24cad1d wp-block-paragraph" style="font-size:21px;line-height:1.4">Could I follow that outline? Reviewing the case history was easy enough. I had the papers and could inventory their technical points. The data and computer code behind the papers are&nbsp;<a href="https://www.openicpsr.org/openicpsr/project/113722/version/V1/view" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>on</em></span></strong></a>&nbsp;<a href="https://doi.org/10.7910/DVN/3LOR3R" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>the</em></span></strong></a>&nbsp;<a href="https://www.openicpsr.org/openicpsr/project/127263/version/V1/view" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>web</em></span></strong></a>, so I could rerun the code and stipulate facts such as that a particular statistical procedure applied to a particular data set generates a particular output.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-422c741d363f01eeb727c4b328411069 wp-block-paragraph" style="font-size:21px;line-height:1.4">Figuring out what I was trying to&nbsp;<em>judge</em>&nbsp;was harder. Surely it was not whether, for all people, places, and times, heat makes us less gracious. Nor should I try to decide that question even in the study’s context, which was U.S. asylum cases decided between 2000 and 2004.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-20f7a151c1202e06aa8a19962c03e0f0 wp-block-paragraph" style="font-size:21px;line-height:1.4">Truth in the social sciences is rarely absolute. We use statistics precisely because we know that there is noise in every measurement, uncertainty in every finding. In addition, by&nbsp;<a href="https://www.youtube.com/watch?v=BrK7X_XlGB8" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Bayes’ Rule</em></span></strong></a>, the conclusions we draw from any one piece of evidence depend on the prior knowledge we bring to it, which is shaped by other evidence.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-aa1d099161392cfb2d35c8bce34cc0f6 wp-block-paragraph" style="font-size:21px;line-height:1.4">Someone who has read 10 ironclad articles on how temperature affects asylum decisions should hardly be moved by one more. Yet I think those 10 other studies, if they existed, would lie beyond the scope of this case.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-084ed5661848d757d1e81a924173ca65 wp-block-paragraph" style="font-size:21px;line-height:1.4">That means that my replication opinion is&nbsp;<em>not</em>&nbsp;about the effects of temperature on behavior in any setting. It’s more meta than that. It’s about how much this new paper should&nbsp;<em>shift or strengthen</em>&nbsp;one’s views on the question.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1ed884f86c3dbe72219e0afc33795d5e wp-block-paragraph" style="font-size:21px;line-height:1.4">After reflecting on these complications, here is what I decided to decide:&nbsp;<em>to the extent that a reasonable observer updated their priors after reading the original paper, how much should the subsequent debate reverse or strengthen that update?</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-8c939b9c92e4d8843193ebafb7e3bf55 wp-block-paragraph" style="font-size:21px;line-height:1.4">My judgment need not have been binary. Unlike a jury burdened with deciding guilt or innocence, a replication opinion can come down in the middle, again by Bayes’ Rule. Sometimes there is more than one reasonable way to run the numbers and more than one reasonable way to interpret the results.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-fe023053050c9b1a5172be4737cc50b2 wp-block-paragraph" style="font-size:21px;line-height:1.4">I sought rubrics through which to organize my discussion—both to discipline my own reasoning and to set precedents, should I or anyone else do this again. I borrowed a<a href="https://onlinelibrary.wiley.com/doi/10.1111/joes.12139">&nbsp;</a><a href="https://onlinelibrary.wiley.com/doi/10.1111/joes.12139" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>typology developed by former colleague Michael Clemens</em></span></strong></a>&nbsp;of the varieties of replication and robustness testing, as well as a typology of statistical issues from&nbsp;<a href="https://www.amazon.com/Experimental-Quasi-Experimental-Designs-Generalized-Inference/dp/0395615569" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Shadish, Cook, and Campbell</em></span></strong></a>.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-5a0f5cc83ee9748e4d50ce9e2c1090e5 wp-block-paragraph" style="font-size:21px;line-height:1.4">And I made a list of study traits that we can expect to be associated, on average, with publication bias and other kinds of result filtration. For example, there is&nbsp;<a href="https://doi.org/10.1257/app.20150044" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>evidence</em></span></strong></a>&nbsp;that in top journals, junior economists, who are running the publish-or-perish gauntlet toward tenure, are more likely to report results that&nbsp;<em>just</em>&nbsp;clear conventional thresholds for statistical significance. That is consistent with the theory that the researchers on whom the system’s perverse incentives impinge most strongly are most apt to run the numbers several ways and emphasize the “significant” runs in their write-ups.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-87b1e155729c8b7ef9547aaa6805b32c wp-block-paragraph" style="font-size:21px;line-height:1.4">One tricky issue was how much I should analyze the data myself. The upside could be more insight. The downside could be a loss of (perceived) objectivity if the self-appointed referee starts playing the game. Wisely or not, I gave myself&nbsp;<em>some</em>&nbsp;leeway here. Surely real judges also rely on their knowledge about the world, not just what the parties submit as evidence.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-9c08c0eed35e80a9d1ae89a1770a3b32 wp-block-paragraph" style="font-size:21px;line-height:1.4">For example, in addition to its analysis of asylum decisions, the original paper checks whether the California parole board was less likely to grant parole on warmer days in 2012–15. Partly because the critical comment did not engage with this side-analysis, I revisited it myself. I transferred it to the next quadrennium, 2016–19, while changing the original computer code as little as possible. (Here, too, the apparent impact of temperature went away.)</p>



<p class="has-black-color has-text-color has-link-color wp-elements-bdee22a926e92c2385ba416b4f5d0d66 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>Closing statement</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-9c1786c21f12f7852f313dc28109f718 wp-block-paragraph" style="font-size:21px;line-height:1.4">The stakes in this case are probably low. While the question of how temperature affects human decision-making links broadly to climate change, and the arbitrariness of the American immigration system is a serious concern, I would be surprised if any important policy decision in the next few years turns on this research.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6ecb9941ea11f74c1e405d2a05f53eb1 wp-block-paragraph" style="font-size:21px;line-height:1.4">But the case illustrates a much larger problem. Some studies do influence important decisions. That they have been peer-reviewed should hardly reassure. Judicious&nbsp;<a href="https://statmodeling.stat.columbia.edu/2016/12/16/an-efficiency-argument-for-post-publication-review/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>post-publication review</em></span></strong></a>&nbsp;of important studies, perhaps including “replication opinions,” can give decision-makers with real dollars and real livelihoods on the line a clearer sense of what the data do and do not tell us.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6131480935831710bd87a395d09ac46f wp-block-paragraph" style="font-size:21px;line-height:1.4">Unfortunately, powerful incentives within academia, rooted in human nature, have generally discouraged such Socratic inquiry.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ba4398bef47442d56813be1f6905240b wp-block-paragraph" style="font-size:21px;line-height:1.4">I like to think of myself as judicious. As to whether I’ve lived up to my self-image&nbsp;<a href="https://www.econstor.eu/handle/10419/316399" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>in this case</em></span></strong></a>, I will let you be the judge. At any rate, I figure that in the face of hard problems, it is good to try new things. We will see if this experiment is replicated, and if that does much good.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-3c1c3064b3a1798aed7594fab8e2f7fb wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>David Roodman is Senior Advisor at Open Philanthropy. He can be contacted at <a href="mailto:david@davidroodman.com" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline">david@davidroodman.com</span></strong></a></em></p>



]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/05/31/roodman-appeal-to-me-first-trial-of-a-replication-opinion/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7862</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2025/05/image.png?w=689" medium="image" />
	</item>
		<item>
		<title>AoI*: “Introducing Synchronous Robustness Reports” by Bartos et al. (2025)</title>
		<link>https://replicationnetwork.com/2025/03/20/aoi-introducing-synchronous-robustness-reports-by-bartos-et-al-2025/</link>
					<comments>https://replicationnetwork.com/2025/03/20/aoi-introducing-synchronous-robustness-reports-by-bartos-et-al-2025/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Wed, 19 Mar 2025 23:05:21 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[FAIR data principles]]></category>
		<category><![CDATA[Journal policies]]></category>
		<category><![CDATA[Many-Analysts Approach]]></category>
		<category><![CDATA[Methodological diversity]]></category>
		<category><![CDATA[Publication workflow]]></category>
		<category><![CDATA[Robustness in scientific research]]></category>
		<category><![CDATA[TOP (Transparency and Openness Promotion) guidelines]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7856</guid>

					<description><![CDATA[[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.] NOTE: The article is behind a paywall. ABSTRACT (taken from the article) “Most empirical research articles feature a...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-47b756873aa88b4198acacfcb6670665 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-f8cd573b67adfe418b5b434b5c8dfcfd wp-block-paragraph" style="font-size:21px;line-height:1.4"><span style="text-decoration: underline">NOTE</span>: The article is behind a paywall.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-d60c5d1966d1993c69787c15c4f28333 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>ABSTRACT (taken from <em><a href="https://www.nature.com/articles/s41562-025-02129-1" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">the article</span></a></em>)</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-46ff16fc4c9f208a2b36d6223fb21daa wp-block-paragraph" style="font-size:21px;line-height:1.4">“Most empirical research articles feature a single primary analysis that is conducted by the authors. However, different analysis teams usually adopt different analytical approaches and frequently reach varied conclusions. We propose synchronous robustness reports [SRRs] — brief reports that summarize the results of alternative analyses by independent experts — to strengthen the credibility of science.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-d4516c737167c7ff1fe59235e2e1f9cb wp-block-paragraph" style="font-size:21px;line-height:1.4">“To integrate SRRs seamlessly into the publication process, we suggest the framework outlined as a flowchart in Fig. 2. As the flowchart shows, the SRRs form a natural extension to the standard review process.”</p>



<figure class="wp-block-image size-large"><a href="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png"><img loading="lazy" width="944" height="707" data-attachment-id="7857" data-permalink="https://replicationnetwork.com/2025/03/20/aoi-introducing-synchronous-robustness-reports-by-bartos-et-al-2025/image-115/" data-orig-file="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png" data-orig-size="944,707" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="image" data-image-description="" data-image-caption="" data-medium-file="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=300" data-large-file="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=604" src="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=944" alt="" class="wp-image-7857" srcset="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png 944w, https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=150 150w, https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=300 300w, https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=768 768w" sizes="(max-width: 944px) 100vw, 944px" /></a></figure>



<p class="has-black-color has-text-color has-link-color wp-elements-a422884e54beca493d61c6aa49cdc336 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>REFERENCE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-d0278f3972131d73835566658c5075da wp-block-paragraph" style="font-size:21px;line-height:1.4"><a href="https://doi.org/10.1038/s41562-025-02129-1" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">Bartoš, F., Sarafoglou, A., Aczel, B. <em>et al.</em> Introducing synchronous robustness reports. <em>Nat Hum Behav</em> (2025). https://doi.org/10.1038/s41562-025-02129-1</span></a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/03/20/aoi-introducing-synchronous-robustness-reports-by-bartos-et-al-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7856</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>

		<media:content url="https://replicationnetwork.com/wp-content/uploads/2025/03/image.png?w=944" medium="image" />
	</item>
		<item>
		<title>AoI*: “The Sources of Researcher Variation in Economics” by Huntington-Klein et al. (2025)</title>
		<link>https://replicationnetwork.com/2025/03/14/aoi-the-sources-of-researcher-variation-in-economics-by-huntington-klein-et-al-2025/</link>
					<comments>https://replicationnetwork.com/2025/03/14/aoi-the-sources-of-researcher-variation-in-economics-by-huntington-klein-et-al-2025/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Thu, 13 Mar 2025 21:42:08 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Causal Inference]]></category>
		<category><![CDATA[Data Cleaning]]></category>
		<category><![CDATA[Many-Analysts Approach]]></category>
		<category><![CDATA[Research design]]></category>
		<category><![CDATA[Researcher degrees of freedom]]></category>
		<category><![CDATA[Researcher Variation]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7852</guid>

					<description><![CDATA[[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.] ABSTRACT (taken from the article) “We use a rigorous three-stage many-analysts design to assess how different researcher decisions—specifically...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-47b756873aa88b4198acacfcb6670665 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-695d0959ed600e17feccee260f706025 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>ABSTRACT (taken from <em><a href="https://www.econstor.eu/bitstream/10419/312260/1/I4R-DP209.pdf" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">the article</span></a></em>)</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-f8515427a399d3495fd38c657facad20 wp-block-paragraph" style="font-size:21px;line-height:1.4">“We use a rigorous three-stage many-analysts design to assess how different researcher decisions—specifically data cleaning, research design, and the interpretation of a policy question—affect the variation in estimated treatment effects.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-04d73d2c3880fb8fda5c2918d7efa5bc wp-block-paragraph" style="font-size:21px;line-height:1.4">“A total of 146 research teams each completed the same causal inference task three times each: first with few constraints, then using a shared research design, and finally with pre-cleaned data in addition to a specified design.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-bcf4dae8819e497a082d3a87b9bb8f08 wp-block-paragraph" style="font-size:21px;line-height:1.4">“We find that even when analyzing the same data, teams reach different conclusions. In the first stage, the interquartile range (IQR) of the reported policy effect was 3.1 percentage points, with substantial outliers.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-4918872558114c2abaec1308d4e0d64e wp-block-paragraph" style="font-size:21px;line-height:1.4">“Surprisingly, the second stage, which restricted research design choices, exhibited slightly higher IQR (4.0 percentage points), largely attributable to imperfect adherence to the prescribed protocol. By contrast, the final stage, featuring standardized data cleaning, narrowed variation in estimated effects, achieving an IQR of 2.4 percentage points.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-d00689a9f0b6a0872de300253ee1e306 wp-block-paragraph" style="font-size:21px;line-height:1.4">“Reported sample sizes also displayed significant convergence under more restrictive conditions, with the IQR dropping from 295,187 in the first stage to 29,144 in the second, and effectively zero by the third.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-eb8d3b93d867e4e0379713ce34154613 wp-block-paragraph" style="font-size:21px;line-height:1.4">“Our findings underscore the critical importance of data cleaning in shaping applied microeconomic results and highlight avenues for future replication efforts.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a422884e54beca493d61c6aa49cdc336 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>REFERENCE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-ca958eca2b720c0e0362ba959b5d7b8a wp-block-paragraph" style="font-size:21px;line-height:1.4"><a href="https://www.econstor.eu/bitstream/10419/312260/1/I4R-DP209.pdf" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">Huntington-Klein, Nick et al. (2025) : The Sources of Researcher Variation in Economics, I4R Discussion Paper Series, No. 209, Institute for Replication (I4R), s.l.</span></a></p>



]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/03/14/aoi-the-sources-of-researcher-variation-in-economics-by-huntington-klein-et-al-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7852</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>
	</item>
		<item>
		<title>AoI*: “Same data, different analysts: variation in effect sizes due to analytical decisions in ecology and evolutionary biology” by Gould et al. (2025)</title>
		<link>https://replicationnetwork.com/2025/03/08/aoi-same-data-different-analysts-variation-in-effect-sizes-due-to-analytical-decisions-in-ecology-and-evolutionary-biology-by-gould-et-al-2025/</link>
					<comments>https://replicationnetwork.com/2025/03/08/aoi-same-data-different-analysts-variation-in-effect-sizes-due-to-analytical-decisions-in-ecology-and-evolutionary-biology-by-gould-et-al-2025/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Sat, 08 Mar 2025 05:08:40 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Ecology and evolutionary biology]]></category>
		<category><![CDATA[Effect size variation]]></category>
		<category><![CDATA[Many-analyst study]]></category>
		<category><![CDATA[Meta-analysis]]></category>
		<category><![CDATA[replication crisis]]></category>
		<category><![CDATA[Reproducibility]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7849</guid>

					<description><![CDATA[[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.] ABSTRACT (taken from the article) &#8220;We [implemented] a large-scale empirical exploration of the variation in effect sizes and...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-47b756873aa88b4198acacfcb6670665 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-82cc45988f3633e67f9dbdb49f0f3b8c wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>ABSTRACT (taken from <em><a href="https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-024-02101-x" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">the article</span></a></em>)</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-7a99662d94533cad0fe7a91ce762ae7b wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;We [implemented] a large-scale empirical exploration of the variation in effect sizes and model predictions generated by the analytical decisions of different researchers in ecology and evolutionary biology.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-0134c47bd89b2fc0400e703693b755e0 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;We used two unpublished datasets, one from evolutionary ecology (blue tit, <em>Cyanistes caeruleus</em>, to compare sibling number and nestling growth) and one from conservation ecology (<em>Eucalyptus</em>, to compare grass cover and tree seedling recruitment). The project leaders recruited 174 analyst teams, comprising 246 analysts, to investigate the answers to prespecified research questions.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6310b33733f5236e7fdc40f143d4f087 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;We found substantial heterogeneity among results for both datasets, although the patterns of variation differed between them. For the blue tit analyses, the average effect was convincingly negative, with less growth for nestlings living with more siblings, but there was near continuous variation in effect size from large negative effects to effects near zero, and even effects crossing the traditional threshold of statistical significance in the opposite direction.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-703ecbba8bed9507dc7649681057ad28 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;In contrast, the average relationship between grass cover and <em>Eucalyptus</em> seedling number was only slightly negative and not convincingly different from zero, and most effects ranged from weakly negative to weakly positive, with about a third of effects crossing the traditional threshold of significance in one direction or the other. However, there were also several striking outliers in the <em>Eucalyptus</em> dataset, with effects far from zero.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ac5b4c1ebaa5b021d3ca33ff5002542c wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;…analyses with results that were far from the mean were no more or less likely to have dissimilar variable sets, use random effects in their models, or receive poor peer reviews than those analyses that found results that were close to the mean.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-e31b62e5342392b781614117c551f658 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;The existence of substantial variability among analysis outcomes raises important questions about how ecologists and evolutionary biologists should interpret published results, and how they should conduct analyses in the future.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a422884e54beca493d61c6aa49cdc336 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>REFERENCE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-e871b1bcee0272701cbda5b7d4c90680 wp-block-paragraph" style="font-size:21px;line-height:1.4"><a href="https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-024-02101-x" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">Gould, E., Fraser, H.S., Parker, T.H. <em>et al.</em> Same data, different analysts: variation in effect sizes due to analytical decisions in ecology and evolutionary biology. <em>BMC Biol</em> <strong>23</strong>, 35 (2025). https://doi.org/10.1186/s12915-024-02101-x</span></a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/03/08/aoi-same-data-different-analysts-variation-in-effect-sizes-due-to-analytical-decisions-in-ecology-and-evolutionary-biology-by-gould-et-al-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7849</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>
	</item>
		<item>
		<title>AoI*: “Decisions, Decisions, Decisions: An Ethnographic Study of Researcher Discretion in Practice” by van Drimmelen et al. (2024)</title>
		<link>https://replicationnetwork.com/2025/03/05/aoi-decisions-decisions-decisions-an-ethnographic-study-of-researcher-discretion-in-practice-by-van-drimmelen-et-al-2024/</link>
					<comments>https://replicationnetwork.com/2025/03/05/aoi-decisions-decisions-decisions-an-ethnographic-study-of-researcher-discretion-in-practice-by-van-drimmelen-et-al-2024/#comments</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Tue, 04 Mar 2025 16:30:51 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Ethnographic study]]></category>
		<category><![CDATA[Pre-Analysis plans]]></category>
		<category><![CDATA[Research practice]]></category>
		<category><![CDATA[Researcher discretion]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7846</guid>

					<description><![CDATA[[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.] ABSTRACT (taken from the article) “This paper is a study of the decisions that researchers take during the...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-47b756873aa88b4198acacfcb6670665 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-5917511b4990d3aa725acfc9fde8ddc3 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>ABSTRACT (taken from <em><a href="https://link.springer.com/article/10.1007/s11948-024-00481-5" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">the article</span></a></em>)</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-8a2e45bd9c8832a69bdf5867e0cfad70 wp-block-paragraph" style="font-size:21px;line-height:1.4">“This paper is a study of the decisions that researchers take during the execution of a research plan: their researcher discretion. Flexible research methods are generally seen as undesirable, and many methodologists urge to eliminate these so-called ‘researcher degrees of freedom’ from the research practice. However, what this looks like in practice is unclear.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-511e888e686ce1f0bfcda5c0d8c59cc8 wp-block-paragraph" style="font-size:21px;line-height:1.4">“Based on twelve months of ethnographic fieldwork in two end-of-life research groups in which we observed research practice, conducted interviews, and collected documents, we explore when researchers are required to make decisions, and what these decisions entail.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-ec95d265c81b3f26cca8facea5e1c0e4 wp-block-paragraph" style="font-size:21px;line-height:1.4">“Our ethnographic study of research practice suggests that researcher discretion is an integral and inevitable aspect of research practice, as many elements of a research protocol will either need to be further operationalised or adapted during its execution. Moreover, it may be difficult for researchers to identify their own discretion, limiting their effectivity in transparency.”</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a422884e54beca493d61c6aa49cdc336 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>REFERENCE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-0ac705ac3850239f83baa1705d47008a wp-block-paragraph" style="font-size:21px;line-height:1.4"><a href="https://link.springer.com/article/10.1007/s11948-024-00481-5" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">van Drimmelen, T., Slagboom, M.N., Reis, R. <em>et al.</em> Decisions, Decisions, Decisions: An Ethnographic Study of Researcher Discretion in Practice. <em>Sci Eng Ethics</em> <strong>30</strong>, 59 (2024). https://doi.org/10.1007/s11948-024-00481-5</span></a></p>



]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/03/05/aoi-decisions-decisions-decisions-an-ethnographic-study-of-researcher-discretion-in-practice-by-van-drimmelen-et-al-2024/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7846</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>
	</item>
		<item>
		<title>AoI*: “Open minds, tied hands: Awareness, behavior, and reasoning on open science and irresponsible research behavior” by Wiradhany et al. (2025)</title>
		<link>https://replicationnetwork.com/2025/02/25/aoi-open-minds-tied-hands-awareness-behavior-and-reasoning-on-open-science-and-irresponsible-research-behavior-by-wiradhany-et-al-2025/</link>
					<comments>https://replicationnetwork.com/2025/02/25/aoi-open-minds-tied-hands-awareness-behavior-and-reasoning-on-open-science-and-irresponsible-research-behavior-by-wiradhany-et-al-2025/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Tue, 25 Feb 2025 02:07:59 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Irresponsible Research Behavior (IRB)]]></category>
		<category><![CDATA[Open Science Practices (OSP)]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7834</guid>

					<description><![CDATA[[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.] ABSTRACT (taken from the article) &#8220;Knowledge on Open Science Practices (OSP) has been promoted through responsible conduct of...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-47b756873aa88b4198acacfcb6670665 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>[*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]</em></p>



<p class="has-black-color has-text-color has-link-color wp-elements-47cecad241cdb015f79ff852552d430c wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>ABSTRACT (taken from <em><a href="https://www.tandfonline.com/doi/full/10.1080/08989621.2025.2457100" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">the article</span></a></em>)</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-a11699932ccd36a2b8ead40886aa78f8 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;Knowledge on Open Science Practices (OSP) has been promoted through responsible conduct of research training and the development of open science infrastructure to combat Irresponsible Research Behavior (IRB). Yet, there is limited evidence for the efficacy of OSP in minimizing IRB.&#8221; </p>



<p class="has-black-color has-text-color has-link-color wp-elements-72bbddc56ad07778108492e22d9c5b41 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;We asked N=778 participants to fill in questionnaires that contain OSP and ethical reasoning vignettes, and report self-admission rates of IRB and personality traits. We found that against our initial prediction, even though OSP was negatively correlated with IRB, this correlation was very weak, and upon controlling for individual differences factors, OSP neither predicted IRB nor was this relationship moderated by ethical reasoning.&#8221; </p>



<p class="has-black-color has-text-color has-link-color wp-elements-105fa454933c3752a00bf557ad55931a wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;On the other hand, individual differences factors, namely dark personality triad, and conscientiousness and openness, contributed more to IRB than OSP knowledge.&#8221; </p>



<p class="has-black-color has-text-color has-link-color wp-elements-96425c06686111769684aa406ebb26a3 wp-block-paragraph" style="font-size:21px;line-height:1.4">&#8220;Our findings suggest that OSP knowledge needs to be complemented by the development of ethical virtues to encounter IRBs more effectively.&#8221;</p>



<p class="has-black-color has-text-color has-link-color wp-elements-a422884e54beca493d61c6aa49cdc336 wp-block-paragraph" style="font-size:24px;line-height:1.4"><strong>REFERENCE</strong></p>



<p class="has-black-color has-text-color has-link-color wp-elements-66411c913b8ce0877dfcec675e7b3303 wp-block-paragraph" style="font-size:21px;line-height:1.4"><a href="https://www.tandfonline.com/doi/full/10.1080/08989621.2025.2457100" target="_blank" rel="noreferrer noopener"><span style="text-decoration: underline">Wiradhany, W., Djalal, F. M., &amp; de Bruin, A. B. H. (2025). Open minds, tied hands: Awareness, behavior, and reasoning on open science and irresponsible research behavior. <em>Accountability in Research</em>, 1–24. https://doi.org/10.1080/08989621.2025.2457100</span></a></p>



]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/02/25/aoi-open-minds-tied-hands-awareness-behavior-and-reasoning-on-open-science-and-irresponsible-research-behavior-by-wiradhany-et-al-2025/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7834</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>
	</item>
		<item>
		<title>RÖSELER: Replication Research Symposium and Journal</title>
		<link>https://replicationnetwork.com/2025/02/08/roseler-replication-research-symposium-and-journal/</link>
					<comments>https://replicationnetwork.com/2025/02/08/roseler-replication-research-symposium-and-journal/#respond</comments>
		
		<dc:creator><![CDATA[replicationnetwork]]></dc:creator>
		<pubDate>Fri, 07 Feb 2025 20:54:47 +0000</pubDate>
				<category><![CDATA[GUEST BLOGS]]></category>
		<category><![CDATA[Annotator]]></category>
		<category><![CDATA[Educational materials]]></category>
		<category><![CDATA[Explorer]]></category>
		<category><![CDATA[FORRT]]></category>
		<category><![CDATA[Framework for Open and Reproducible Research Training]]></category>
		<category><![CDATA[Journal policies]]></category>
		<category><![CDATA[Replication Research journal]]></category>
		<category><![CDATA[Replication Research Symposium]]></category>
		<guid isPermaLink="false">http://replicationnetwork.com/?p=7825</guid>

					<description><![CDATA[Efforts to teach, collect, curate, and guide replication research are culminating in the new diamond open access journal Replication Research, which will launch in late 2025. The Framework for Open and Reproducible Research Training (FORRT; forrt.org) and the Münster Center...]]></description>
										<content:encoded><![CDATA[
<p class="has-black-color has-text-color has-link-color wp-elements-0ceb641d16d2881d29b85b7e6b5781f1 wp-block-paragraph" style="font-size:21px;line-height:1.4">Efforts to teach, collect, curate, and guide replication research are culminating in the new diamond open access journal <em>Replication Research</em>, which will launch in late 2025. The Framework for Open and Reproducible Research Training (FORRT; <a href="http://forrt.org" target="_blank" rel="noreferrer noopener"><em><strong><span style="text-decoration: underline">forrt.org</span></strong></em></a>) and the Münster Center for Open Science have spearheaded several initiatives to bolster replication research across disciplines. We are excited to invite researchers to join us in Münster or online on May 14-16, 2025, for the <strong><em>Replication Research Symposium</em></strong>. This event will mark a significant step toward the launch of our interdisciplinary journal dedicated to reproductions, replications, and discussions of the methodologies involved. But let’s start from the beginning: What is going on at FORRT?</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1b00030adc42c35d97e3c6976d766d05 wp-block-paragraph" style="font-size:21px;line-height:1.4"><strong>Finding and exploring replications</strong>: The FORRT Replication Database (FReD) includes hundreds of replication studies and thousands of replication findings &#8211; which we define as tests of previously established claims using different data. Researchers can use the <a href="https://forrt.org/apps/fred_annotator.html" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Annotator</em></span></strong></a> to have their reference lists checked automatically for cited original studies that have been replicated. With the <a href="https://forrt.org/apps/fred_explorer.html" target="_blank" rel="noreferrer noopener"><strong><em><span style="text-decoration: underline">Explorer</span></em></strong></a>, they get an overview of all studies and can analyze replication rates across different success criteria or moderator variables.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-6a3e75b8d1d985f373f119ce9a78e783 wp-block-paragraph" style="font-size:21px;line-height:1.4"><strong>Meta-analyzing replication outcomes</strong>: To increase the accessibility of the database, we created the FReD <a href="https://forrt.org/FReD/index.html" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>R package</em></span></strong></a>, with which researchers can run their own analyses or run the Shiny apps locally. In a vignette, we outline different <a href="https://forrt.org/FReD/articles/success_criteria.html" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>replication success criteria</em></span></strong></a> and show how this choice can affect the overall replication success rate.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-1012bd2d385c342e0932be9b8714532d wp-block-paragraph" style="font-size:21px;line-height:1.4"><strong>Teaching replications</strong>: One of FORRT’s core ideas is to help researchers from all fields learn about openness and reproducibility. Among numerous projects, we clarified terminology (<a href="https://forrt.org/glossary/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Glossary of Open Science Terms</em></span></strong></a>), produced educational materials such as an <a href="https://www.nature.com/articles/s44271-023-00003-2" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>educationally driven review paper</em></span></strong></a> on the transformative impact of the replication crisis, a <a href="https://forrt.org/positive-changes-replication-crisis/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>syllabus and slides</em></span></strong></a> with lecture and pedagogical notes (see Educational toolkit), and <a href="https://forrt.org/resources/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>curated resources</em></span></strong></a>. We are also working with experts from economics, psychology, medicine, and other fields to create an interdisciplinary guide to carrying out replications and reproductions.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-26a5c582f806c7af4e7550fe10deaf14 wp-block-paragraph" style="font-size:21px;line-height:1.4"><strong>Publishing replication studies and discussing standards across fields</strong>: We are currently developing the journal <a href="https://lukasroeseler.github.io/replicationresearch_mockup/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>Replication Research</em></span></strong></a>, a diamond open-access outlet for replication and reproduction studies and for discussions of their methods. There will be reproducibility checks for all published studies and standardized machine-readable templates that authors are encouraged to use. We are building the journal with a network of 20 experts from different fields. From February to April, we are hosting the <em>Road to Replication Research</em> via Zoom. This online discussion series centers on different aspects of open and responsible scientific publishing and is open to anybody who wants to join the conversation, so that the journal is maximally open from the start. Finally, at the <em>Replication Research Symposium</em>, participants and experts from fields such as psychology, economics, biology, medicine, marketing, meta-science, library science, and the humanities will convene over three days in May 2025 to discuss the significance and methodology of replication and reproduction studies. This symposium will further shape <em>Replication Research</em>, and <a href="https://indico.uni-muenster.de/event/3176/abstracts/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>we invite researchers from all fields to present their replications, reproductions, or methodological discussions</em></span></strong></a>. The journal launch is then slated for late 2025.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-91a5d135938da2168ad1005112ab2625 wp-block-paragraph" style="font-size:21px;line-height:1.4">For more information about <em>Replication Research,</em> the upcoming symposium, and the online discussion series about the creation of the journal, <a href="https://lukasroeseler.github.io/replicationresearch_mockup/" target="_blank" rel="noreferrer noopener"><strong><span style="text-decoration: underline"><em>click here</em></span></strong></a>.</p>



<p class="has-black-color has-text-color has-link-color wp-elements-980f029da185b9ef978ed474636445c5 wp-block-paragraph" style="font-size:21px;line-height:1.4"><em>Lukas Röseler is the managing director of the Münster Center for Open Science at the University of Münster, one of the project leads at FORRT’s Replication Hub, and will be the managing editor of Replication Research. He can be contacted at lukas.roeseler@uni-muenster.de.</em></p>



<p class="wp-block-paragraph"></p>
]]></content:encoded>
					
					<wfw:commentRss>https://replicationnetwork.com/2025/02/08/roseler-replication-research-symposium-and-journal/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">7825</post-id>
		<media:content url="https://2.gravatar.com/avatar/e64ee14e2ee86f72b7681f00e27a78273ca0707a414bfd21f792423c54038664?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">replicationnetwork</media:title>
		</media:content>
	</item>
	</channel>
</rss>
