<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Featured Articles &#8211; AI Impacts</title>
	<atom:link href="http://aiimpacts.org/category/featured-articles/feed/" rel="self" type="application/rss+xml" />
	<link>http://aiimpacts.org</link>
	<description></description>
	<lastBuildDate>Tue, 31 Oct 2023 17:14:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>
	<item>
		<title>2022 Expert Survey on Progress in AI</title>
		<link>http://aiimpacts.org/2022-expert-survey-on-progress-in-ai/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Thu, 04 Aug 2022 13:25:21 +0000</pubDate>
				<category><![CDATA[AI Timeline Surveys]]></category>
		<category><![CDATA[AI Timelines]]></category>
		<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Predictions of Human-Level AI Timelines]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3246</guid>

					<description><![CDATA[Collected data and analysis from a large survey of machine learning researchers.  <a class="mh-excerpt-more" href="http://aiimpacts.org/2022-expert-survey-on-progress-in-ai/" title="2022 Expert Survey on Progress in AI"></a>]]></description>
										<content:encoded><![CDATA[
<p><em><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-cyan-bluish-gray-color">Published 3 August 2022; last updated 3 August 2022</mark></em><br><br><strong><em><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-red-color">This page is out-of-date. Visit the <a href="https://wiki.aiimpacts.org/doku.php?id=ai_timelines:predictions_of_human-level_ai_timelines:ai_timeline_surveys:2022_expert_survey_on_progress_in_ai">updated version of this page</a> on our <a href="https://wiki.aiimpacts.org/doku.php?id=start">wiki</a>.</mark></em></strong></p>



<p>The 2022 Expert Survey on Progress in AI (2022 ESPAI) is a survey of machine learning researchers that AI Impacts ran in June-August 2022. </p>



<h2 class="wp-block-heading"><strong>Details</strong></h2>



<h3 class="wp-block-heading">Background</h3>



<p>The 2022 ESPAI is a rerun of the <a href="https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/">2016 Expert Survey on Progress in AI</a>, which researchers at AI Impacts ran in collaboration with others. Almost all of the questions were identical, and both surveys targeted authors who had recently published at NeurIPS and ICML, two major machine learning conferences.</p>



<p>Zhang et al. ran a follow-up survey in 2019 (published in 2022).<span id='easy-footnote-1-3246' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2022-expert-survey-on-progress-in-ai/#easy-footnote-bottom-1-3246' title='Zhang, Baobao, Noemi Dreksler, Markus Anderljung, Lauren Kahn, Charlie Giattino, Allan Dafoe, and Michael Horowitz. “Forecasting AI Progress: Evidence from a Survey of Machine Learning Researchers,” June 8, 2022. &lt;a href=&quot;https://doi.org/10.48550/arXiv.2206.04132&quot;&gt;https://doi.org/10.48550/arXiv.2206.04132&lt;/a&gt;.'><sup>1</sup></a></span> However, they reworded or altered many questions, including the definition of HLMI, so much of their data is not directly comparable to that of the 2016 or 2022 surveys, especially given the large potential for framing effects observed.</p>



<h3 class="wp-block-heading">Methods</h3>



<h4 class="wp-block-heading">Population</h4>



<p>We contacted approximately 4271 researchers who published at the conferences NeurIPS or ICML in 2021. These people were selected by taking all of the authors at those conferences and randomly allocating them between this survey and a survey being run by others. We then contacted those whose email addresses we could find. We found email addresses in papers published at those conferences, in other public data, and in records from our previous survey and Zhang et al 2022. We received 738 responses, some partial, for a 17% response rate.</p>
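<p>As a sanity check, the reported response rate follows directly from the counts above (a minimal sketch using the figures stated in the text):</p>

```python
contacted = 4271  # approximate number of researchers we emailed
responses = 738   # responses received, some partial

rate = responses / contacted
print(f"Response rate: {rate:.1%}")  # → 17.3%, i.e. the ~17% reported
```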



<p>Participants who had previously taken part in the 2016 ESPAI or Zhang et al. surveys received slightly longer surveys, and were given the same questions they had received in past surveys (where random subsets of questions were given) rather than newly randomized questions. This was so that they could also be included in a &#8216;matched panel&#8217; survey, in which we contacted all researchers who completed the 2016 ESPAI or Zhang et al. surveys, to compare responses from exactly the same samples of researchers over time. These surveys contained additional questions matching some of those in the Zhang et al. survey.</p>



<h4 class="wp-block-heading">Contact</h4>



<p>We invited the selected researchers to take the survey via email. We accepted responses between June 12 and August 3, 2022. </p>



<h4 class="wp-block-heading">Questions</h4>



<p>The full list of survey questions is available below, as exported from the survey software. The export does not preserve pagination, or data about survey flow. Participants received randomized subsets of these questions, so the survey each person received was much shorter than that shown below.</p>
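<p>Randomized question assignment of this kind can be sketched as follows. This is an illustrative reconstruction, not the survey software&#8217;s actual logic; the pool size, subset size, and question names are made up:</p>

```python
import random

# Hypothetical question pool; the real survey also had blocks, branching,
# and pagination that this sketch does not model.
QUESTION_POOL = [f"Q{i:02d}" for i in range(1, 41)]
SUBSET_SIZE = 12

def assign_questions(respondent_id: str, seed: str = "espai2022") -> list:
    """Deterministically assign a random subset of questions to a respondent,
    so each person sees a much shorter, reproducible survey."""
    rng = random.Random(f"{seed}:{respondent_id}")
    return rng.sample(QUESTION_POOL, SUBSET_SIZE)

subset = assign_questions("r042")
print(len(subset), "of", len(QUESTION_POOL), "questions assigned")
```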



<div data-wp-interactive="core/file" class="wp-block-file"><object data-wp-bind--hidden="!state.hasPdfPreview" hidden class="wp-block-file__embed" data="https://aiimpacts.org/wp-content/uploads/2022/08/2022ESPAIV.pdf" type="application/pdf" style="width:100%;height:600px" aria-label="Embed of 2022ESPAIV."></object><a id="wp-block-file--media-e41f2945-7645-4eaa-8bde-58192430e913" href="https://aiimpacts.org/wp-content/uploads/2022/08/2022ESPAIV.pdf">2022ESPAIV</a><a href="https://aiimpacts.org/wp-content/uploads/2022/08/2022ESPAIV.pdf" class="wp-block-file__button wp-element-button" download aria-describedby="wp-block-file--media-e41f2945-7645-4eaa-8bde-58192430e913">Download</a></div>



<p>A small number of changes were made to questions since the 2016 survey (list forthcoming).</p>



<h3 class="wp-block-heading"><strong>Definitions</strong></h3>



<p>&#8216;HLMI&#8217; was defined as follows:</p>



<p><em>The following questions ask about ‘high–level machine intelligence’ (HLMI). Say we have ‘high-level machine intelligence’ when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. Think feasibility, not adoption.</em></p>



<h3 class="wp-block-heading"><strong>Results</strong></h3>



<h4 class="wp-block-heading">Data</h4>



<p>The anonymized dataset is available <a href="https://docs.google.com/spreadsheets/d/1u_qcG6erXkH4EJgygl2fpkpJENAv6-kFWJejsw1oA1Q/edit?usp=sharing">here</a>. </p>



<h4 class="wp-block-heading"><strong>Summary of results</strong></h4>



<ul class="wp-block-list">
<li><strong>The aggregate forecast time to a 50% chance of HLMI was 37 years, i.e. 2059</strong> (not including data from questions about the conceptually similar Full Automation of Labor, which in 2016 received much later estimates)<strong>.</strong> This timeline has become about eight years shorter in the six years since 2016, when the aggregate prediction put 50% probability at 2061, i.e. 45 years out. Note that these estimates are conditional on &#8220;human scientific activity continu[ing] without major negative disruption.&#8221;</li>



<li><strong>The median respondent believes the probability that the long-run effect of advanced AI on humanity will be &#8220;extremely bad (e.g., human extinction)&#8221; is 5%.</strong> This is the same as it was in 2016 (though Zhang et al. 2022 found 2% in a similar but non-identical question). Many respondents were substantially more concerned: 48% of respondents gave at least 10% chance of an extremely bad outcome. But some were much less concerned: 25% put it at 0%.</li>



<li><strong>The median respondent believes society should prioritize AI safety research &#8220;more&#8221; than it is currently prioritized.</strong> Respondents chose from &#8220;much less,&#8221; &#8220;less,&#8221; &#8220;about the same,&#8221; &#8220;more,&#8221; and &#8220;much more.&#8221; 69% of respondents chose &#8220;more&#8221; or &#8220;much more,&#8221; up from 49% in 2016.</li>



<li><strong>The median respondent thinks there is an &#8220;about even chance&#8221; that a stated argument for an intelligence explosion is broadly correct.</strong> 54% of respondents say the likelihood that it is correct is &#8220;about even,&#8221; &#8220;likely,&#8221; or &#8220;very likely&#8221; (corresponding to probability &gt;40%), similar to 51% of respondents in 2016. The median respondent also believes machine intelligence will probably (60%) be &#8220;vastly better than humans at all professions&#8221; within 30 years of HLMI, and the rate of global technological improvement will probably (80%) dramatically increase (e.g., by a factor of ten) as a result of machine intelligence within 30 years of HLMI.</li>
</ul>



<h4 class="wp-block-heading"><strong>High-level machine intelligence timelines</strong></h4>



<p>The aggregate forecast time to HLMI was 36.6 years, conditional on &#8220;human scientific activity continu[ing] without major negative disruption&#8221; and considering only questions using the HLMI definition. We have not yet analyzed data about the conceptually similar Full Automation of Labor (FAOL), which in 2016 prompted much later timeline estimates. This timeline figure is therefore likely low relative to an overall estimate from this survey.</p>



<p>This aggregate is the 50th percentile date in an equal mixture of probability distributions, created by fitting a gamma distribution to each person&#8217;s answers to three questions, each asking either for the probability of HLMI occurring by a given year or for the year at which a given probability would obtain.</p>
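<p>A minimal sketch of this aggregation procedure, with made-up respondent answers; the actual analysis may differ in its question years, loss function, and optimizer:</p>

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Hypothetical answers: each respondent gives P(HLMI within 10/20/40 years).
respondents = [
    [(10, 0.05), (20, 0.25), (40, 0.60)],
    [(10, 0.10), (20, 0.50), (40, 0.90)],
    [(10, 0.01), (20, 0.10), (40, 0.35)],
]

def fit_gamma(points):
    """Least-squares fit of a gamma CDF to (years, probability) points."""
    def loss(params):
        shape, scale = np.exp(params)  # exponentiate to keep both positive
        return sum((gamma.cdf(t, shape, scale=scale) - p) ** 2
                   for t, p in points)
    res = minimize(loss, x0=[np.log(2.0), np.log(20.0)], method="Nelder-Mead")
    return tuple(np.exp(res.x))

fits = [fit_gamma(pts) for pts in respondents]

def mixture_cdf(t):
    """Equal mixture: average of the individual gamma CDFs."""
    return np.mean([gamma.cdf(t, a, scale=s) for a, s in fits])

# 50th-percentile year: where the mixture CDF crosses 0.5 (bisection).
lo, hi = 0.0, 500.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mixture_cdf(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print(f"Aggregate 50% year: {mid:.1f} years from survey date")
```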






<figure class="wp-block-image size-large is-resized"><a href="http://aiimpacts.org/wp-content/uploads/2022/08/Screen-Shot-2022-08-04-at-02.55.28.jpg"><img fetchpriority="high" decoding="async" src="http://aiimpacts.org/wp-content/uploads/2022/08/Screen-Shot-2022-08-04-at-02.55.28-1024x960.jpg" alt="" class="wp-image-3248" style="width:512px;height:480px" width="512" height="480" srcset="http://aiimpacts.org/wp-content/uploads/2022/08/Screen-Shot-2022-08-04-at-02.55.28-1024x960.jpg 1024w, http://aiimpacts.org/wp-content/uploads/2022/08/Screen-Shot-2022-08-04-at-02.55.28-300x281.jpg 300w, http://aiimpacts.org/wp-content/uploads/2022/08/Screen-Shot-2022-08-04-at-02.55.28-768x720.jpg 768w, http://aiimpacts.org/wp-content/uploads/2022/08/Screen-Shot-2022-08-04-at-02.55.28.jpg 1126w" sizes="(max-width: 512px) 100vw, 512px" /></a><figcaption class="wp-element-caption">Figure 1: Gamma distributions inferred for each individual.</figcaption></figure>



<figure class="wp-block-image size-full is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2022/08/2016_cdfs.jpg"><img decoding="async" src="https://aiimpacts.org/wp-content/uploads/2022/08/2016_cdfs.jpg" alt="" class="wp-image-3249" style="width:420px;height:420px" width="420" height="420" srcset="http://aiimpacts.org/wp-content/uploads/2022/08/2016_cdfs.jpg 840w, http://aiimpacts.org/wp-content/uploads/2022/08/2016_cdfs-300x300.jpg 300w, http://aiimpacts.org/wp-content/uploads/2022/08/2016_cdfs-150x150.jpg 150w, http://aiimpacts.org/wp-content/uploads/2022/08/2016_cdfs-768x768.jpg 768w" sizes="(max-width: 420px) 100vw, 420px" /></a><figcaption class="wp-element-caption">Figure 2: Gamma distributions inferred for each individual, 2016 data</figcaption></figure>



<h4 class="wp-block-heading"><strong>Impacts of HLMI</strong></h4>



<h6 class="wp-block-heading"><strong>Question</strong></h6>



<p>Participants were asked:</p>



<p>Assume for the purpose of this question that HLMI will at some point exist. How positive or negative do you expect the overall impact of this to be on humanity, in the long run? Please answer by saying how probable you find the following kinds of impact, with probabilities adding to 100%:</p>



<p>______ Extremely good (e.g. rapid growth in human flourishing) (1)</p>



<p>______ On balance good (2)</p>



<p>______ More or less neutral (3)</p>



<p>______ On balance bad (4)</p>



<p>______ Extremely bad (e.g. human extinction) (5)</p>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<p>Medians:</p>



<ul class="wp-block-list">
<li>Extremely good: 10%</li>



<li>On balance good: 20%</li>



<li>More or less neutral: 15%</li>



<li>On balance bad: 10%</li>



<li>Extremely bad: 5%</li>
</ul>



<p>Means:</p>



<ul class="wp-block-list">
<li>Extremely good: 24%</li>



<li>On balance good: 26%</li>



<li>More or less neutral: 18%</li>



<li>On balance bad: 17%</li>



<li>Extremely bad: 14%</li>
</ul>
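<p>Medians and means here are taken per outcome, across respondents. Per-outcome medians therefore need not sum to 100% even though each individual&#8217;s allocation does (indeed, the medians above sum to 60%), while per-outcome means of complete responses must sum to 100% up to rounding. A sketch with hypothetical allocations:</p>

```python
import numpy as np

# Hypothetical per-respondent allocations (rows sum to 100);
# the real data is in the linked spreadsheet.
allocations = np.array([
    # ext.good, good, neutral, bad, ext.bad
    [40, 30, 10, 10, 10],
    [ 5, 20, 50, 20,  5],
    [10, 60, 10, 15,  5],
    [70, 10, 10,  5,  5],
    [ 5, 10, 20, 40, 25],
], dtype=float)

medians = np.median(allocations, axis=0)  # per-outcome medians
means = allocations.mean(axis=0)          # per-outcome means

# Means of rows that each sum to 100 must sum to 100; medians need not.
print("medians:", medians, "sum:", medians.sum())
print("means:  ", means, "sum:", means.sum())
```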



<figure class="wp-block-image size-large"><img decoding="async" src="https://wiki.aiimpacts.org/_media/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/howbad-exploded.png" alt=""/></figure>



<h4 class="wp-block-heading"><strong>Intelligence explosion</strong></h4>



<h5 class="wp-block-heading"><strong>Probability of dramatic technological speedup</strong></h5>



<h6 class="wp-block-heading"><strong>Question</strong></h6>



<p>Participants were asked:</p>



<p>Assume that HLMI will exist at some point. How likely do you then think it is that the rate of global technological improvement will dramatically increase (e.g. by a factor of ten) as a result of machine intelligence:</p>



<p>Within <strong>two years</strong> of that point? &nbsp; &nbsp; &nbsp; ___% chance</p>



<p>Within <strong>thirty years</strong> of that point?&nbsp; &nbsp; ___% chance</p>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<p>Median P(within <strong>two years</strong>) = 20% (20% in 2016)</p>



<p>Median P(within <strong>thirty years</strong>) = 80% (80% in 2016)</p>



<h5 class="wp-block-heading"><strong>Probability of superintelligence</strong></h5>



<h6 class="wp-block-heading"><strong>Question</strong></h6>



<p>Participants were asked:</p>



<p>Assume that HLMI will exist at some point. How likely do you think it is that there will be machine intelligence that is <strong>vastly better</strong> than humans at all professions (i.e. that is vastly more capable or vastly cheaper):</p>



<p>Within <strong>two years</strong> of that point? &nbsp; &nbsp; &nbsp; ___% chance</p>



<p>Within <strong>thirty years</strong> of that point?&nbsp; &nbsp; ___% chance</p>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<p>Median P(…within <strong>two years</strong>) = 10% (10% in 2016)</p>



<p>Median P(…within <strong>thirty years</strong>) = 60% (50% in 2016)</p>



<h5 class="wp-block-heading"><strong>Chance that the intelligence explosion argument is about right</strong></h5>



<h6 class="wp-block-heading"><strong>Question</strong></h6>



<p>Participants were asked:</p>



<p>Some people have argued the following:</p>



<p><em>If AI systems do nearly all research and development, improvements in AI will accelerate the pace of technological progress, including further progress in AI.</em></p>



<p><em>Over a short period (less than 5 years), this feedback loop could cause technological progress to become more than an order of magnitude faster.</em></p>



<p>How likely do you find this argument to be broadly correct?</p>



<ul class="wp-block-list">
<li>Quite unlikely (0-20%)</li>



<li>Unlikely (21-40%)</li>



<li>About even chance (41-60%)</li>



<li>Likely (61-80%)</li>



<li>Quite likely (81-100%)</li>
</ul>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<ul class="wp-block-list">
<li>20% quite unlikely (25% in 2016)</li>



<li>26% unlikely (24% in 2016)</li>



<li>21% about even chance (22% in 2016)</li>



<li>26% likely (17% in 2016)</li>



<li>7% quite likely (12% in 2016)</li>
</ul>



<h4 class="wp-block-heading"><strong>Existential risk</strong></h4>



<p>In the question above, participants&#8217; credences in &#8220;extremely bad&#8221; outcomes of HLMI had a median of 5% and a mean of 14%. To better clarify what participants meant by this, we also asked a subset of participants one of the following questions, which did not appear in the 2016 survey:</p>



<h5 class="wp-block-heading">Extinction from <strong>AI</strong></h5>



<p>Participants were asked:</p>



<p>What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species?&nbsp;</p>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<p>Median 5%.</p>



<h5 class="wp-block-heading"><strong>Extinction from human failure to control AI</strong></h5>



<p>Participants were asked:</p>



<p>What probability do you put on human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species?</p>



<h6 class="wp-block-heading">Answers</h6>



<p>Median 10%.</p>



<p>This question describes a strictly more specific event than the previous one, so it should logically receive no higher a probability, yet its median answer was higher. This could be due to noise (different random subsets of respondents received the two questions, so there is no logical requirement that their answers cohere), or to the <a href="https://en.wikipedia.org/wiki/Representativeness_heuristic">representativeness heuristic</a>.</p>



<h4 class="wp-block-heading"><strong>Safety</strong></h4>



<h5 class="wp-block-heading"><strong>General safety</strong></h5>



<h6 class="wp-block-heading"><strong>Question</strong></h6>



<p>Participants were asked:</p>



<p>Let ‘<strong>AI safety research</strong>’ include any AI-related research that, rather than being primarily aimed at improving the <em>capabilities</em> of AI systems, is instead primarily aimed at <em>minimizing potential risks</em> of AI systems (beyond what is already accomplished for those goals by increasing AI system capabilities).</p>



<p>Examples of AI safety research might include:</p>



<ul class="wp-block-list">
<li>Improving the human-interpretability of machine learning algorithms for the purpose of improving the safety and robustness of AI systems, not focused on improving AI capabilities</li>



<li>Research on long-term existential risks from AI systems</li>



<li>AI-specific formal verification research</li>



<li>Policy research about how to maximize the public benefits of AI</li>
</ul>



<p>How much should society prioritize <strong>AI safety research</strong>, relative to how much it is currently prioritized?</p>



<ul class="wp-block-list">
<li>Much less</li>



<li>Less</li>



<li>About the same</li>



<li>More</li>



<li>Much more</li>
</ul>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<ul class="wp-block-list">
<li>Much less: 2% (5% in 2016)</li>



<li>Less: 9% (8% in 2016)</li>



<li>About the same: 20% (38% in 2016)</li>



<li>More: 35% (35% in 2016)</li>



<li>Much more: 33% (14% in 2016)</li>
</ul>



<p>69% of respondents think society should prioritize AI safety research more or much more, up from 49% in 2016.</p>



<figure class="wp-block-image size-large"><img decoding="async" src="https://wiki.aiimpacts.org/_media/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/how_much_should_society_prioritize_ai_safety_research_relative_to_how_much_it_is_currently_prioritized_1_.png" alt=""/></figure>



<h5 class="wp-block-heading"><strong>Stuart Russell&#8217;s problem</strong></h5>



<h6 class="wp-block-heading"><strong>Question</strong></h6>



<p>Participants were asked:</p>



<p>Stuart Russell summarizes an argument for why highly advanced AI might pose a risk as follows:</p>



<p><em>The primary concern [with highly advanced AI] is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken […]. Now we have a problem:</em></p>



<p><em>1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.</em></p>



<p><em>2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.</em></p>



<p><em>A system that is optimizing a function of n variables, where the objective depends on a subset of size k&lt;n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.&nbsp; This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.</em></p>



<p>Do you think this argument points at an important problem?</p>



<ul class="wp-block-list">
<li>No, not a real problem.</li>



<li>No, not an important problem.</li>



<li>Yes, a moderately important problem.</li>



<li>Yes, a very important problem.</li>



<li>Yes, among the most important problems in the field.</li>
</ul>



<p>How valuable is it to work on this problem <strong><em>today</em></strong>, compared to other problems in AI?</p>



<ul class="wp-block-list">
<li>Much less valuable</li>



<li>Less valuable</li>



<li>As valuable as other problems</li>



<li>More valuable</li>



<li>Much more valuable</li>
</ul>



<p>How hard do you think this problem is compared to other problems in AI?</p>



<ul class="wp-block-list">
<li>Much easier</li>



<li>Easier</li>



<li>As hard as other problems</li>



<li>Harder</li>



<li>Much harder</li>
</ul>



<h6 class="wp-block-heading"><strong>Answers</strong></h6>



<p>Importance:</p>



<ul class="wp-block-list">
<li>No, not a real problem: 4%</li>



<li>No, not an important problem: 14%</li>



<li>Yes, a moderately important problem: 24%</li>



<li>Yes, a very important problem: 37%</li>



<li>Yes, among the most important problems in the field: 21%</li>
</ul>



<p>Value today:</p>



<ul class="wp-block-list">
<li>Much less valuable: 10%</li>



<li>Less valuable: 30%</li>



<li>As valuable as other problems: 33%</li>



<li>More valuable: 19%</li>



<li>Much more valuable: 8%</li>
</ul>



<p>Hardness:</p>



<ul class="wp-block-list">
<li>Much easier: 5%</li>



<li>Easier: 9%</li>



<li>As hard as other problems: 29%</li>



<li>Harder: 31%</li>



<li>Much harder: 26%</li>
</ul>



<h3 class="wp-block-heading">Contributions</h3>



<p>The survey was run by Katja Grace and Ben Weinstein-Raun. Data analysis was done by Zach Stein-Perlman and Ben Weinstein-Raun. This page was written by Zach Stein-Perlman and Katja Grace.</p>



<p>We thank many colleagues and friends for help, discussion and encouragement, including John Salvatier, Nick Beckstead, Howie Lempel, Joe Carlsmith, Leopold Aschenbrenner, Ramana Kumar, Jimmy Rintjema, Jacob Hilton, Ajeya Cotra, Scott Siskind, Chana Messinger, Noemi Dreksler, and Baobao Zhang.</p>



<p>We also thank the expert participants who spent time sharing their impressions with us, including:</p>



<p>Michał Zając<br>Morten Goodwin<br>Yue Sun<br>Ningyuan Chen<br>Egor Kostylev<br>Richard Antonello<br>Elia Turner<br>Andrew C Li<br>Zachary Markovich<br>Valentina Zantedeschi<br>Michael Cooper<br>Thomas A Keller<br>Marc Cavazza<br>Richard Vidal<br>David Lindner<br>Xuechen (Chen) Li<br>Alex M. Lamb<br>Tristan Aumentado-Armstrong<br>Ferdinando Fioretto<br>Alain Rossier<br>Wentao Zhang<br>Varun Jampani<br>Derek Lim<br>Muchen Li<br>Cong Hao<br>Yao-Yuan Yang<br>Linyi Li<br>Stéphane D’Ascoli<br>Lang Huang<br>Maxim Kodryan<br>Hao Bian<br>Orestis Paraskevas<br>David Madras<br>Tommy Tang<br>Li Sun<br>Stefano V Albrecht<br>Tristan Karch<br>Muhammad A Rahman<br>Runtian Zhai<br>Benjamin Black<br>Karan Singhal<br>Lin Gao<br>Ethan Brooks<br>Cesar Ferri<br>Dylan Campbell<br>Xujiang Zhao<br>Jack Parker-Holder<br>Michael Norrish<br>Jonathan Uesato<br>Yang An<br>Maheshakya Wijewardena<br>Ulrich Neumann<br>Lucile Ter-Minassian<br>Alexander Matt Turner<br>Subhabrata Dutta<br>Yu-Xiang Wang<br>Yao Zhang<br>Joanna Hong<br>Yao Fu<br>Wenqing Zheng<br>Louis C Tiao<br>Hajime Asama<br>Chengchun Shi<br>Moira R Dillon<br>Yisong Yue<br>Aurélien Bellet<br>Yin Cui<br>Gang Hua<br>Jongheon Jeong<br>Martin Klissarov<br>Aran Nayebi<br>Fabio Maria Carlucci<br>Chao Ma<br>Sébastien Gambs<br>Rasoul Mirzaiezadeh<br>Xudong Shen<br>Julian Schrittwieser<br>Adhyyan Narang<br>Fuxin Li<br>Linxi Fan<br>Johannes Gasteiger<br>Karthik Abinav Sankararaman<br>Patrick Mineault<br>Akhilesh Gotmare<br>Jibang Wu<br>Mikel Landajuela<br>Jinglin Liu<br>Qinghua Hu<br>Noah Siegel<br>Ashkan Khakzar<br>Nathan Grinsztajn<br>Julian Lienen<br>Xiaoteng Ma<br>Mohamad H Danesh<br>Ke ZHANG<br>Feiyu Xiong<br>Wonjae Kim<br>Michael Arbel<br>Piotr Skowron<br>Lê-Nguyên Hoang<br>Travers Rhodes<br>Liu Ziyin<br>Hossein Azizpour<br>Karl Tuyls<br>Hangyu Mao<br>Yi Ma<br>Junyi Li<br>Yong Cheng<br>Aditya Bhaskara<br>Xia Li<br>Danijar Hafner<br>Brian Quanz<br>Fangzhou Luo<br>Luca Cosmo<br>Scott Fujimoto<br>Santu Rana<br>Michael Curry<br>Karol 
Hausman<br>Luyao Yuan<br>Samarth Sinha<br>Matthew McLeod<br>Hao Shen<br>Navid Naderializadeh<br>Alessio Micheli<br>Zhenbang You<br>Van Huy Vo<br>Chenyang Wu<br>Thanard Kurutach<br>Vincent Conitzer<br>Chuang Gan<br>Chirag Gupta<br>Andreas Schlaginhaufen<br>Ruben Ohana<br>Luming Liang<br>Marco Fumero<br>Paul Muller<br>Hana Chockler<br>Ming Zhong<br>Jiamou Liu<br>Sumeet Agarwal<br>Eric Winsor<br>Ruimeng Hu<br>Changjian Shui<br>Yiwei Wang<br>Joey Tianyi Zhou<br>Anthony L. Caterini<br>Guillermo Ortiz-Jimenez<br>Iou-Jen Liu<br>Jiaming Liu<br>Michael Perlmutter<br>Anurag Arnab<br>Ziwei Xu<br>John Co-Reyes<br>Aravind Rajeswaran<br>Roy Fox<br>Yong-Lu Li<br>Carl Yang<br>Divyansh Garg<br>Amit Dhurandhar<br>Harris Chan<br>Tobias Schmidt<br>Robi Bhattacharjee<br>Marco Nadai<br>Reid McIlroy-Young<br>Wooseok Ha<br>Jesse Mu<br>Neale Ratzlaff<br>Kenneth Borup<br>Binghong Chen<br>Vikas Verma<br>Walter Gerych<br>Shachar Lovett<br>Zhengyu Zhao<br>Chandramouli Chandrasekaran<br>Richard Higgins<br>Nicholas Rhinehart<br>Blaise Agüera Y Arcas<br>Santiago Zanella-Beguelin<br>Dian Jin<br>Scott Niekum<br>Colin A. Raffel<br>Sebastian Goldt<br>Yali Du<br>Bernardo Subercaseaux<br>Hui Wu<br>Vincent Mallet<br>Ozan Özdenizci<br>Timothy Hospedales<br>Lingjiong Zhu<br>Cheng Soon Ong<br>Shahab Bakhtiari<br>Huan Zhang<br>Banghua Zhu<br>Byungjun Lee<br>Zhenyu Liao<br>Adrien Ecoffet<br>Vinay Ramasesh<br>Jesse Zhang<br>Soumik Sarkar<br>Nandan Kumar Jha<br>Daniel S Brown<br>Neev Parikh<br>Chen-Yu Wei<br>David K. Duvenaud<br>Felix Petersen<br>Songhua Wu<br>Huazhu Fu<br>Roger B Grosse<br>Matteo Papini<br>Peter Kairouz<br>Burak Varici<br>Fabio Roli<br>Mohammad Zalbagi Darestani<br>Jiamin He<br>Lys Sanz Moreta<br>Xu-Hui Liu<br>Qianchuan Zhao<br>Yulia Gel<br>Jan Drgona<br>Sajad Khodadadian<br>Takeshi Teshima<br>Igor T Podolak<br>Naoya Takeishi<br>Man Shun Ang<br>Mingli Song<br>Jakub Tomczak<br>Lukasz Szpruch<br>Micah Goldblum<br>Graham W. 
Taylor<br>Tomasz Korbak<br>Maheswaran Sathiamoorthy<br>Lan-Zhe Guo<br>Simone Fioravanti<br>Lei Jiao<br>Davin Choo<br>Kristy Choi<br>Varun Nair<br>Rayana Jaafar<br>Amy Greenwald<br>Martin V. Butz<br>Aleksey Tikhonov<br>Samuel Gruffaz<br>Yash Savani<br>Rui Chen<br>Ke Sun</p>



<h4 class="wp-block-heading">Suggested citation</h4>



<p>Zach Stein-Perlman, Benjamin Weinstein-Raun, Katja Grace, &#8220;2022 Expert Survey on Progress in AI.&#8221; <em>AI Impacts</em>, 3 Aug. 2022. <a href="https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/">https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/</a>.</p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Vignettes Project</title>
		<link>http://aiimpacts.org/ai-vignettes-project/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Wed, 13 Oct 2021 04:58:10 +0000</pubDate>
				<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Fiction]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=2985</guid>

					<description><![CDATA[An ongoing effort to write concrete plausible future histories of AI development and its social impacts. <a class="mh-excerpt-more" href="http://aiimpacts.org/ai-vignettes-project/" title="AI Vignettes Project"></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Posted Oct 12 2021</em></p>



<p>The <strong>AI Vignettes Project</strong> is an ongoing effort to write concrete plausible future histories of AI development and its social impacts.</p>



<h2 class="wp-block-heading">Details</h2>



<h3 class="wp-block-heading">Purposes</h3>



<p>We hope to:</p>



<ul class="wp-block-list"><li>Check that abstract views about the future of AI have plausible concrete instantiations. (Especially hypothesized extinction scenarios and proposed safe scenarios.)</li><li>Develop better intuitions about possible scenarios by thinking through them concretely.</li><li>Notice recurring themes in concrete stories that may be worth thinking about more broadly.</li><li>Fill out the space of plausible, feasible scenarios with concrete illustrations, to decrease bias in thinking about the future.</li><li>Keep a collection of AI vignettes for others to use, in the above or other ways.</li></ul>



<h3 class="wp-block-heading">Methods</h3>



<p>Our current intended method is:</p>



<ol class="wp-block-list"><li>Write draft vignettes, with no particular systematic method for choosing an unbiased selection of scenarios</li><li>Request comments on their realism</li><li>Modify according to comments</li><li>Repeat 2-3 until realism critiques subside</li></ol>



<h3 class="wp-block-heading">Work so far</h3>



<p>AI Impacts has run two small workshops where participants wrote AI vignettes. </p>



<h3 class="wp-block-heading">Vignette collection</h3>



<p>This is a subset of vignettes arising from this project, or similar.</p>



<iframe class="airtable-embed" src="https://airtable.com/embed/shr4mHlTIiKtFRDuR?backgroundColor=cyan&amp;viewControls=on" frameborder="0" onmousewheel="" width="100%" height="833" style="background: transparent; border: 1px solid #ccc;"></iframe>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Fiction relevant to AI futurism</title>
		<link>http://aiimpacts.org/partially-plausible-fictional-ai-futures/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Tue, 13 Apr 2021 00:51:04 +0000</pubDate>
				<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Fiction]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2893</guid>

					<description><![CDATA[A list of stories potentially relevant to thinking about the development of advanced AI, including both those intended as futurism and those intended as entertainment. <a class="mh-excerpt-more" href="http://aiimpacts.org/partially-plausible-fictional-ai-futures/" title="Fiction relevant to AI futurism"></a>]]></description>
										<content:encoded><![CDATA[
<p>This page is an incomplete collection of fiction about the development of advanced AI, and the consequences for society. </p>



<h2 class="wp-block-heading">Details</h2>



<p>Entries are generally included if we judge that they contain enough that is plausible or correctly evocative to be worth considering, in light of AI futurism. </p>



<p>The list includes: </p>



<ol class="wp-block-list"><li>works (usually in draft form) belonging to our <a href="https://aiimpacts.org/ai-vignettes-project/" data-type="post" data-id="2985">AI Vignettes Project</a>. These are written with the intention of incrementally improving their realism via comments. They are usually in commentable form, and we welcome criticism, especially of departures from realism.</li><li>works created for the purpose of better understanding the future of AI</li><li>works from mainstream entertainment, included because they were prominent or because they were recommended to us.<span id='easy-footnote-1-2893' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/partially-plausible-fictional-ai-futures/#easy-footnote-bottom-1-2893' title='We collected traditional fictional works via requests on social media, &lt;a href=&quot;https://twitter.com/KatjaGrace/status/1390544320525070338&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://www.facebook.com/katja.grace/posts/926632485955&quot;&gt;here&lt;/a&gt;'><sup>1</sup></a></span></li></ol>



<p>The list can be sorted and filtered by various traits that aren&#8217;t visible by default (see top left options). For instance:</p>



<ul class="wp-block-list"><li><strong>Type</strong>, i.e. mainstream entertainment, futurism, or specifically from our <a href="https://aiimpacts.org/ai-vignettes-project/" data-type="post" data-id="2985">Vignettes Project</a>, as described above.</li><li><strong>Relevant themes</strong>, e.g. &#8216;failure modes&#8217; or &#8216;largeness of mindspace&#8217;</li><li><strong>Scenario categories</strong>, e.g. &#8216;fast takeoff&#8217;, &#8216;government project&#8217;, &#8216;brain emulations&#8217;</li><li><strong>Recommendation rating</strong>: this is roughly how strongly we recommend the piece for people wanting to think about the future of AI. It takes into account a combination of realism, tendency to evoke some specific useful intuition, and ease of reading. It is very rough and probably not consistent.</li></ul>



<p>Many entries are only partially filled out. These are marked &#8216;unfinished&#8217;, and so can be filtered out.</p>



<p>We would appreciate further submissions of stories or additional details for stories we have here, reviews of stories in the collection here, or other comments <a href="https://aiimpacts.org/feedback/">here</a>.</p>



<h3 class="wp-block-heading">Collection</h3>



<p>The collection can also be seen full screen <a href="https://airtable.com/shr5EIpLNHB7o2q9Z/tblMVjRvMKVNkoZVg?backgroundColor=cyan&amp;viewControls=on">here</a> or as a table <a href="https://airtable.com/shrVnjq9U53R5nrxO">here</a>.</p>



<iframe loading="lazy" class="airtable-embed" src="https://airtable.com/embed/shr5EIpLNHB7o2q9Z?backgroundColor=cyan&amp;viewControls=on" frameborder="0" onmousewheel="" width="100%" height="733" style="background: transparent; border: 1px solid #ccc;"></iframe>



<h2 class="wp-block-heading">Related</h2>



<ul class="wp-block-list"><li><a href="https://aiimpacts.org/ai-vignettes-project/" data-type="post" data-id="2985">AI Vignettes Project</a></li></ul>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How energy efficient are human-engineered flight designs relative to natural ones?</title>
		<link>http://aiimpacts.org/are-human-engineered-flight-designs-better-or-worse-than-natural-ones/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Thu, 10 Dec 2020 22:48:00 +0000</pubDate>
				<category><![CDATA[Evolution engineering comparison]]></category>
		<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Power of Evolution]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2715</guid>

					<description><![CDATA[Nature is responsible for the most energy efficient flight, according to an investigation of albatrosses, butterflies and nine different human-engineered flying machines. <a class="mh-excerpt-more" href="http://aiimpacts.org/are-human-engineered-flight-designs-better-or-worse-than-natural-ones/" title="How energy efficient are human-engineered flight designs relative to natural ones?"></a>]]></description>
										<content:encoded><![CDATA[
<p><em><span class="has-inline-color has-cyan-bluish-gray-color">Updated Dec 10, 2020</span></em></p>



<p><strong><em>This page is out-of-date. Visit the <a href="https://wiki.aiimpacts.org/doku.php?id=power_of_evolution:evolution_engineering_comparison:how_energy_efficient_are_human-engineered_flight_designs_relative_to_natural_ones">updated version of this page</a> on our <a href="https://wiki.aiimpacts.org/doku.php?id=start">wiki</a>.<br></em></strong><br>Among two animals and nine machines:</p>



<ul class="wp-block-list">
<li>In terms of mass⋅distance/energy, the most efficient animal was 2-8x more efficient than the most efficient machine.  All entries fell within two orders of magnitude.</li>



<li>In terms of distance/energy, the most efficient animal was 3,000-20,000x more efficient than the most efficient machine. Both animals were more efficient than all machines. Entries ranged over more than eight orders of magnitude.</li>
</ul>



<h2 class="wp-block-heading">Details</h2>



<h3 class="wp-block-heading">Background</h3>



<p>This case study is part of <a href="https://aiimpacts.org/comparison-of-naturally-evolved-and-engineered-solutions/" data-type="post" data-id="2191">research</a> that intends to compare the performance of human engineers and natural evolution on problems where both have developed solutions. The goal of this is to inform our expectations about the performance of future artificial intelligence relative to biological minds. </p>



<h3 class="wp-block-heading">Metrics</h3>



<p>We consider two metrics: </p>



<ol class="wp-block-list">
<li>Distance per energy used (meters / kilojoule). </li>



<li>Mass times distance per energy used (kilograms⋅meters / joule). </li>
</ol>
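<p>As a small illustration of how the two metrics relate, here is a sketch in Python. The input numbers below are made up for illustration; they are not figures from this study.</p>

```python
def efficiency_metrics(distance_m, energy_j, mass_kg):
    """Return the two flight-efficiency metrics used on this page:
    (distance per energy in m/kJ, mass times distance per energy in kg*m/J)."""
    m_per_kj = distance_m / (energy_j / 1000.0)
    kg_m_per_j = mass_kg * distance_m / energy_j
    return m_per_kj, kg_m_per_j

# Hypothetical flyer: 2 kg, traveling 10 km on 500 kJ of fuel energy.
m_per_kj, kg_m_per_j = efficiency_metrics(10_000, 500_000, 2.0)
# m_per_kj is 20.0 and kg_m_per_j is 0.04
```

<p>Note that the second metric rewards heavy flyers, which is why large aircraft fare much better on it than on plain distance per energy.</p>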



<p>These operationalize the problem of flight into two more specific problems. There are many other aspects of flight performance that one could measure, such as energy efficiency of acceleration in a straight line, turning, hovering, vertical acceleration, vertical distance, landing, taking off, time flying per energy, or the same measures with fewer or additional restrictions on acceptable entries. For instance, we might look at the problem of flying with flapping wings, or drop the restriction that the solutions we consider be heavier than air and self-powered. </p>



<p>We did not require that the flight of an entry be constantly powered. Solutions that spend some time gliding as well as some time using powered flight were allowed. Both <a href="https://aiimpacts.org/energy-efficiency-of-wandering-albatross-flight/" data-type="post" data-id="2772">albatrosses</a> and <a href="https://aiimpacts.org/energy-efficiency-of-monarch-butterfly-flight/" data-type="post" data-id="2776">butterflies</a> use air currents to fly further.<span id='easy-footnote-1-2715' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/are-human-engineered-flight-designs-better-or-worse-than-natural-ones/#easy-footnote-bottom-1-2715' title='“Albatrosses and other large seabirds use dynamic soaring to gain sufficient energy from the wind to travel large distances rapidly and with little apparent effort.“&lt;/p&gt;



&lt;p&gt;Richardson, Philip L., Ewan D. Wakefield, and Richard A. Phillips. “Flight Speed and Performance of the Wandering Albatross with Respect to Wind.” &lt;em&gt;Movement Ecology&lt;/em&gt; 6, no. 1 (March 7, 2018): 3. &lt;a href=&quot;https://doi.org/10.1186/s40462-018-0121-9&quot;&gt;https://doi.org/10.1186/s40462-018-0121-9&lt;/a&gt;.&lt;/p&gt;



&lt;p&gt;See page on &lt;a href=&quot;https://aiimpacts.org/energy-efficiency-of-monarch-butterfly-flight/&quot; data-type=&quot;post&quot; data-id=&quot;2776&quot;&gt;monarch butterflies&lt;/a&gt; for details of their soaring behavior.'><sup>1</sup></a></span> The energy gains from these techniques were not included in the final score, and entries were not penalized for spending a larger fraction of time gliding. It seems likely that paramotor pilots use similar techniques, since paramotors are well suited to gliding (being paragliders with propeller motors strapped to the backs of their pilots). Our energy efficiency estimate for the paramotor came from a record breaking distance flight in which the quantity of available fuel was limited, and so it is likely that some gliding was used to increase the distance traveled as much as possible.</p>



<p>When multiple input values could have been used, such as the takeoff weight and the landing weight, or different estimates for the energetic costs of different kinds of flight for the Monarch butterfly, we generally calculated a high and a low estimate, taking the most optimistic and pessimistic inputs respectively. In all cases, the resulting best and worst estimates differed by less than a factor of ten. </p>



<h3 class="wp-block-heading">Selection of case studies</h3>



<p>We selected case studies informally, according to judgments about possible high energy efficiencies, and with an eye to exploring a wider range of case studies.</p>



<p>We started by looking at the Boeing 747-400 plane, the Wandering Albatross, and the Monarch Butterfly. We chose these animals because both are known for flying long distances, and because their body plans differ substantially from each other.</p>



<p>All three scored surprisingly similarly on distance times mass per energy (details below). This prompted us to look for engineered solutions that were optimized for fuel efficiency. To that end, we looked at paramotors and record-breaking flying machines. In the latter category, we found the MacCready Gossamer Albatross, which was a human-powered flying device that crossed the English Channel, and the Spirit of Butts’ Farm, which was a model airplane that crossed the Atlantic on one gallon of gasoline.&nbsp;</p>



<p>For reasons that are now obscure, we also included a number of different planes.</p>



<p>We would have liked to include microdrones, since they are different enough from other entries that they might be unusually efficient. However, we did not find data on them.</p>



<h3 class="wp-block-heading">Case studies</h3>



<p>These are the full articles calculating the efficiencies of different flying machines and animals: </p>



<ul class="wp-block-list">
<li><a href="https://aiimpacts.org/energy-efficiency-of-wright-flyer/">Wright Flyer</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-wright-model-b/">Wright model B</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-vickers-vimy-plane/">Vickers Vimy</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-north-american-p-51-mustang/">North American P-51 Mustang</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-paramotors/" data-type="post" data-id="2765">Paramotors</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-the-spirit-of-butts-farm/" data-type="post" data-id="2759">The Spirit of Butts&#8217; Farm</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-monarch-butterfly-flight/" data-type="post" data-id="2776">Monarch butterfly</a></li>



<li><a href="https://aiimpacts.org/maccready-gossamer-albatross/" data-type="post" data-id="2756">MacCready Gossamer Albatross</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-airbus-a320/" data-type="post" data-id="2743">Airbus A-320</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-boeing-747-400/" data-type="post" data-id="2745">Boeing 747-400</a></li>



<li><a href="https://aiimpacts.org/energy-efficiency-of-wandering-albatross-flight/" data-type="post" data-id="2772">Wandering albatross</a></li>
</ul>



<h3 class="wp-block-heading">Summary results</h3>



<p>Results are available in Table 1 below, and in <a href="https://docs.google.com/spreadsheets/d/1hMyKszvJx4A-A-qlL-frQATnb7Wv9bI51ennFbbi_wU/edit?usp=sharing">this spreadsheet</a>. Figures 1 and 2 below illustrate the equivalent questions of how far each of these animals and machines can fly, given either the same amount of fuel energy, or fuel energy proportional to their body mass.</p>




<table id="tablepress-4" class="tablepress tablepress-id-4">
<thead>
<tr class="row-1 odd">
	<th class="column-1">Name</th><th class="column-2">natural or human-engineered</th><th class="column-3">&nbsp;</th><th class="column-4">kg⋅m/J</th><th class="column-5">&nbsp;</th><th class="column-6">&nbsp;</th><th class="column-7">&nbsp;</th><th class="column-8">&nbsp;</th><th class="column-9">m/kJ</th><th class="column-10">&nbsp;</th>
</tr>
</thead>
<tbody class="row-hover">
<tr class="row-2 even">
	<td class="column-1"></td><td class="column-2"></td><td class="column-3">worst</td><td class="column-4">mean</td><td class="column-5">best</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">worst</td><td class="column-9">mean</td><td class="column-10">best</td>
</tr>
<tr class="row-3 odd">
	<td class="column-1">Monarch Butterfly</td><td class="column-2">natural</td><td class="column-3">0.065</td><td class="column-4">0.21</td><td class="column-5">0.36</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">100000</td><td class="column-9">350000</td><td class="column-10">600000</td>
</tr>
<tr class="row-4 even">
	<td class="column-1">Wandering Albatross</td><td class="column-2">natural</td><td class="column-3">1.4</td><td class="column-4">2.2</td><td class="column-5">3</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">240</td><td class="column-9">240</td><td class="column-10">240</td>
</tr>
<tr class="row-5 odd">
	<td class="column-1">The Spirit of Butts’ Farm</td><td class="column-2">human-engineered</td><td class="column-3">0.086</td><td class="column-4">0.12</td><td class="column-5">0.16</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">32</td><td class="column-9">32</td><td class="column-10">32</td>
</tr>
<tr class="row-6 even">
	<td class="column-1">MacCready Gossamer Albatross</td><td class="column-2">human-engineered</td><td class="column-3">0.19</td><td class="column-4">0.32</td><td class="column-5">0.46</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">2</td><td class="column-9">3.3</td><td class="column-10">4.6</td>
</tr>
<tr class="row-7 odd">
	<td class="column-1">Paramotor</td><td class="column-2">human-engineered</td><td class="column-3">0.058</td><td class="column-4">0.079</td><td class="column-5">0.1</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.36</td><td class="column-9">0.36</td><td class="column-10">0.36</td>
</tr>
<tr class="row-8 even">
	<td class="column-1">Wright model B</td><td class="column-2">human-engineered</td><td class="column-3">0.036</td><td class="column-4">0.078</td><td class="column-5">0.12</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.1</td><td class="column-9">0.16</td><td class="column-10">0.21</td>
</tr>
<tr class="row-9 odd">
	<td class="column-1">Wright Flyer</td><td class="column-2">human-engineered</td><td class="column-3">0.022</td><td class="column-4">0.042</td><td class="column-5">0.061</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.080</td><td class="column-9">0.13</td><td class="column-10">0.18</td>
</tr>
<tr class="row-10 even">
	<td class="column-1">North American P-51 Mustang</td><td class="column-2">human-engineered</td><td class="column-3">0.25</td><td class="column-4">0.38</td><td class="column-5">0.5</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.073</td><td class="column-9">0.083</td><td class="column-10">0.092</td>
</tr>
<tr class="row-11 odd">
	<td class="column-1">Vickers Vimy</td><td class="column-2">human-engineered</td><td class="column-3">0.081</td><td class="column-4">0.17</td><td class="column-5">0.25</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.025</td><td class="column-9">0.038</td><td class="column-10">0.05</td>
</tr>
<tr class="row-12 even">
	<td class="column-1">Airbus A320</td><td class="column-2">human-engineered</td><td class="column-3">0.33</td><td class="column-4">0.47</td><td class="column-5">0.61</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.0078</td><td class="column-9">0.0078</td><td class="column-10">0.0078</td>
</tr>
<tr class="row-13 odd">
	<td class="column-1">Boeing 747-400</td><td class="column-2">human-engineered</td><td class="column-3">0.39</td><td class="column-4">0.61</td><td class="column-5">0.83</td><td class="column-6"></td><td class="column-7"></td><td class="column-8">0.0021</td><td class="column-9">0.0021</td><td class="column-10">0.0021</td>
</tr>
</tbody>
</table>
<!-- #tablepress-4 from cache -->



<p><strong>Table 1: Energy efficiency of flight for a variety of natural and man-made flying entities.</strong></p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="434" src="https://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-1024x434.jpg" alt="" class="wp-image-2813" srcset="http://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-1024x434.jpg 1024w, http://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-300x127.jpg 300w, http://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-768x325.jpg 768w, http://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-1536x651.jpg 1536w, http://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-2048x868.jpg 2048w, http://aiimpacts.org/wp-content/uploads/2020/12/129727480_757428541515114_4350085674692898023_n-1030x438.jpg 1030w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><strong>Figure 1: If you give each animal or machine energy proportional to its weight, how far can it fly?</strong><br></figcaption></figure>



<p>On mass⋅distance/energy, evolution beats engineers, but they are relatively evenly matched: the albatross (1.4-3.0 kg⋅m/J) and the Boeing 747-400 (0.39-0.83 kg⋅m/J) are the best in the natural and engineered classes respectively. Thus the best natural solution we found was roughly 2x-8x more efficient than the best human-engineered one.<span id='easy-footnote-2-2715' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/are-human-engineered-flight-designs-better-or-worse-than-natural-ones/#easy-footnote-bottom-2-2715' title='For the best case for engineers we compare the Boeing 747-400’s best score to the Albatross’s worst, and for the best case for evolution we do the opposite. This gives an advantage for evolution by a factor of somewhere between 1.7 and 7.7.'><sup>2</sup></a></span> We found several flying machines more efficient on this metric than the monarch butterfly.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="435" src="https://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-1024x435.jpg" alt="" class="wp-image-2814" srcset="http://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-1024x435.jpg 1024w, http://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-300x128.jpg 300w, http://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-768x326.jpg 768w, http://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-1536x653.jpg 1536w, http://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-2048x870.jpg 2048w, http://aiimpacts.org/wp-content/uploads/2020/12/129734907_191799035908993_4248841669315097267_n-1030x438.jpg 1030w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><strong>Figure 2: How far animals and machines can fly on the same amount of energy. Note that the vertical axis is log scaled, unlike that of Figure 1, so smaller looking differences are in fact much larger: over eight orders of magnitude (vs less than two in Figure 1).</strong><br></figcaption></figure>



<p>On distance/energy, the natural solutions have a much larger advantage. Both are better than all man-made solutions we considered. The best natural and engineered solutions respectively are the monarch butterfly (100,000-600,000 m/kJ) and the Spirit of Butts&#8217; Farm (32 m/kJ), for roughly a 3,000x to 20,000x advantage to natural evolution.</p>
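<p>The advantage ranges quoted above can be reproduced from the worst and best scores in Table 1. A sketch of the calculation:</p>

```python
# Worst and best scores copied from Table 1 above.
albatross_kg_m_per_j = (1.4, 3.0)        # (worst, best), kg*m/J
boeing_747_kg_m_per_j = (0.39, 0.83)
monarch_m_per_kj = (100_000, 600_000)    # m/kJ
butts_farm_m_per_kj = (32, 32)

# Pessimistic case for evolution: its worst score vs the machine's best.
mass_distance_low = albatross_kg_m_per_j[0] / boeing_747_kg_m_per_j[1]   # ~1.7
# Optimistic case for evolution: its best score vs the machine's worst.
mass_distance_high = albatross_kg_m_per_j[1] / boeing_747_kg_m_per_j[0]  # ~7.7

distance_low = monarch_m_per_kj[0] / butts_farm_m_per_kj[0]    # ~3,100
distance_high = monarch_m_per_kj[1] / butts_farm_m_per_kj[0]   # ~18,800
```

<p>This worst-against-best pairing is why each comparison is quoted as a range rather than a single number.</p>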






<h3 class="wp-block-heading">Interpretation</h3>



<p>We take this as weak evidence about the best possible distance/energy and mass⋅distance/energy measures achievable by human engineers or natural evolution. One reason is that this is a small set of examples. Another is that none of these animals or machines were optimized purely for either of these flight metrics: they all had other constraints or more complex goals. For instance, the <a href="https://aiimpacts.org/energy-efficiency-of-paramotors/" data-type="post" data-id="2765">paramotor</a> was competing for a record in which a paramotor had to be used, specifically. For the longest human flight, the flying machine had to be capable of carrying a human. The albatross&#8217;s body has many functions. Thus it seems plausible that either engineers or natural evolution could reach solutions far better on our metrics than those recorded here if they were directly aiming for those metrics. </p>



<p>The measurements for mass⋅distance/energy covered a much narrower band than those for distance/energy: under two orders of magnitude versus around eight. Comparing best scores between evolution and engineering, the gap is also much smaller, as noted above (less than one order of magnitude versus three to four). This seems like some evidence that this band of performance is natural for some reason, and thus that more pointed efforts to do better on these metrics would not readily lead to much higher performance.</p>



<p><br><br><em>Primary author: Ronny Fernandez</em></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Time for AI to cross the human performance range in ImageNet image classification</title>
		<link>http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Mon, 19 Oct 2020 23:52:50 +0000</pubDate>
				<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Range of Human Performance]]></category>
		<category><![CDATA[Speed of AI Transition]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2683</guid>

					<description><![CDATA[Computer image classification performance took 3 years to go from untrained human level to trained human level <a class="mh-excerpt-more" href="http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/" title="Time for AI to cross the human performance range in ImageNet image classification"></a>]]></description>
										<content:encoded><![CDATA[
<p><em><span class="has-inline-color has-cyan-bluish-gray-color">Published 19 Oct 2020</span></em></p>



<p>Progress in computer image classification performance took:</p>



<ul class="wp-block-list"><li>Over 14 years to reach the level of an untrained human</li><li>3 years to pass from untrained human level to trained human level</li><li>5 years to continue from trained human to current performance (2020)</li></ul>



<h2 class="wp-block-heading">Details</h2>



<h3 class="wp-block-heading">Metric</h3>



<p>ImageNet<span id='easy-footnote-1-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-1-2683' title='&amp;#8220;&lt;strong&gt;ImageNet&lt;/strong&gt;&amp;nbsp;is an image database organized according to the&amp;nbsp;&lt;a rel=&quot;noreferrer noopener&quot; href=&quot;http://wordnet.princeton.edu/&quot; target=&quot;_blank&quot;&gt;WordNet&lt;/a&gt;&amp;nbsp;hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node.&amp;nbsp;&amp;#8220;&lt;/p&gt;



&lt;p&gt;“ImageNet.” Accessed October 19, 2020. &lt;a href=&quot;http://www.image-net.org/&quot;&gt;http://www.image-net.org/&lt;/a&gt;.'><sup>1</sup></a></span> is a large collection of images organized into a hierarchy of noun categories. We looked at &#8216;top-5 accuracy&#8217; in categorizing images. In this task, the player is given an image, and can guess five different categories that the image might represent. It is judged as correct if the image is in fact in any of those five categories.</p>
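<p>To make the scoring rule concrete, here is a minimal sketch of top-5 accuracy. The labels and guesses are hypothetical, chosen only to illustrate the rule.</p>

```python
def top5_accuracy(true_labels, guesses):
    """Fraction of images whose true category is among the (up to) five guesses."""
    hits = sum(1 for label, top5 in zip(true_labels, guesses) if label in top5[:5])
    return hits / len(true_labels)

# Hypothetical labels and guesses for two images.
labels = ["husky", "tabby cat"]
guesses = [
    ["wolf", "malamute", "husky", "coyote", "dingo"],    # correct within 5
    ["lynx", "cougar", "leopard", "cheetah", "jaguar"],  # missed
]
# top5_accuracy(labels, guesses) is 0.5
```

<p>Error rates quoted below are simply one minus this accuracy.</p>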



<h3 class="wp-block-heading">Human performance milestones</h3>



<h4 class="wp-block-heading">Beginner level</h4>



<p>We used Andrej Karpathy&#8217;s interface<span id='easy-footnote-2-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-2-2683' title='Karpathy, Andrej. “Ilsvrc.” Accessed October 19, 2020. &lt;a href=&quot;https://cs.stanford.edu/people/karpathy/ilsvrc/&quot;&gt;https://cs.stanford.edu/people/karpathy/ilsvrc/&lt;/a&gt;.'><sup>2</sup></a></span> for doing the ImageNet top-5 accuracy task ourselves, and asked a few friends to do it. Five people did it; their performance ranged from 74% to 89%, with a median of 81%. </p>



<p>This was not a random sample of people, and conditions for taking the test differed. Most notably, there was no time limit, so time allocated was set by patience for trying to marginally improve guesses.</p>



<h4 class="wp-block-heading">Trained human-level</h4>



<p>ImageNet categorization is not a popular activity for humans, so we do not know what highly talented and trained human performance would look like. The best relatively high human performance measure we have comes from Russakovsky et al., who report the performance of two &#8216;expert annotators&#8217; said to have learned many of the categories. <span id='easy-footnote-3-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-3-2683' title='&amp;#8216;Therefore, in evaluating the human accuracy we relied primarily on expert annotators who learned to recognize a large portion of the 1000 ILSVRC classes. During training, the annotators labeled a few hundred validation images for practice and later switched to the test set images&amp;#8217;&lt;/p&gt;



&lt;p&gt;Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” &lt;em&gt;ArXiv:1409.0575 [Cs]&lt;/em&gt;, January 29, 2015. &lt;a href=&quot;http://arxiv.org/abs/1409.0575&quot;&gt;http://arxiv.org/abs/1409.0575&lt;/a&gt;.'><sup>3</sup></a></span> The better performing annotator there had a 5.1% error rate.<span id='easy-footnote-4-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-4-2683' title='&amp;#8220;Annotator A1 evaluated a total of 1500 test set images. The GoogLeNet classication error on this sample was estimated to be 6.8% (recall that the error on full test set of 100,000 images is 6.7%, as shown in Table 7). The human error was estimated to be 5.1%.&amp;#8221;&lt;br&gt;&lt;br&gt;Also see Table 9&lt;/p&gt;



&lt;p&gt;Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” &lt;em&gt;ArXiv:1409.0575 [Cs]&lt;/em&gt;, January 29, 2015. &lt;a href=&quot;http://arxiv.org/abs/1409.0575&quot;&gt;http://arxiv.org/abs/1409.0575&lt;/a&gt;.'><sup>4</sup></a></span></p>



<h3 class="wp-block-heading">AI achievement of human milestones</h3>



<h4 class="wp-block-heading">Earliest attempt</h4>



<p>The ImageNet database was released in 2009.<span id='easy-footnote-5-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-5-2683' title='&amp;#8220;They presented their database for the first time as a poster at the 2009&amp;nbsp;&lt;a href=&quot;https://en.wikipedia.org/wiki/Conference_on_Computer_Vision_and_Pattern_Recognition&quot;&gt;Conference on Computer Vision and Pattern Recognition&lt;/a&gt;&amp;nbsp;(CVPR) in Florida.&amp;#8221;&lt;br&gt;&lt;br&gt;“ImageNet.” In &lt;em&gt;Wikipedia&lt;/em&gt;, September 9, 2020. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=ImageNet&amp;amp;oldid=977585441&quot;&gt;https://en.wikipedia.org/w/index.php?title=ImageNet&amp;amp;oldid=977585441&lt;/a&gt;.&lt;br&gt;'><sup>5</sup></a></span> An annual contest, the ImageNet Large Scale Visual Recognition Challenge, began in 2010.<span id='easy-footnote-6-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-6-2683' title='&amp;#8220;&amp;#8230;The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been running annually for five years (since 2010) and has become the standard benchmark for large-scale object recognition.&amp;#8221;&lt;/p&gt;



&lt;p&gt;Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” &lt;em&gt;ArXiv:1409.0575 [Cs]&lt;/em&gt;, January 29, 2015. &lt;a href=&quot;http://arxiv.org/abs/1409.0575&quot;&gt;http://arxiv.org/abs/1409.0575&lt;/a&gt;.'><sup>6</sup></a></span></p>



<p>In the 2010 contest, the best top-5 classification performance had 28.2% error.<span id='easy-footnote-7-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-7-2683' title='See table 6.&lt;/p&gt;



&lt;p&gt;Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” &lt;em&gt;ArXiv:1409.0575 [Cs]&lt;/em&gt;, January 29, 2015. &lt;a href=&quot;http://arxiv.org/abs/1409.0575&quot;&gt;http://arxiv.org/abs/1409.0575&lt;/a&gt;.'><sup>7</sup></a></span> </p>



<p>However image classification broadly is older. Pascal VOC was a similar previous contest, which ran from 2005.<span id='easy-footnote-8-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-8-2683' title='&amp;#8220;The PASCAL Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset has become accepted as the benchmark for object detection.&amp;#8221;&lt;/p&gt;



&lt;p&gt;Everingham, Mark, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. “The Pascal Visual Object Classes (VOC) Challenge.” &lt;em&gt;International Journal of Computer Vision&lt;/em&gt; 88, no. 2 (June 2010): 303–38. &lt;a href=&quot;https://doi.org/10.1007/s11263-009-0275-4&quot;&gt;https://doi.org/10.1007/s11263-009-0275-4&lt;/a&gt;.'><sup>8</sup></a></span> We do not know when the first successful image classification systems were developed. In a blog post, Amidi &amp; Amidi point to LeNet as pioneering work in image classification<span id='easy-footnote-9-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-9-2683' title='See section &amp;#8216;LeNet&amp;#8217;.&lt;br&gt;&lt;br&gt;“The Evolution of Image Classification Explained.” Accessed October 19, 2020. &lt;a href=&quot;https://stanford.edu/~shervine/blog/evolution-image-classification-explained#lenet&quot;&gt;https://stanford.edu/~shervine/blog/evolution-image-classification-explained#lenet&lt;/a&gt;.'><sup>9</sup></a></span>, and it appears to have been developed in 1998.<span id='easy-footnote-10-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-10-2683' title='&amp;#8220;&lt;strong&gt;LeNet&lt;/strong&gt;&amp;nbsp;is a&amp;nbsp;&lt;a href=&quot;https://en.wikipedia.org/wiki/Convolutional_neural_network&quot;&gt;convolutional neural network&lt;/a&gt;&amp;nbsp;structure proposed by&amp;nbsp;&lt;a href=&quot;https://en.wikipedia.org/wiki/Yann_LeCun&quot;&gt;Yann LeCun&lt;/a&gt;&amp;nbsp;et al. in 1998.&amp;#8221;&lt;/p&gt;



&lt;p&gt;“LeNet.” In &lt;em&gt;Wikipedia&lt;/em&gt;, June 19, 2020. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=LeNet&amp;amp;oldid=963418885&quot;&gt;https://en.wikipedia.org/w/index.php?title=LeNet&amp;amp;oldid=963418885&lt;/a&gt;.'><sup>10</sup></a></span>



<h4 class="wp-block-heading">Beginner level</h4>



<p>The first entrant in the ImageNet contest to perform better than our beginner level benchmark was SuperVision (commonly known as AlexNet) in 2012, with a 15.3% error rate.<span id='easy-footnote-11-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-11-2683' title='&amp;#8220;We also entered a variant of this model in the&lt;br&gt;ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%&amp;#8221;&lt;/p&gt;



&lt;p&gt;Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks.” In &lt;em&gt;Advances in Neural Information Processing Systems 25&lt;/em&gt;, edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, 1097–1105. Curran Associates, Inc., 2012. &lt;a href=&quot;http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf&quot;&gt;http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf&lt;/a&gt;.&lt;br&gt;&lt;br&gt;Also, see Table 6 for a list of other entrants: &lt;br&gt;&lt;br&gt;Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” &lt;em&gt;ArXiv:1409.0575 [Cs]&lt;/em&gt;, January 29, 2015. &lt;a href=&quot;http://arxiv.org/abs/1409.0575&quot;&gt;http://arxiv.org/abs/1409.0575&lt;/a&gt;.'><sup>11</sup></a></span>



<h4 class="wp-block-heading">Superhuman level</h4>



<p>In 2015, He et al. apparently achieved a 4.5% error rate, slightly better than our high human benchmark.<span id='easy-footnote-12-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-12-2683' title='&amp;#8220;Our 152-layer ResNet has a single-model top-5 validation error of 4.49%.&amp;#8221; &lt;br&gt;&lt;br&gt;Also see Table 4&lt;br&gt;&lt;br&gt;He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” &lt;em&gt;ArXiv:1512.03385 [Cs]&lt;/em&gt;, December 10, 2015. &lt;a href=&quot;http://arxiv.org/abs/1512.03385&quot;&gt;http://arxiv.org/abs/1512.03385&lt;/a&gt;.'><sup>12</sup></a></span>



<h4 class="wp-block-heading">Current level</h4>



<p>According to paperswithcode.com, performance has continued to climb through 2020, though more slowly than in earlier years.<span id='easy-footnote-13-2683' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/#easy-footnote-bottom-13-2683' title='See figure:&lt;/p&gt;



&lt;p&gt;“Papers with Code &amp;#8211; ImageNet Benchmark (Image Classification).” Accessed October 19, 2020. &lt;a href=&quot;https://paperswithcode.com/sota/image-classification-on-imagenet&quot;&gt;https://paperswithcode.com/sota/image-classification-on-imagenet&lt;/a&gt;.'><sup>13</sup></a></span>



<h3 class="wp-block-heading">Times for AI to cross human-relative ranges&nbsp;</h3>



<p>Given the above dates, we have:</p>



<figure class="wp-block-table"><table><tbody><tr><td>Range</td><td>Start</td><td>End</td><td>Duration (years)</td></tr><tr><td>First attempt to beginner level</td><td>&lt;1998</td><td>2012</td><td>&gt;14</td></tr><tr><td>Beginner to superhuman</td><td>2012</td><td>2015</td><td>3</td></tr><tr><td>Above superhuman</td><td>2015</td><td>&gt;2020</td><td>&gt;5</td></tr></tbody></table></figure>






<p><em>Primary author: Rick Korzekwa</em></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Surveys on fractional progress towards HLAI</title>
		<link>http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/</link>
		
		<dc:creator><![CDATA[Asya Bergal]]></dc:creator>
		<pubDate>Tue, 14 Apr 2020 22:34:35 +0000</pubDate>
				<category><![CDATA[AI Timeline Surveys]]></category>
		<category><![CDATA[AI Timelines]]></category>
		<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Predictions of Human-Level AI Timelines]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2416</guid>

					<description><![CDATA[How long until human-level performance, if we naively extrapolate progress since researchers joined their subfields? <a class="mh-excerpt-more" href="http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/" title="Surveys on fractional progress towards HLAI"></a>]]></description>
										<content:encoded><![CDATA[
<p>Given simplistic assumptions, extrapolating fractional progress estimates suggests a median time from 2020 to human-level AI of:</p>



<ul class="wp-block-list"><li>372 years (2392), based on responses collected in Robin Hanson’s informal 2012-2017 survey.</li><li>36 years (2056), based on all responses collected in the 2016 Expert Survey on Progress in AI.</li><li>142 years (2162), based on the subset of responses to the 2016 Expert Survey on Progress in AI who had been in their subfield for at least 20 years.</li><li>32 years (2052), based on the subset of responses to the 2016 Expert Survey on Progress in AI about progress in deep learning or machine learning as a whole rather than narrow subfields.</li></ul>



<p>67% of respondents to the 2016 Expert Survey on Progress in AI, and 44% of those in Hanson&#8217;s informal survey who answered the question, said that progress was accelerating.</p>



<h2 class="wp-block-heading">Details</h2>



<p>One way of estimating how many years something will take is to estimate what fraction of progress toward it has been made over a fixed number of years, then to extrapolate the number of years needed for full progress. As suggested by Robin Hanson,<span id='easy-footnote-1-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-1-2416' title='From &lt;a href=&quot;http://www.overcomingbias.com/2012/08/ai-progress-estimate.html&quot;&gt;this Overcoming Bias post&lt;/a&gt;:&lt;/p&gt;



&lt;p&gt;“I’d guess that relative to the starting point of our abilities of twenty years ago, we’ve come about 5-10% of the distance toward human level abilities. At least in probability-related areas, which I’ve known best. I’d also say there hasn’t been noticeable acceleration over that time. … If this 5-10% estimate is typical, as I suspect it is, then an outside view calculation suggests we probably have at least a century to go, and maybe a great many centuries, at current rates of progress.”Hanson, Robin. “AI Progress Estimate.” Overcoming Bias. Accessed April 14, 2020. &lt;a href=&quot;http://www.overcomingbias.com/2012/08/ai-progress-estimate.html&quot;&gt;http://www.overcomingbias.com/2012/08/ai-progress-estimate.html&lt;/a&gt;.'><sup>1</sup></a></span> this method can provide an estimate for when human-level AI will be developed, if we have data on what fraction of progress toward human-level AI has been made and whether it is proceeding at a constant rate.&nbsp;<br></p>
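<p>As a rough illustration, the naive extrapolation can be written out in a few lines of code. The numbers below are hypothetical, chosen for illustration rather than taken from either survey:</p>

```python
def years_to_human_level(years_observed, fraction_traversed, year_asked,
                         current_year=2020):
    """Naively extrapolate time remaining until human-level performance.

    If `fraction_traversed` of the path was covered in `years_observed`
    years, and progress continues at that average rate, the total time
    needed is years_observed / fraction_traversed. We then subtract the
    years already elapsed: those observed, plus those since the
    respondent answered.
    """
    total_years_needed = years_observed / fraction_traversed
    elapsed_since_asked = current_year - year_asked
    return total_years_needed - years_observed - elapsed_since_asked

# Hypothetical respondent: 10% of the path covered over 20 years in the
# field, answering in 2015.
print(years_to_human_level(20, 0.10, 2015))  # 175.0
```

<p>Dividing 20 years by 0.10 gives 200 years in total; subtracting the 20 years already observed and the 5 years between 2015 and 2020 leaves 175 years from 2020.</p>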



<p>We know of two surveys that ask about fractional progress and acceleration in specific AI subfields: an <a href="https://aiimpacts.org/hanson-ai-expert-survey/">informal survey conducted by Robin Hanson in 2012 &#8211; 2017</a>, and our <a href="https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/">2016 Expert Survey on Progress in AI</a>. We use them to extrapolate progress to human-level AI, assuming that:</p>



<ol class="wp-block-list"><li>AI progresses at the average rate that people have observed so far.</li><li>Human-level AI will be achieved when the median subfield reaches human-level.</li></ol>



<h3 class="wp-block-heading">Assumptions</h3>



<h4 class="wp-block-heading">AI progresses at the average rate that people have observed so far</h4>



<p>The naive extrapolation method described above assumes that AI progresses at the average rate that people have observed so far, but some respondents perceived acceleration or deceleration. If we guess that this change in the rate of progress continues into the future, this suggests that a truer extrapolation of each person&#8217;s observations would place human-level performance in their subfield either before or after the naively extrapolated date.</p>



<h4 class="wp-block-heading">Human-level AI will be achieved when the median subfield reaches human-level</h4>



<p>Both surveys asked respondents about fractional progress in their subfields. Extrapolating out these estimates to get to human-level performance gives some evidence for when AGI may come, but is not a perfect proxy. It may turn out that we get human-level performance in a small number of subfields much earlier than others, such that we count the resulting AI as ‘AGI’, or it may be the case that certain subfields important to AGI do not exist yet.</p>



<h3 class="wp-block-heading">Hanson AI Expert Survey</h3>



<p><a href="https://aiimpacts.org/hanson-ai-expert-survey/">Hanson’s survey</a> informally asked ~15 AI experts to estimate how far we’ve come in their own subfield of AI research in the last twenty years, compared to how far we have to go to reach human level abilities. The subfields represented were analogical reasoning, knowledge representation, computer-assisted training, natural language processing, constraint satisfaction, robotic grasping manipulation, early-human vision processing, constraint reasoning, and “no particular subfield”. Three respondents said the rate of progress was staying the same, four said it was getting faster, two said it was slowing down, and six did not answer (or may not have been asked).&nbsp;<br></p>



<p>The naive extrapolations<span id='easy-footnote-2-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-2-2416' title='Naively, we simply divide twenty years by the fraction of progress made to get an estimate of total years necessary, not accounting for possible acceleration. To get the time from &lt;em&gt;now&lt;/em&gt; to human-level performance, we subtract the twenty years of progress already made and subtract the difference between the year the question was asked and now (2020).'><sup>2</sup></a></span> of the answers from <a href="https://aiimpacts.org/hanson-ai-expert-survey/">Hanson’s survey</a> give a median time from 2020 to <a href="https://aiimpacts.org/human-level-ai/">human-level AI</a> (HLAI) of 372 years (2392). See the survey data and our calculations <a href="https://docs.google.com/spreadsheets/d/1KEttYmpOgyISY8pLAR4syU-0QA4yekI7GFaoabRNDTs/edit?usp=sharing">here</a>.</p>



<h3 class="wp-block-heading">2016 Expert Survey on Progress in AI</h3>



<p>The <a href="https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/">2016 Expert Survey on Progress in AI</a> (2016 ESPAI) asked machine learning researchers which subfield they were in, how long they had been in their subfield, and what fraction of the remaining path to human-level performance (in their subfield) they thought had been traversed in that time.<span id='easy-footnote-3-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-3-2416' title='



&lt;ul class=&quot;wp-block-list&quot;&gt;&lt;li&gt;Which AI research area have you worked in for the longest time?&lt;/li&gt;&lt;li&gt;How long have you worked in this area?&lt;/li&gt;&lt;li&gt;Consider three levels of progress or advancement in this area: &amp;nbsp; A. Where the area was when you started working in it B. Where it is now C. Where it would need to be for AI software to have roughly human level abilities at the tasks studied in this area &amp;nbsp; What fraction of the distance between where progress was when you started working in the area (A) and where it would need to be to attain human level abilities in the area (C) have we come so far (B)?&lt;/li&gt;&lt;/ul&gt;



&lt;p&gt;&amp;#8212; From &lt;a href=&quot;https://aiimpacts.org/2016-esopai-questions-printout/&quot;&gt;the printout of the 2016 ESPAI questions&lt;/a&gt;.'><sup>3</sup></a></span> 107 out of 111 responses were used in our calculation.<span id='easy-footnote-4-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-4-2416' title='We excluded responses which said a subfield had seen 100% or more progress, since we’re interested in the remaining progress required in the subfields that haven’t gotten to human-level yet.'><sup>4</sup></a></span> 42 subfields were reported, including “Machine learning”, “Graphical models”, “Speech recognition”, “Optimization”, “Bayesian Learning”, and “Robotics”.<span id='easy-footnote-5-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-5-2416' title='The complete list is: “Image Processing”, “Machine learning”, “Deep learning”, “Graphical models”, “Speech recognition”, “Optimization”, “Deep neural networks”, “Computer vision”, “Learning theory”, “Classifiers and statistical learning”, “Natural language processing”, “Sequential decision-making”, “Online learning”, “Visual perception”, “Bayesian learning”, “Manifold learning”, “Reinforcement learning”, “Probabilistic modeling”, “Robotics”, “Active learning”, “Graph-based pattern recognition”, “Image processing”, “Continuous control”, “Planning algorithms”, and “Network analysis”.'><sup>5</sup></a></span> Notably, Hanson’s survey included subfields that weren’t represented in 2016 ESPAI, including analogical reasoning and knowledge representation. Since&nbsp;2016 ESPAI was restricted to machine learning researchers, it may exclude non-machine-learning subfields that turn out to be important to fully human-level capabilities.</p>



<h4 class="wp-block-heading">Acceleration</h4>



<p>67% of all respondents said progress in their subfield was accelerating (see Figure 1). Most respondents said progress in their subfield was accelerating in each of the subsets we look at below (ML vs narrow subfield, and time in field).</p>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh4.googleusercontent.com/5xtsl-kgfjKfdkoRudpxer9vf3FCdGCnG6NinopziPPhvTIe-ZoX4fTiPB3ZU6YA4PS1dmwKINL6UqY9oq3Z7frRtaoPI7Bgeh2-cyb1-Ss0qoaNl6lG5sCXlhpBzfPpWL86yGLY" alt="" width="577" height="355"/><figcaption>Figure 1: Number of responses that progress was faster in the first half of the time in the field worked by respondents, the second half, or was about the same in both halves.</figcaption></figure>



<p>Most respondents think progress is accelerating. If this acceleration continues, our naively extrapolated estimates below may be overestimates for time to human-level performance.</p>



<h4 class="wp-block-heading">Time to HLAI</h4>



<p>We calculated estimated years from 2020 until human-level subfield performance by naively extrapolating the reported fraction of the path to human-level performance already traversed.<span id='easy-footnote-6-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-6-2416' title='As with the Hanson survey, we divided time in the field by the fraction of the remaining path traversed, then subtracted the number of years worked in the subfield, then subtracted an additional four years to account for the difference between when these questions were asked (2016) and now (2020).'><sup>6</sup></a></span> Figure 2 below shows the implied estimates for time until human-level performance for all respondents’ answers. These estimates give a median time from 2020 until HLAI of 36 years (2056).</p>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh4.googleusercontent.com/Zhz3eHQVJt1CNB-zfNap2d3nLJWm_9XvUejIvmDJ5cxNgTRB7JA-1PUlDm82PCigAjbanpMeA5IvSXNigeAPiII67UXjTsMEEXuOW9GmRJ8hN73rCY_i3kgVoUeuibI15gJbeqCG" alt="" width="580" height="358"/><figcaption>Figure 2: Extrapolated estimated time until human-level subfield performance for each respondent, arranged by length of time. The last four responses are above 1000 but have been cut off.</figcaption></figure>



<h5 class="wp-block-heading">Machine learning vs subfield progress</h5>



<p>Some respondents reported broad ‘subfields’, which encompassed all of machine learning, in particular “Machine learning” or “Deep learning”, while others reported narrow subfields, e.g. “Natural language processing” or “Robotics”. We split the survey data based on this subfield narrowness, guessing that progress on machine learning overall may be a better proxy for AGI overall. Among the 69 respondents who gave answers corresponding to the entire field of machine learning, the median implied time was 32 years (2052). Among the 70 respondents who gave narrow answers, the median implied time was 44 years (2064). Figures 3 and 4 show these estimates.<br></p>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh5.googleusercontent.com/hdHrjTB1A2uoiriGWrOKMG8z72t3rDqWHEN_h3U-Auc-TmM2hheaEpXoNpxr2xsDafVZ5UPVbv2p9LLBTOghIt63pEowb0zVlRyJZCAfOHlPet0eroPXc0DSgcaMb4KXP2qSaAXv" alt="" width="583" height="359"/><figcaption>Figure 3: Implied estimates for human-level performance based on respondents who specified broad answers, e.g. “Machine learning” when asked about their subfield. The last three responses are above 1000 but have been cut off.</figcaption></figure>






<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh4.googleusercontent.com/UeNSWgUOl1Df4UkZjYhrIx5SKy_7CFz7ahMWnlcfpGVUqdcQpB3diGknCOYb7if-xCaLrfLbrHwLnlBBPketfdCRjs1jqUKA3sZ1rttdo4Ft0PL_d48PJwN40ylU0ilaE8A60vsU" alt="" width="582" height="358"/><figcaption>Figure 4: Implied estimates for human-level performance based on respondents who specified narrow answers, e.g. “Natural language processing” when asked about their subfield. The last response is above 1000 but has been cut off.</figcaption></figure>



<p>The median implied estimate until human-level performance for machine learning broadly was 12 years sooner than the median estimate for specific subfields. This is counter to what we might expect, if human-level performance in machine learning broadly implies human-level performance on each individual subfield.</p>



<h5 class="wp-block-heading">Time spent in field</h5>



<p>Robin Hanson has suggested that his survey may get longer implied forecasts than 2016 ESPAI because he asks exclusively people who have spent at least 20 years in their field.<span id='easy-footnote-7-2416' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/#easy-footnote-bottom-7-2416' title='“One obvious difference is that I limited my sample to people who’d been in a field for at least 20 years.&amp;nbsp;Can you try limiting your sample in that way, or at least looking at the correlation between time in field and their rate estimates?“&lt;br&gt;&amp;#8212; From an email chain with Robin Hanson on February 15, 2020'><sup>7</sup></a></span> Filtering for people who have spent at least 20 years in their field, we have eight responses, and get a median implied time until HLAI of 142 years from 2020 (2162). Filtering for people who have spent at least 10 years in their field, we have 38 responses, and get a median implied time of 86 years (2106). Filtering for people who have spent less than 10 years in their field, we have 69 responses, and get a median implied time of 24 years (2044). Figures 5, 6 and 7 show estimates for each respondent, for each of these classes of time in field. </p>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh3.googleusercontent.com/yi2qtCBl_Ba436R10nGXKKZ6Y_Rcl7xPTgCFb_YbsFAb2ociDBUT7J0KGrH3vORVm19jDtdr_1xGw6WvItaA_QT2QbCPDVkTOiJV9fHf36mOEOePLcohLGjb_o5PVAaKdRQy9h2j" alt="" width="581" height="359"/><figcaption>Figure 5: Implied estimates for human-level performance based on respondents who were working on their subfield for at least 20 years. The last response is above 1000 but has been cut off.</figcaption></figure>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh6.googleusercontent.com/vjsPF7Z4ldZbVt6K1RMiNqm3obQYtMcNOTyamTsy2g_vAH14NAa72GEwmiuVd2Sqrs5y2CN7LGiwqJMcjqewNy4hcUTpNy3ZunTulV_24HX64qQE00VUgxADHQxmAtf_Y72Zd_fO" alt="" width="580" height="358"/><figcaption>Figure 6: Implied estimates for human-level performance based on respondents who were working on their subfield for at least 10 years. The last three responses are above 1000 but have been cut off.<br></figcaption></figure>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh4.googleusercontent.com/FLSRMkqRuMkdi5Y2ms1BQOM4d-iCXTbeK1XcOjJNbSqmmCLRh0ZMgImQPRLeDzpBHvxupLMhdMVQ3tlryUA4wApKZn2WxdkFHod4qvHfCX8pgxNQGwkTryCSKezPHpex2pTHWn87" alt="" width="580" height="358"/><figcaption>Figure 7: Implied estimates for human-level performance based on respondents who were working on their subfield for less than 10 years. The last response is above 1000 but has been cut off.</figcaption></figure>



<h3 class="wp-block-heading">Comparison of the two surveys</h3>



<p>The median implied estimate from 2020 until human-level performance suggested by responses from 2016 ESPAI (36 years) is an order of magnitude smaller than the one suggested by the Hanson survey (372 years). This appears to be at least partly explained by more experienced researchers giving responses that imply longer estimates. Hanson asks exclusively people who have spent at least 20 years in their subfield, whereas the 2016 survey does not filter based on experience. If we filter 2016 survey respondents for researchers who have spent at least 20 years in their subfield, we instead get a median estimate of 142 years.&nbsp;<br></p>



<p>More experienced researchers may generate longer implied estimates because the majority of progress has happened recently; many respondents said progress had accelerated, which is some evidence of this. It could also be that less-experienced researchers overestimate how significant recent progress is.<br></p>



<p>If AI research is accelerating and is going to continue accelerating until we get to human-level AI, the time to HLAI may be sooner than these estimates. If AI research is accelerating now but is not representative of what progress will look like in the future, longer naive estimates by more experienced researchers may be more appropriate.<br></p>



<h3 class="wp-block-heading">Comparison to estimates reached by other survey methods</h3>



<p><a href="https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/">2016 ESPAI</a> also asked people to estimate time until human-level machine intelligence (HLMI) by asking them how many years they would give until a 50% chance of HLMI. The median answer for <a href="https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/#Human-level_intelligence">this question</a> in 2016 was 40 years, or 36 years from 2020 (2056), exactly the same as the median answer of 36 years implied by extrapolating fractional progress. The survey also asked about time to HLMI in other ways, which yielded <a href="https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/#Answers">less consistent answers</a>.<br></p>



<p><em>Primary author: Asya Bergal</em><br></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Precedents for economic n-year doubling before 4n-year doubling</title>
		<link>http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Tue, 14 Apr 2020 20:42:41 +0000</pubDate>
				<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Takeoff speed]]></category>
		<category><![CDATA[front]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2406</guid>

					<description><![CDATA[Does the economy ever double without having first doubled four times slower? Yes, but not since 3000BC. <a class="mh-excerpt-more" href="http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/" title="Precedents for economic n-year doubling before 4n-year doubling"></a>]]></description>
										<content:encoded><![CDATA[
<p>The only times gross world product appears to have doubled in <em>n</em> years without having doubled previously in 4<em>n</em> years were between 4,000 BC and 3,000 BC, and most likely between 10,000 BC and 4,000 BC.</p>



<h2 class="wp-block-heading">Details</h2>



<h3 class="wp-block-heading">Background</h3>



<p>A key open question regarding AI risk is how quickly advanced artificial intelligence will &#8216;take off&#8217;, which is to say something like &#8216;go from being a small source of influence in the world to an overwhelming one&#8217;. </p>



<p>In <em>Superintelligence</em><span id='easy-footnote-1-2406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/#easy-footnote-bottom-1-2406' title='   Bostrom, Nick. &lt;em&gt;Superintelligence: Paths, Dangers, Strategies&lt;/em&gt;. 1 edition. Oxford: Oxford University Press, 2014.    &lt;/p&gt;



'><sup>1</sup></a></span>, Nick Bostrom defines the following answers, seemingly in line with common usage:</p>



<ul class="wp-block-list"><li><strong>Slow takeoff</strong> takes decades or centuries</li><li><strong>Moderate takeoff</strong> takes months or years </li><li><strong>Fast takeoff</strong> takes minutes to days</li></ul>



<p>However, the specific criteria for takeoff having occurred are generally ambiguous.</p>



<p>Paul Christiano has suggested<span id='easy-footnote-2-2406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/#easy-footnote-bottom-2-2406' title='paulfchristiano. “Takeoff Speeds.” &lt;em&gt;The Sideways View&lt;/em&gt; (blog), February 24, 2018. &lt;a href=&quot;https://sideways-view.com/2018/02/24/takeoff-speeds/&quot;&gt;https://sideways-view.com/2018/02/24/takeoff-speeds/&lt;/a&gt;.    '><sup>2</sup></a></span> operationalizing &#8216;slow takeoff&#8217; as:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles. (Similarly, we’ll see an 8 year doubling before a 2 year doubling, etc.)</p></blockquote>



<h3 class="wp-block-heading">Historic precedents</h3>



<p>We were interested in whether anything faster than a &#8216;slow takeoff&#8217; by this definition would be historically unprecedented. That is, we wanted to know whether whenever the economy has doubled in <em>n</em> years, it has always completed a doubling in 4<em>n</em> years or less before the beginning of the <em>n</em>-year doubling.</p>



<p>We took historic gross world product (GWP) estimates from Wikipedia<span id='easy-footnote-3-2406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/#easy-footnote-bottom-3-2406' title='“Gross World Product.” In &lt;em&gt;Wikipedia&lt;/em&gt;, August 14, 2019. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Gross_world_product&amp;amp;oldid=910796857&quot;&gt;https://en.wikipedia.org/w/index.php?title=Gross_world_product&amp;amp;oldid=910796857&lt;/a&gt;. &lt;br&gt;&lt;br&gt;The page notes that most of their data comes from J Bradford de Long&amp;#8217;s dataset: &lt;br&gt;&lt;br&gt;J. Bradford DeLong (24 May 1998).&amp;nbsp;&lt;a href=&quot;http://holtz.org/Library/Social%20Science/Economics/Estimating%20World%20GDP%20by%20DeLong/Estimating%20World%20GDP.htm&quot;&gt;&amp;#8220;Estimating World GDP, One Million B.C. – Present&amp;#8221;&lt;/a&gt;. Retrieved&amp;nbsp;5 February&amp;nbsp;2013.'><sup>3</sup></a></span> and checked, at each date, how long the economy had taken to double, and whether, prior to the start of that doubling, it had ever completed a doubling in at most four times as many years.<span id='easy-footnote-4-2406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/#easy-footnote-bottom-4-2406' title='[Note May 13 2020: This sheet is temporarily wrong.]&lt;s&gt;Instances coincide with &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1Muz2ftyDUUewMTZPxYxeXF-uj6lBKYP-O3-IvdtHhCo/edit?ts=5e95f280#gid=0&amp;amp;range=G:G&quot;&gt;Column G in this spreadsheet&lt;/a&gt; giving a number higher than 4, when &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1Muz2ftyDUUewMTZPxYxeXF-uj6lBKYP-O3-IvdtHhCo/edit?ts=5e95f280#gid=0&amp;amp;range=E2&quot;&gt;E2&lt;/a&gt; is set to 2.&lt;/s&gt;'><sup>4</sup></a></span>
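<p>The check we performed can be sketched as follows. This is a minimal illustration of the method, not our actual analysis code; the toy GWP series in the example is invented, whereas the real analysis used the Wikipedia estimates cited above.</p>

```python
# Sketch of the doubling-time check. 'series' is a list of (year, gwp)
# points in chronological order. For each doubling that took n years,
# we ask whether some earlier doubling, completed before this doubling
# began, took 4n years or less.

def doubling_time(series, i):
    """Years the economy took to double, ending at series[i].

    Walks backward to the most recent earlier point where GWP was at
    most half its value at series[i]; returns None if it never halved.
    (A sketch: real data would call for interpolation between points.)"""
    end_year, end_gwp = series[i]
    for year, gwp in reversed(series[:i]):
        if gwp <= end_gwp / 2:
            return end_year - year
    return None

def unprecedented_doublings(series):
    """Doublings in n years with no prior doubling in <= 4n years."""
    results = []
    for i in range(1, len(series)):
        n = doubling_time(series, i)
        if n is None:
            continue
        start_year = series[i][0] - n  # beginning of this doubling
        precedent = False
        for j in range(1, i):
            m = doubling_time(series, j)
            if m is not None and series[j][0] <= start_year and m <= 4 * n:
                precedent = True
                break
        if not precedent:
            results.append((series[i][0], n))
    return results
```

<p>On a toy series mirroring the case discussed below&#8212;GWP doubling between 10,000 BC and 4,000 BC, then again between 4,000 BC and 3,000 BC&#8212;the second doubling (1,000 years) is flagged, since the only earlier doubling took 6,000 years, more than four times as long.</p>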



<p>We found two apparent examples of faster takeoffs, so defined:</p>



<ul class="wp-block-list"><li>Between 4,000 BC and 3,000 BC, GWP doubled in 1,000 years, yet it had never before doubled in as few as 4,000 years</li><li>Between 10,000 BC and 4,000 BC, GWP doubled in 6,000 years, yet there is no record of it doubling earlier in as few as 24,000 years. The records at that point are fairly sparse, so this is less clear, but it seems unlikely that there was a doubling in 24,000 years.<span id='easy-footnote-5-2406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/#easy-footnote-bottom-5-2406' title='Toward the end of the period it took 15,000 years to grow by $0.6Bn, and growth of $1.8Bn would have been needed for a doubling. So assuming linear growth at the end-of-period rate, this would have taken around 45,000 years, whereas if growth was speeding up, it should have taken longer.'><sup>5</sup></a></span> This appears to coincide with the beginning of agriculture, in around 9,000 BC.<span id='easy-footnote-6-2406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/precedents-for-economic-n-year-doubling-before-4n-year-doubling/#easy-footnote-bottom-6-2406' title='Khan Academy. “The Dawn of Agriculture (Article).” Accessed April 14, 2020. &lt;a href=&quot;https://www.khanacademy.org/humanities/world-history/world-history-beginnings/birth-agriculture-neolithic-revolution/a/where-did-agriculture-come-from&quot;&gt;https://www.khanacademy.org/humanities/world-history/world-history-beginnings/birth-agriculture-neolithic-revolution/a/where-did-agriculture-come-from&lt;/a&gt;.'><sup>6</sup></a></span></li></ul>



<p>The 300-year period immediately after 1300 saw a doubling of GWP, and the 1,200 years immediately beforehand did not; however, there was an earlier doubling within the 1,200 years ending in 1200 AD. So this is not technically an instance, but it was a case of briefly accelerating growth. GWP between 1100 and 1300 actually declined, though, so this is perhaps a different kind of case from the ones we are interested in.</p>



<p><em>Corresponding author: Daniel Kokotajlo</em></p>



<h2 class="wp-block-heading">Notes</h2>



<p></p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Resolutions of mathematical conjectures over time</title>
		<link>http://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/</link>
		
		<dc:creator><![CDATA[Asya Bergal]]></dc:creator>
		<pubDate>Tue, 14 Apr 2020 20:38:13 +0000</pubDate>
				<category><![CDATA[AI Timelines]]></category>
		<category><![CDATA[Algorithmic Progress]]></category>
		<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[front]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2409</guid>

					<description><![CDATA[The time-to-proof for past mathematical problems currently remembered as notable is exponentially distributed with a half life of around 100 years.  <a class="mh-excerpt-more" href="http://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/" title="Resolutions of mathematical conjectures over time"></a>]]></description>
										<content:encoded><![CDATA[
<p>Conditioned on being remembered as a notable conjecture, the time-to-proof for a mathematical problem appears to be exponentially distributed with a half-life of about 100 years. However, these observations are likely to be distorted by various biases.</p>



<h2 class="wp-block-heading">Support</h2>



<p>In 2014, we found conjectures referenced on Wikipedia, and recorded the dates that they were proposed and resolved, if they were resolved. We updated this list of conjectures in 2020, marking any whose status had changed. We then used a Kaplan-Meier estimator<span id='easy-footnote-1-2409' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/#easy-footnote-bottom-1-2409' title='“Kaplan–Meier Estimator.” Wikipedia. Wikimedia Foundation, April 1, 2020. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Kaplan–Meier_estimator&amp;amp;oldid=948523181&quot;&gt;https://en.wikipedia.org/w/index.php?title=Kaplan–Meier_estimator&amp;amp;oldid=948523181&lt;/a&gt;.'><sup>1</sup></a></span> to approximate the survivorship function.<span id='easy-footnote-2-2409' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/#easy-footnote-bottom-2-2409' title='“Survival Function.” Wikipedia. Wikimedia Foundation, October 8, 2019. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Survival_function&amp;amp;oldid=920310453&quot;&gt;https://en.wikipedia.org/w/index.php?title=Survival_function&amp;amp;oldid=920310453&lt;/a&gt;.'><sup>2</sup></a></span>



<p>The results of this exercise are recorded <a href="https://docs.google.com/spreadsheets/d/119VtkbzNWGdAhGsDx-IjYODkwCz0GHRNKDkipRGqjf8/edit?usp=sharing">here</a>.<span id='easy-footnote-3-2409' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/#easy-footnote-bottom-3-2409' title='The ‘Data’ tab of &lt;a href=&quot;https://docs.google.com/spreadsheets/d/119VtkbzNWGdAhGsDx-IjYODkwCz0GHRNKDkipRGqjf8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt; contains the list of conjectures we used and their sources. The ‘Kaplan Meier’ tab contains the calculation of the survival function. &lt;a href=&quot;https://docs.google.com/spreadsheets/d/119VtkbzNWGdAhGsDx-IjYODkwCz0GHRNKDkipRGqjf8/edit#gid=380799079&amp;amp;range=K2&quot;&gt;The cell next to the cell marked ‘Exponential trendline’&lt;/a&gt; contains our calculation for the exponential function fitting our Kaplan-Meier estimator.'><sup>3</sup></a></span> Figure 1 below shows the survivorship function for the mathematical conjectures we found. The data is fit closely by an exponential function with a half-life of 117 years.<span id='easy-footnote-4-2409' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/resolutions-of-mathematical-conjectures-over-time/#easy-footnote-bottom-4-2409' title='When fitting our exponential, we did not count the last point at 750 years, because it had a y-value of 0, which the Google Sheets LOGEST function would not accept when generating a best-fit curve. Nonetheless, Figure 1 suggests that the last point seems to fit our exponential reasonably well.'><sup>4</sup></a></span>
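<p>For illustration, the estimator and the exponential fit can be sketched in a few lines. This is a simplified stand-in for the spreadsheet calculation&#8212;the fit below is a least-squares line through the origin on log-survival, rather than Google Sheets&#8217; LOGEST&#8212;and the durations in the example are invented rather than drawn from our conjecture data.</p>

```python
import math

def kaplan_meier(durations, resolved):
    """Kaplan-Meier estimate of the survival function.

    durations: years from posing to resolution, or to the present for
               still-open conjectures.
    resolved:  True if resolved, False if censored (still open).
    Returns (time, surviving fraction) steps at each resolution."""
    # Sort by time, processing resolutions before censorings at ties.
    events = sorted(zip(durations, resolved), key=lambda e: (e[0], not e[1]))
    at_risk = len(events)
    survival = 1.0
    curve = []
    for t, was_resolved in events:
        if was_resolved:
            survival *= (at_risk - 1) / at_risk
            curve.append((t, survival))
        at_risk -= 1
    return curve

def fitted_half_life(curve):
    """Half-life of an exponential S(t) = exp(-lambda * t) fit to the
    curve by least squares on log S(t), constrained through the origin.
    Points where survival has reached zero are excluded from the fit."""
    pts = [(t, math.log(s)) for t, s in curve if s > 0]
    lam = -sum(t * y for t, y in pts) / sum(t * t for t, _ in pts)
    return math.log(2) / lam
```

<p>On a curve that is exactly exponential with a 100-year half-life, <code>fitted_half_life</code> recovers 100 years.</p>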



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/image-1024x633.png" alt="" class="wp-image-2410" width="580" height="358" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/image-1024x633.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/image-300x186.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/image-768x475.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/image-1536x950.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/image.png 1646w" sizes="auto, (max-width: 580px) 100vw, 580px" /><figcaption>Figure 1: Survivorship function of mathematical conjectures over time, also known as the fraction of mathematical conjectures unresolved at time t after being posed.</figcaption></figure>



<h2 class="wp-block-heading">Biases</h2>



<p>We are using resolution times for remembered conjectures as a proxy for resolution times for all conjectures. Resolution times for remembered conjectures might be biased in several ways: old conjectures are perhaps more likely to be remembered if they were solved than if they were not; very recently solved conjectures are probably more likely to be remembered (though this only matters because the rate of conjecture posing has probably changed over time); and conjectures that were especially hard to solve might also be more notable. In addition, the last hundred years of the curve contain few data points, making that portion particularly prone to inaccuracy.</p>



<h2 class="wp-block-heading">Relevance</h2>



<p>To the extent that open theoretical problems in AI are similar to math problems, time to solve math problems may be informative for forming a prior on time to solve AI problems.</p>



<p><em>Corresponding author: Asya Bergal</em></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Preliminary survey of prescient actions</title>
		<link>http://aiimpacts.org/survey-of-prescient-actions/</link>
		
		<dc:creator><![CDATA[richardkorzekwa]]></dc:creator>
		<pubDate>Sat, 04 Apr 2020 00:15:54 +0000</pubDate>
				<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Reference]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2362</guid>

					<description><![CDATA[In a short search, we did not find clear examples of 'prescient actions'—specific efforts to address severe and complex problems decades ahead of time and in the absence of broader scientific concern, experience with analogous problems... <a class="mh-excerpt-more" href="http://aiimpacts.org/survey-of-prescient-actions/" title="Preliminary survey of prescient actions"></a>]]></description>
										<content:encoded><![CDATA[
<p><em><span class="has-inline-color has-cyan-bluish-gray-color">Published 3 April 2020</span></em></p>



<p> In a 10-20 hour exploration, we did not find clear examples of &#8216;prescient actions&#8217;—specific efforts to address severe and complex problems decades ahead of time and in the absence of broader scientific concern, experience with analogous problems, or feedback on the success of the effort—though we found six cases that may turn out to be examples on further investigation. </p>



<h2 class="wp-block-heading">Details</h2>



<p> We briefly investigated 20 leads on historical cases of actions taken to eliminate or mitigate a problem a decade or more in advance, evaluating them for their ‘prescience’. None were clearly as prescient as the <a href="https://intelligence.org/files/SzilardNuclearWeapons.pdf">actions of Leó Szilárd</a>, which were previously the best examples of such actions that we found. The primary ways in which these actions failed to exhibit prescience were the amount of feedback that was available while developing a solution and the number of years in advance of the threat that the action was taken. Although we are uncertain about most of the cases, we believe that six of them are promising for future investigation. </p>



<h2 class="wp-block-heading">Background</h2>



<p> Current efforts to prepare for the impacts of artificial intelligence have several features that could make them unlikely to succeed. They typically require us to make complex predictions about novel threats over a timescale of decades, and many of these efforts will receive little feedback on whether they are on the right track, receive little input from the larger scientific community, and produce results that are not useful outside the problem of mitigating AI risk.</p>



<p>It may be useful to search for past cases of preparations that have similar features. It is important to know whether humanity has failed to solve problems in advance because the attempts to do so failed or because solutions were not attempted. If we find failed attempts, we want to know why they failed. For example, if it turns out that most previous actions were not successful because of failure to accurately predict the future, we may want to focus more of our efforts on forecasting. To this end, we use the following set of criteria for evaluating past efforts for their ‘prescience’, or the extent to which they represent early actions to mitigate a risk in the absence of feedback:<span id='easy-footnote-1-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-1-2362' title='Originally proposed by &lt;a href=&quot;https://docs.google.com/document/d/1oD0Ti9WiET3mTKBfowxWJaosV1OdnBl5jK8DYb-bWmc/edit?usp=sharing&quot;&gt;Alexander Berger in 2015&lt;/a&gt;.'><sup>1</sup></a></span>



<ul class="wp-block-list">
<li><strong>Years in Advance: </strong>How many years in advance of the expected emergence of the threat was the action taken?</li>



<li><strong>Novelty:</strong> Was the threat novel, or can we re-use (perhaps with modification) the solution to past threats?</li>



<li><strong>Scientific Concern:</strong> Was the effort to address the threat endorsed by the larger scientific community?</li>



<li><strong>Complex Prediction:</strong> Did the solution require a complex prediction, or was the solution clear and closely related to the problem?</li>



<li><strong>Specificity: </strong>Was the solution specific to the threat or is it something that is broadly useful and may be done anyway?</li>



<li><strong>Feedback:</strong> Was feedback available while developing a solution, so that mistakes could be made and learned from, or did it need to be right on the first try?</li>



<li><strong>Severity:</strong> Was it a severe threat of global importance?</li>
</ul>



<p> In addition to these criteria, we took note of whether the outcome of the efforts is known, as cases with a known outcome may be more informative and more fruitful for further investigation. </p>



<h2 class="wp-block-heading">Methodology</h2>



<p> Potential cases of interest were found by searching the Internet, asking our friends and colleagues, and offering a bounty on promising leads. We compiled a list of topics to research that were sufficiently narrow to allow for evaluation over a short period of time. This list included individual people that took actions (like Clair Patterson), specific actions that were taken (e.g. the installation of the Moscow-Washington Hotline), and the threats themselves (such as the destruction of infrastructure by a geomagnetic storm). </p>



<p> One researcher spent approximately 30 minutes reviewing each case, and rated each on a scale of 1 to 10 on the criteria described in the previous section.<span id='easy-footnote-2-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-2-2362' title='All of the ratings were assigned by Rick Korzekwa'><sup>2</sup></a></span> A score of 1 indicates that the criterion described the case very poorly, while a score of 10 indicates that the case demonstrated the criterion extremely well. These ratings were highly subjective, though we made efforts to evaluate the cases in a way that was consistent and avoided too many false negatives.<span id='easy-footnote-3-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-3-2362' title='For example, efforts to reduce the risks of geomagnetic storms and antibiotic resistance both involve some actions that are high in specificity and others that are low in specificity. We evaluated both cases on the most specific-to-the-problem actions that we are aware of.'><sup>3</sup></a></span> A composite score was calculated from these by taking a weighted average with the following weights:<span id='easy-footnote-4-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-4-2362' title=' Because we were highly uncertain about our scores given only a half hour of research per case, we assigned scores for our best guess, or ‘median guess’ score, as well as 10th and 90th percentile estimates for each criterion for each case. These should be interpreted as the range of scores which we expect we would arrive at given several hours of investigation, with 80% credence, and equal likelihood of having over- or underestimated the score. We calculated 10th and 90th percentile estimates of the average by modeling the high and low estimates as uncorrelated deviations from the mean, so that they could be added in the usual way for propagating uncorrelated errors.'><sup>4</sup></a></span>



<figure class="wp-block-table"><table><tbody><tr><td><strong>Criterion</strong></td><td><strong>Weight</strong></td></tr><tr><td>Number of years in advance<span id='easy-footnote-5-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-5-2362' title=' This score was calculated directly from the estimated number of years by a root logistic function with values 2.75, 7.1, and 9.6 for 0, 10, and 20 years, respectively '><sup>5</sup></a></span></td><td>20</td></tr><tr><td>Overall severity of threat</td><td>2</td></tr><tr><td>Novelty of threat/solution</td><td>3</td></tr><tr><td>Overall level of concern from the scientific community at large</td><td>2</td></tr><tr><td>Complexity of prediction required to produce a solution</td><td>5</td></tr><tr><td>Specificity of solution</td><td>2</td></tr><tr><td>Level of feedback available while developing a solution</td><td>10</td></tr></tbody></table></figure>
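<p>As an illustration, the composite score is simply a weighted average of the criterion ratings using the weights above. The ratings in the example below are invented for demonstration, not taken from our spreadsheet.</p>

```python
# Weights from the table above; ratings are on the 1-10 scale.
WEIGHTS = {
    "years_in_advance": 20,
    "severity": 2,
    "novelty": 3,
    "scientific_concern": 2,
    "complex_prediction": 5,
    "specificity": 2,
    "feedback": 10,
}

def composite_score(ratings):
    """Weighted average of per-criterion ratings."""
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS) / total_weight

# Hypothetical case: taken well in advance of a severe, complex threat,
# but with relatively abundant feedback available.
example = {
    "years_in_advance": 7,
    "severity": 9,
    "novelty": 6,
    "scientific_concern": 4,
    "complex_prediction": 8,
    "specificity": 5,
    "feedback": 3,
}
```

<p>Since the weights sum to 44, years in advance and feedback together account for just over two-thirds of the composite, which is why cases tend to lose the most points on those two axes.</p>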



<p> In addition to these ratings, we rated each one for how promising it was for further research, and annotated the ratings in the spreadsheet as seemed appropriate. We also assigned ratings to two cases that were previously the subject of in-depth investigations, for comparison. These were the <a href="https://intelligence.org/files/TheAsilomarConference.pdf">Asilomar Conference</a> and <a href="https://intelligence.org/files/SzilardNuclearWeapons.pdf">the actions of Leó Szilárd</a>.</p>



<h2 class="wp-block-heading">Results</h2>



<p> The following table shows our ratings. The two reference cases are in italics. Our full spreadsheet of ratings and notes can be found <a href="https://docs.google.com/spreadsheets/d/12mMQFjgWPjE6agOxD8jDceNTKBzIeplkjrpyTdcMa48/edit?usp=sharing">here</a>.</p>



<figure class="wp-block-table"><table><tbody><tr><td><strong>Case</strong></td><td><strong>Score</strong></td><td><strong>Suitability for Further Research</strong></td></tr><tr><td><em>Leó Szilárd</em></td><td><em>7.24</em></td><td></td></tr><tr><td>Antibiotic resistance</td><td>7.11</td><td>7</td></tr><tr><td>Open Quantum Safe</td><td>6.80</td><td>5</td></tr><tr><td>Nordic Gene Bank</td><td>6.74</td><td>4</td></tr><tr><td>Geomagnetic Storm Prep</td><td>6.74</td><td>5</td></tr><tr><td>Fukushima Daiichi</td><td>6.74</td><td>5</td></tr><tr><td>Swiss Redoubt</td><td>6.60</td><td>2</td></tr><tr><td>Nonproliferation Treaty</td><td>6.14</td><td>6</td></tr><tr><td>Cavendish Banana and TR4</td><td>6.12</td><td>5</td></tr><tr><td>WIPP</td><td>6.02</td><td>4</td></tr><tr><td>Population Bomb</td><td>5.99</td><td>3</td></tr><tr><td>Y2K</td><td>5.76</td><td>4</td></tr><tr><td><em>Asilomar Conference</em></td><td><em>5.70</em></td><td></td></tr><tr><td>Cold War Civil Defense</td><td>5.29</td><td>3</td></tr><tr><td>Religious Apocalypse</td><td>4.88</td><td>2</td></tr><tr><td>Hurricane Katrina</td><td>4.18</td><td>4</td></tr><tr><td>Iran Nuclear Deal</td><td>4.18</td><td>4</td></tr><tr><td>Moscow-Washington Hotline</td><td>3.90</td><td>3</td></tr><tr><td>England 1800s Policy Reform</td><td>3.89</td><td>2</td></tr><tr><td>Clair Patterson</td><td>3.74</td><td>2</td></tr><tr><td>Missile gap</td><td>3.22</td><td>2</td></tr><tr><td>PQCrypto Conference 2006</td><td><br></td><td>4</td></tr></tbody></table></figure>



<p> For one case, the PQCrypto 2006 conference, we were unable to find sufficient information after 45 minutes of investigation to provide an evaluation.</p>



<p>In general, the cases we investigated did not score highly on these criteria. The average score was 5.6 out of 10, with the US-Russia missile gap receiving the minimum score of 3.22 and antibiotic resistance receiving the maximum score of 7.11. None of the cases received a higher score than our reference case, the actions of Leó Szilárd (score = 7.24), which we consider to be sufficiently ‘prescient’ to be worth examining. Just over half (11) of our cases received higher ratings than the Asilomar Conference (rating = 5.70), which was previously judged to be less prescient.</p>



<p>The ratings are highly uncertain, as is natural for thirty-minute reviews of complex topics. On average, our 90th percentile estimates were 80% larger than their corresponding 10th percentile estimates. All but four cases had minimum ratings lower than the best guess for Asilomar, and more than half had maximum ratings higher than the best guess for Leó Szilárd.</p>



<p>The axes on which the cases were least prescient were feedback and years in advance.<span id='easy-footnote-6-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-6-2362' title='On average, the cases lost 1.35 points from their composite score on each of these criteria. This is partly due to the large weight assigned to these criteria. If we used an unweighted average to compute the scores, cases would lose 0.77 points for feedback and 0.39 for years in advance, with years in advance being the axis with the highest average score.'><sup>6</sup></a></span> The cases were most analogous on severity, novelty, and specificity of solution, losing on average 0.20, 0.30, and 0.20 points from their composite scores, respectively.</p>



<p>Two cases, antibiotic resistance and the Treaty on the Non-Proliferation of Nuclear Weapons, seemed particularly promising for additional research, and received scores of 7 and 6, respectively. Five other cases received scores of at least five and seemed less promising, but likely worth some additional research.</p>



<h2 class="wp-block-heading">Discussion</h2>



<p> Although the very short research time allotted to each case limits our ability to confidently draw conclusions, we ruled out some cases which were clearly not prescient, identified some promising cases, and roughly characterized some ways in which efforts to reduce AI risk may be different from past efforts to reduce risks.</p>



<h3 class="wp-block-heading">Irrelevant Cases</h3>



<p> There were four cases that we found to be poor examples of prescient actions: the <strong>US-Russia Missile Gap</strong> of the late 1950s, the actions of <strong>Clair Patterson</strong> to combat the use of leaded gasoline, <strong>19th century policy reforms in England</strong> that were made in response to the industrial revolution, and the <strong>Moscow-US Nuclear Hotline</strong>. All of these cases involved actions that were taken in response to, rather than in anticipation of, the emergence of a problem (or perceived problem), and for which the solutions were relatively straightforward, with the primary barriers being political.<span id='easy-footnote-7-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-7-2362' title='Clair Patterson made some impressive inferences about the present state of the world, and seemed to believe that the problems he was observing would continue to get worse without intervention. In this respect, his actions were prescient. But in general, he was working to prevent a present problem from becoming worse, rather than working to avoid a future problem.'><sup>7</sup></a></span>



<h3 class="wp-block-heading">Questionable Cases</h3>



<p> Two cases involved actions based on highly dubious predictions: preparations for a <strong>religious apocalypse</strong><span id='easy-footnote-8-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-8-2362' title=' Preparations for religious apocalypse is a broad category. We attempted to find examples in this category that fell within our target reference class, but we were generally unable to find examples that involved specific actions taken more than a few years in advance. We are not highly confident that there do not exist examples that meet these criteria.'><sup>8</sup></a></span> and the book <strong><em>The Population Bomb</em></strong> and the accompanying actions of author Paul Ehrlich. Although the actors in these cases were acting on predictions that have since been shown to be inaccurate, the cases do have some similarity to AI risk. They were addressing predictions of severe consequences from novel threats, they were acting without help from the scientific community, and they did not expect to receive a great deal of feedback along the way. However, the actions were taken only 5-10 years in advance of the threat, and we expect the apparent disconnect between the forecasts and reality to make it more difficult to learn from the actions.</p>



<p>Some cases involved threats that had already emerged, in the sense that they could happen immediately, but had sufficiently low per-year risk for a reasonable person to expect the outcome to be at least a decade in the future. These include <strong>Hurricane Katrina</strong>, <strong>US civil defense during the Cold War</strong>, <strong>Fukushima Daiichi</strong>, the comparison case <strong>Asilomar Conference</strong>, and the <strong>Nordic Gene Bank</strong>.<span id='easy-footnote-9-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-9-2362' title='The Nordic Gene Bank addresses a low per-year risk, so that it seems reasonable to consider it to be addressing a future risk. However, the first withdrawal from the seed vault happened relatively quickly, suggesting that either the risk is near term or that the solution is not highly specific to long term risks.'><sup>9</sup></a></span> <span id='easy-footnote-10-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-10-2362' title=' Although geomagnetic storm preparation has a similar quality, it seems that the per-year risk of a catastrophic outcome is low enough, and the preparations for such severe outcomes are specific enough, that it qualifies as a promising case, as described in the next section.'><sup>10</sup></a></span>



<p>Other cases involved solutions that were easy or not dependent on complex forecasting. The <strong>Swiss National Redoubt</strong> relied on long-range forecasting, but was more of a large investment in defense than a complex search for a solution. The <strong>year 2000 problem</strong> was easy to address, even without taking action until relatively shortly before the event took place. The <strong>Iran Nuclear Deal</strong> (and perhaps also the <strong>Nuclear Non-Proliferation Treaty</strong>) required difficult political negotiations, but did not appear to rely on complex predictions.</p>



<h3 class="wp-block-heading">Promising Cases</h3>



<p> We identified six cases that seem promising for further investigation:</p>



<p><strong>Alexander Fleming</strong> warned, in his 1945 Nobel Lecture, that widespread access to antibiotics without supervision might lead to antibiotic resistance.<span id='easy-footnote-11-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-11-2362' title='“The time may come when penicillin can be bought by anyone in the shops. Then there is the danger that the ignorant man may easily underdose himself and by exposing his microbes to non-lethal quantities of the drug make them resistant.” “Wayback Machine,” March 31, 2018.&lt;a href=&quot;https://web.archive.org/web/20180331001640/https://www.nobelprize.org/nobel_prizes/medicine/laureates/1945/fleming-lecture.pdf&quot;&gt; https://web.archive.org/web/20180331001640/https://www.nobelprize.org/nobel_prizes/medicine/laureates/1945/fleming-lecture.pdf&lt;/a&gt;.'><sup>11</sup></a></span> We are uncertain about the impact of Fleming’s warning, whether he took additional action to mitigate the risk, and how widespread such concerns were within the scientific community, but our impression is that it was not a widely known issue, that his was an early warning, and that his judgement was generally taken seriously by the time of his speech. His warning preceded the first documented cases of penicillin-resistant bacteria by more than 20 years, and the threat of antimicrobial resistance seems to be broadly analogous to AI risk on most of our criteria, though it does seem that feedback was available throughout efforts to reduce the threat.</p>



<p><strong>The Treaty on the Non-Proliferation of Nuclear Weapons</strong> required many actions from many actors, but it seems to have required a complex prediction about technological development and geopolitics to address a severe threat, was specific to a particular threat, and had limited opportunities for feedback. We are uncertain if any of the specific actions will prove to be prescient on further investigation, but it seems promising.<br></p>



<p><strong>Open Quantum Safe</strong> is an open-source project to develop cryptographic techniques that are resistant to the use of quantum computers. The threat of quantum computing to cryptography has several relevant features, including the need for complex forecasting of a novel threat over a timescale of decades. We found limited information on the circumstances surrounding the founding of the project or the related case, the <strong>2006 PQCrypto Conference</strong>, but the effort generally seems prescient.</p>



<p><strong>Geomagnetic Storm Preparation</strong> addresses the threat of severe damage to and disruption of electronics and power infrastructure by solar weather, which could be a severe global catastrophe.<span id='easy-footnote-12-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-12-2362' title=' See, for example https://allfed.info/industrial-civilisation/'><sup>12</sup></a></span> The expected time between such events is decades or centuries, and mitigating the risk involves actions that may be specific to the particular problem and requires complex predictions about the physics involved and how our infrastructure and institutions would be able to respond. However, we are uncertain about which actions were taken and when, and whether there is evidence that they are working. Additionally, there is substantial investment from the scientific community, and we are uncertain how much feedback is available while developing solutions.</p>



<p><strong>Panama Disease</strong> is a fungal infection that has been spreading globally for decades and threatens the viability of the Cavendish banana as a commercial crop. Cavendish bananas account for the vast majority of banana exports, and are integral to the food security of countries such as Costa Rica and Guatemala.<span id='easy-footnote-13-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-13-2362' title='“export revenue from bananas covered 40 percent of Costa Rica’s food import bill and 27 percent of Guatemala’s in 2014” “EST: Banana Facts.” Accessed February 6, 2020.&lt;a href=&quot;https://www.fao.org/economic/est/est-commodities/oilcrops/bananas/bananafacts/en/#.ZEcCbuzMJAc&quot;&gt; http://www.fao.org/economic/est/est-commodities/bananas/bananafacts/en/#.XjyilyOIYuV.&lt;/a&gt; '><sup>13</sup></a></span> Early action included measures to slow the spread of the fungus, a search for cultivars to replace the Cavendish, calls for greater diversity in banana varietals, and searches for fungicides that are able to kill the fungus. Although these actions have many opportunities for feedback, some of them involve complex predictions and searches for specific technical solutions, and, from the perspective of farmers on continents that have not yet encountered the infection, the arrival of the fungus represents a discrete event at some undetermined time in the future. We are uncertain whether these are good examples of prescient actions, but they may be worth additional investigation.</p>



<h3 class="wp-block-heading">Presence of Feedback</h3>



<p>The axis on which our cases most differed from efforts to reduce AI risk was the level of feedback available while developing a solution. The average score on feedback was 3.8, and none of the cases received a score higher than 7. Even cases that initially seemed likely to have very little feedback proved to have enough to aid those making preparations. Examples include Hurricane Katrina, which benefited from lessons learned from preceding hurricanes, and the National Redoubt of Switzerland, which benefited from the observation of conflicts between other actors, providing information about which military equipment and tactics were viable against likely adversaries. Assuming these results are representative, there are two ways to interpret them:</p>



<p><strong>Feedback is abundant: </strong>Feedback is abundant in a wide variety of situations, so that we should also expect to have opportunities for feedback while preparing for advanced artificial intelligence. In support of this view are the cases mentioned above that were initially expected to lack feedback, even on the part of those making preparations, but which nonetheless benefited from feedback.<br></p>



<p><strong>AI risk is unusual: </strong>The common perception that there is very little feedback available to efforts to reduce the risks of advanced AI is correct, and AI risk is unique (or very rare) in this regard. Support for this view comes from <a href="https://intelligence.org/2018/10/03/rocket-alignment/">arguments</a> for the one-shot nature of solving the AI control problem.<span id='easy-footnote-14-2362' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/survey-of-prescient-actions/#easy-footnote-bottom-14-2362' title='For instance, Eliezer Yudkowsky obliquely argues this in &lt;em&gt;The Rocket Alignment Problem&lt;/em&gt;.  “The Rocket Alignment Problem &amp;#8211; Machine Intelligence Research Institute.” Accessed March 26, 2020.&lt;a href=&quot;https://intelligence.org/2018/10/03/rocket-alignment/&quot;&gt; https://intelligence.org/2018/10/03/rocket-alignment/&lt;/a&gt;.'><sup>14</sup></a></span></p>



<p><em>Primary author: Rick Korzekwa</em></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>2019 recent trends in GPU price per FLOPS</title>
		<link>http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/</link>
					<comments>http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#comments</comments>
		
		<dc:creator><![CDATA[Asya Bergal]]></dc:creator>
		<pubDate>Wed, 25 Mar 2020 23:46:49 +0000</pubDate>
				<category><![CDATA[AI Timelines]]></category>
		<category><![CDATA[Featured Articles]]></category>
		<category><![CDATA[Hardware and AI Timelines]]></category>
		<category><![CDATA[Hardware progress]]></category>
		<category><![CDATA[front]]></category>
		<category><![CDATA[major investigation]]></category>
		<category><![CDATA[Pages]]></category>
		<guid isPermaLink="false">http://aiimpacts.org/?p=2316</guid>

					<description><![CDATA[Published 25 March, 2020 We estimate that in recent years, GPU prices have fallen at rates that would yield an order of magnitude over roughly: Details GPUs (graphics processing units) are specialized electronic circuits originally <a class="mh-excerpt-more" href="http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/" title="2019 recent trends in GPU price per FLOPS"></a>]]></description>
										<content:encoded><![CDATA[
<p class="has-text-color" style="color:#707070"><em>Published 25 March, 2020</em></p>



<p>We estimate that in recent years, GPU prices have fallen at rates that would yield an order of magnitude over roughly:</p>



<ul class="wp-block-list">
<li>17 years for single-precision FLOPS</li>



<li>10 years for half-precision FLOPS</li>



<li>5 years for half-precision fused multiply-add FLOPS</li>
</ul>



<h1 class="wp-block-heading">Details</h1>



<p>GPUs (graphics processing units) are specialized electronic circuits originally used for computer graphics.<span id='easy-footnote-1-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-1-2316' title='“A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device … Modern GPUs are very efficient at manipulating computer graphics and image processing … The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as &amp;#8220;the world&amp;#8217;s first GPU&amp;#8221;. It was presented as a &amp;#8220;single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines&amp;#8221;.”&lt;br&gt;“Graphics Processing Unit.” Wikipedia. Wikimedia Foundation, March 24, 2020. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=Graphics_processing_unit&amp;amp;oldid=947270104&quot;&gt;https://en.wikipedia.org/w/index.php?title=Graphics_processing_unit&amp;amp;oldid=947270104&lt;/a&gt;.'><sup>1</sup></a></span> In recent years, they have been popularly used for machine learning applications.<span id='easy-footnote-2-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-2-2316' title='Fraenkel, Bernard. “Council Post: For Machine Learning, It&amp;#8217;s All About GPUs.” Forbes. Forbes Magazine, December 8, 2017. 
&lt;a href=&quot;https://www.forbes.com/sites/forbestechcouncil/2017/12/01/for-machine-learning-its-all-about-gpus/#5ed90c227699&quot;&gt;https://www.forbes.com/sites/forbestechcouncil/2017/12/01/for-machine-learning-its-all-about-gpus/#5ed90c227699&lt;/a&gt;.'><sup>2</sup></a></span> One measure of GPU performance is FLOPS, the number of operations on floating-point numbers a GPU can perform in a second.<span id='easy-footnote-3-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-3-2316' title='“In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases it is a more accurate measure than measuring instructions per second.”&lt;br&gt;“FLOPS.” Wikipedia. Wikimedia Foundation, March 24, 2020. &lt;a href=&quot;https://en.wikipedia.org/w/index.php?title=FLOPS&amp;amp;oldid=947177339&quot;&gt;https://en.wikipedia.org/w/index.php?title=FLOPS&amp;amp;oldid=947177339&lt;/a&gt;'><sup>3</sup></a></span> This page looks at the trends in GPU price / FLOPS of theoretical peak performance over the past 13 years. It does not include the cost of operating the GPUs, and it does not consider GPUs rented through cloud computing.</p>



<h2 class="wp-block-heading">Theoretical peak performance</h2>



<p>‘Theoretical peak performance’ numbers appear to be determined by adding together the theoretical performances of the processing components of the GPU, which are calculated by multiplying the clock speed of the component by the number of instructions it can perform per cycle.<span id='easy-footnote-4-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-4-2316' title='From this discussion on Nvidia&amp;#8217;s forums about theoretical GFLOPS: “GPU theoretical flops calculation is similar conceptually. It will vary by GPU just as the CPU calculation varies by CPU architecture and model. To use K40m as an example: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-PCIe-Passive-Board-Spec-BD-06902-001_v05.pdf&lt;br&gt;&lt;/p&gt;



&lt;p&gt;there are 15 SMs (2880/192), each with 64 DP ALUs that are capable of retiring one DP FMA instruction per cycle (== 2 DP Flops per cycle).&lt;br&gt;&lt;/p&gt;



&lt;p&gt;15 x 64 x 2 * 745MHz = 1.43 TFlops/sec&lt;br&gt;&lt;/p&gt;



&lt;p&gt;which is the stated perf:&lt;/p&gt;



&lt;p&gt;http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf &amp;#8220;&lt;/p&gt;



&lt;p&gt;Person. “Comparing CPU and GPU Theoretical GFLOPS.” NVIDIA Developer Forums, May 21, 2014. &lt;a href=&quot;https://forums.developer.nvidia.com/t/comparing-cpu-and-gpu-theoretical-gflops/33335&quot;&gt;https://forums.developer.nvidia.com/t/comparing-cpu-and-gpu-theoretical-gflops/33335&lt;/a&gt;.'><sup>4</sup></a></span> These numbers are given by the developer and may not reflect actual performance on a given application.<span id='easy-footnote-5-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-5-2316' title='From this blog post on the performance of TensorCores, a component of new Nvidia GPUs specialized for deep learning: “The problem is it’s totally unclear how to approach the peak performance of 120 TFLOPS, and as far as I know, no one could achieve so significant speedup on real tasks. Let me know if you aware of good cases.&amp;#8221;&lt;br&gt;Sapunov, Grigory. “Hardware for Deep Learning. Part 3: GPU.” Medium. Intento, January 20, 2020. &lt;a href=&quot;https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664&quot;&gt;https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664&lt;/a&gt;.'><sup>5</sup></a></span>
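As a concrete illustration, the K40 arithmetic quoted in the footnote can be reproduced directly. This is a sketch of the general formula (number of units × operations per unit per cycle × clock speed), using the figures given there:

```python
# Theoretical peak FLOPS = (number of units) * (FLOPs per unit per cycle) * (clock speed).
# Example figures for Nvidia's Tesla K40, from the footnoted forum post: 15 SMs,
# each with 64 double-precision ALUs retiring one FMA (= 2 FLOPs) per cycle, at 745 MHz.
def theoretical_peak_flops(num_units, flops_per_unit_per_cycle, clock_hz):
    return num_units * flops_per_unit_per_cycle * clock_hz

k40_peak = theoretical_peak_flops(num_units=15, flops_per_unit_per_cycle=64 * 2, clock_hz=745e6)
print(k40_peak / 1e12)  # ~1.43 TFLOPS, matching the stated spec
```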



<h2 class="wp-block-heading">Metrics</h2>



<p>We collected data on multiple slightly different measures of GPU price and FLOPS performance.</p>



<h3 class="wp-block-heading">Price metrics</h3>



<p>GPU prices are divided into release prices, which are the manufacturer-suggested retail prices at which GPUs are originally sold, and active prices, which are the prices at which GPUs are actually sold over time, often by resellers.</p>



<p>We expect that active prices better represent the prices available to hardware users, but we also collected release prices as supporting evidence.</p>



<h3 class="wp-block-heading">FLOPS performance metrics</h3>



<p>Several varieties of ‘FLOPS’ can be distinguished based on the specifics of the operations they involve. Here we are interested in single-precision FLOPS, half-precision FLOPS, and half-precision fused-multiply add FLOPS.</p>



<p>‘Single-precision’ and ‘half-precision’ refer to the number of bits used to specify a floating point number.<span id='easy-footnote-6-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-6-2316' title='Gupta, Geetika. “Difference Between Single-, Double-, Multi-, Mixed-Precision: NVIDIA Blog.” The Official NVIDIA Blog, November 21, 2019. &lt;a href=&quot;https://blogs.nvidia.com/blog/2019/11/15/whats-the-difference-between-single-double-multi-and-mixed-precision-computing/&quot;&gt;https://blogs.nvidia.com/blog/2019/11/15/whats-the-difference-between-single-double-multi-and-mixed-precision-computing/&lt;/a&gt;.'><sup>6</sup></a></span> Using more bits to specify a number achieves greater precision at the cost of more computational steps per calculation. Our data suggests that GPUs have largely been improving in single-precision performance in recent decades,<span id='easy-footnote-7-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-7-2316' title='See our &lt;a href=&quot;https://aiimpacts.org/recent-trend-in-the-cost-of-computing/&quot;&gt;2017 analysis&lt;/a&gt;, footnote 4, which notes that single-precision price performance seems to be improving while double-precision price performance is not'><sup>7</sup></a></span> and half-precision performance appears to be increasingly popular because it is adequate for deep learning.<span id='easy-footnote-8-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-8-2316' title='“With the growing importance of deep learning and energy-saving approximate computing, half precision floating point arithmetic (FP16) is fast gaining popularity. 
Nvidia&amp;#8217;s recent Pascal architecture was the first GPU that offered FP16 support.”&lt;br&gt;N. Ho and W. Wong, &lt;a href=&quot;https://ieeexplore.ieee.org/abstract/document/8091072&quot;&gt;&amp;#8220;Exploiting half precision arithmetic in Nvidia GPUs,&amp;#8221;&lt;/a&gt; 2017 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, 2017, pp. 1-7.'><sup>8</sup></a></span>



<p>Nvidia, the main provider of chips for machine learning applications,<span id='easy-footnote-9-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-9-2316' title='“In a recent paper, Google revealed that its TPU can be up to 30x faster than a GPU for inference (the TPU can’t do training of neural networks). As the main provider of chips for machine learning applications, Nvidia took some issue with that, arguing that some of its existing inference chips were already highly competitive to the TPU.”&lt;br&gt;Armasu, Lucian. “On Tensors, Tensorflow, And Nvidia&amp;#8217;s Latest &amp;#8216;Tensor Cores&amp;#8217;.” Tom&amp;#8217;s Hardware. Tom&amp;#8217;s Hardware, May 11, 2017. &lt;a href=&quot;https://www.tomshardware.com/news/nvidia-tensor-core-tesla-v100,34384.html&quot;&gt;https://www.tomshardware.com/news/nvidia-tensor-core-tesla-v100,34384.html&lt;/a&gt;.'><sup>9</sup></a></span> recently released a series of GPUs featuring Tensor Cores,<span id='easy-footnote-10-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-10-2316' title='“Tensor Cores in NVIDIA Volta GPU Architecture.” NVIDIA. Accessed May 2, 2020. https://www.nvidia.com/en-us/data-center/tensorcore/.&lt;br&gt;'><sup>10</sup></a></span> which claim to deliver “groundbreaking AI performance”. 
Tensor Core performance is measured in FLOPS, but Tensor Cores exclusively perform a particular kind of floating-point operation known as a fused multiply-add (FMA).<span id='easy-footnote-11-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-11-2316' title='&amp;#8220;Volta is equipped with 640 Tensor Cores, each performing 64 floating-point fused-multiply-add (FMA) operations per clock. That delivers up to 125 TFLOPS for training and inference applications.”&lt;br&gt;“Tensor Cores in NVIDIA Volta GPU Architecture.” NVIDIA. Accessed March 25, 2020. &lt;a href=&quot;https://www.nvidia.com/en-us/data-center/tensorcore/&quot;&gt;https://www.nvidia.com/en-us/data-center/tensorcore/&lt;/a&gt;.'><sup>11</sup></a></span> Performance on these operations is important for certain kinds of deep learning workloads,<span id='easy-footnote-12-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-12-2316' title='&amp;#8220;A useful operation in computer linear algebra is multiply-add: calculating the sum of a value c with a product of other values a x b to produce c + a x b. Typically, thousands of such products may be summed in a single accumulator for a model such as ResNet-50, with many millions of independent accumulations when running a model in deployment, and quadrillions of these for training models.”&lt;br&gt;Johnson, Jeff. “Making Floating Point Math Highly Efficient for AI Hardware.” Facebook AI Blog, November 8, 2018. &lt;a href=&quot;https://ai.facebook.com/blog/making-floating-point-math-highly-efficient-for-ai-hardware/&quot;&gt;https://ai.facebook.com/blog/making-floating-point-math-highly-efficient-for-ai-hardware/&lt;/a&gt;.'><sup>12</sup></a></span> so we track ‘GPU price / FMA FLOPS’ as well as ‘GPU price / FLOPS’.<br></p>
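A fused multiply-add computes c + a × b in a single instruction and is conventionally counted as two floating-point operations. Under that convention, the Volta figure quoted in the footnote can be checked with the same peak-performance arithmetic; note that the ~1.53 GHz clock speed below is our assumption for illustration, not a figure given in this article:

```python
# A fused multiply-add (FMA) computes c + a * b in one instruction, counted as 2 FLOPs.
def fma(a, b, c):
    return c + a * b

# Volta, per the footnote: 640 Tensor Cores, each retiring 64 FMAs per clock.
# The 1.53 GHz boost clock is an assumed value, not stated in the article.
volta_peak = 640 * 64 * 2 * 1.53e9
print(volta_peak / 1e12)  # ~125 TFLOPS, consistent with the quoted figure
```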



<p>In addition to purely half-precision computations, Tensor Cores are capable of performing mixed-precision computations, where part of the computation is done in half-precision and part in single-precision.<span id='easy-footnote-13-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-13-2316' title='See Figure 2:&lt;br&gt;Gupta, Geetika. “Using Tensor Cores for Mixed-Precision Scientific Computing.” NVIDIA Developer Blog, April 19, 2019. &lt;a href=&quot;https://devblogs.nvidia.com/tensor-cores-mixed-precision-scientific-computing&quot;&gt;https://devblogs.nvidia.com/tensor-cores-mixed-precision-scientific-computing&lt;/a&gt;/.'><sup>13</sup></a></span> Since explicitly mixed-precision-optimized hardware is quite recent, we don’t look at the trend in mixed-precision price performance, and only look at the trend in half-precision price performance.</p>



<h4 class="wp-block-heading">Precision tradeoffs</h4>



<p>Any GPU that performs multiple kinds of computations (single-precision, half-precision, half-precision fused multiply-add) trades off performance on one against the others, because there is limited space on the chip, and transistors must be allocated to one type of computation or another.<span id='easy-footnote-14-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-14-2316' title='Three different individuals told us about this constraint, including one Nvidia employee.'><sup>14</sup></a></span> All current GPUs that perform half-precision or TensorCore fused-multiply-add computations also do single-precision computations, so they split their transistor budget across computation types. For this reason, our impression is that half-precision FLOPS could be much cheaper now if entire GPUs were allocated to a single type of computation, rather than split between several.</p>



<h2 class="wp-block-heading">Release date prices</h2>



<p>We collected data on theoretical peak performance (FLOPS), release date, and price from several sources, including Wikipedia.<span id='easy-footnote-15-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-15-2316' title='See the ‘Source’ column in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘GPU Data’. We largely used &lt;a href=&quot;https://www.techpowerup.com/&quot;&gt;TechPowerUp&lt;/a&gt;, Wikipedia’s &lt;a href=&quot;https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units&quot;&gt;List of Nvidia GPUs&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units&quot;&gt;List of AMD GPUs&lt;/a&gt;, and &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1xAo6TcSgHdd25EdQ-6GqM0VKbTYu8cWyycgJhHRVIgY/edit#gid=0&quot;&gt;this document listing GPU performance&lt;/a&gt;.'><sup>15</sup></a></span> (Data is available in <a href="https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing">this spreadsheet</a>). We found GPUs by looking at Wikipedia’s existing large lists<span id='easy-footnote-16-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-16-2316' title='See Wikipedia’s &lt;a href=&quot;https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units&quot;&gt;List of Nvidia GPUs&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units&quot;&gt;List of AMD GPUs&lt;/a&gt;.'><sup>16</sup></a></span> and by Googling “popular GPUs” and “popular deep learning GPUs”. We included any hardware that was labeled as a ‘GPU’. 
We adjusted prices for inflation based on the consumer price index.<span id='easy-footnote-17-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-17-2316' title='“CPI Home.” U.S. Bureau of Labor Statistics. U.S. Bureau of Labor Statistics. Accessed May 2, 2020. https://www.bls.gov/cpi/.'><sup>17</sup></a></span>
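The inflation adjustment is a single ratio of consumer price index values. A minimal sketch, where the CPI figures are illustrative placeholders rather than the exact BLS series we used:

```python
# Convert a nominal price to 2019 dollars using the consumer price index (CPI).
# These annual-average CPI values are illustrative placeholders, not exact BLS data.
CPI = {2010: 218.1, 2019: 255.7}

def to_2019_dollars(nominal_price, year):
    return nominal_price * CPI[2019] / CPI[year]

price_2019 = to_2019_dollars(500.0, 2010)  # a $500 GPU from 2010, in 2019 dollars
```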



<p>We were unable to find price and performance data for many popular GPUs and suspect that we are missing many from our list. In our search, we did not find any GPUs that beat our 2017 minimum of $0.03 (release price) / single-precision GFLOPS. We put out a $20 bounty on a popular Facebook group to find a GPU with a cheaper price / FLOPS, and the bounty went unclaimed, so we are reasonably confident in this minimum.<span id='easy-footnote-18-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-18-2316' title='The Facebook group is for posting and claiming bounties and has around 750 people, many with interests in computers. The bounty has been up for two months, as of March 13 2020.'><sup>18</sup></a></span></p>



<h3 class="wp-block-heading">GPU price / single-precision FLOPS</h3>



<p>Figure 1 shows our collected dataset for GPU price / single-precision FLOPS over time.<span id='easy-footnote-19-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-19-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for SP’ for the chart generation.'><sup>19</sup></a></span>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh6.googleusercontent.com/eakDTr43veWzFljIgZr2zzJ8--TzbWpIbelaGFeM7clFeQHdhcodTZfBAw5aUxkkJ5lNd3h4g7m8X6AsHwEm_kU-5gZnnESi26Mnf43eMCcD0W8EnpJDrPqBhN9OXT5W7UnQR9om" alt="" width="591" height="364"/><figcaption class="wp-element-caption"><strong>Figure 1: Real GPU price / single-precision FLOPS over time. The vertical axis is log-scale. Price is measured in 2019 dollars.</strong></figcaption></figure>



<p>To find a clear trend for the prices of the cheapest GPUs / FLOPS, we looked at the running minimum prices every 10 days.<span id='easy-footnote-20-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-20-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for SP Minimums’ for the plotting. We used &lt;a href=&quot;https://drive.google.com/open?id=1JP98EP8nYA0KqofLm24vF2vwNL0PalcB&quot;&gt;this script&lt;/a&gt; on the data from the ‘Cleaned GPU Data for SP’ to calculate the minimums and then import them into a new sheet of the spreadsheet.'><sup>20</sup></a></span>
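The running minimum is a single pass over the dated price points. A sketch of the computation (not the actual script linked in the footnote):

```python
# Running minimum of price / FLOPS over time: for each date, the cheapest
# price / FLOPS seen on or before that date.
def running_minimum(points):
    points = sorted(points)  # order by date
    out, best = [], float("inf")
    for day, price in points:
        best = min(best, price)
        out.append((day, best))
    return out

# Hypothetical (day, price / FLOPS) samples, 10 days apart:
series = running_minimum([(0, 5.0e-11), (10, 6.0e-11), (20, 3.0e-11), (30, 4.0e-11)])
```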



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/05/image-1-1024x633.png" alt="" class="wp-image-2551" width="591" height="364" srcset="http://aiimpacts.org/wp-content/uploads/2020/05/image-1-1024x633.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/05/image-1-300x185.png 300w, http://aiimpacts.org/wp-content/uploads/2020/05/image-1.png 1366w" sizes="auto, (max-width: 591px) 100vw, 591px" /><figcaption class="wp-element-caption"><strong>Figure 2: Ten-day minimums in real GPU price / single-precision FLOPS over time. The vertical axis is log-scale. Price is measured in 2019 dollars. The blue line shows the trendline ignoring data before late 2007. (We believe the apparent steep decline prior to late 2007 is an artefact of a lack of data for that time period.)</strong></figcaption></figure>



<p>The cheapest GPU price / FLOPS (using release date pricing) has not decreased since 2017. However, there was a similar period of stagnation between early 2009 and 2011, so this may not represent a long-run slowing of the trend.</p>



<p>Based on the figures above, the running minimums seem to follow a roughly exponential trend. If we do not include the initial point in 2007 (which we suspect does not in fact represent the cheapest hardware at the time), we find that the cheapest GPU price / single-precision FLOPS fell by around 17% per year, for a factor of ten in ~12.5 years.<span id='easy-footnote-21-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-21-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for SP Minimums’ for the calculation.'><sup>21</sup></a></span></p>
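The conversion from an annual rate of decline to a time for a tenfold fall comes from solving (1 − r)<sup>n</sup> = 0.1 for n:

```python
import math

# Solve (1 - r)**n = 0.1 for n, where r is the fractional annual decline in price / FLOPS.
def years_per_order_of_magnitude(annual_decline):
    return math.log(0.1) / math.log(1 - annual_decline)

print(years_per_order_of_magnitude(0.17))  # ~12.4, consistent with the ~12.5 years here
print(years_per_order_of_magnitude(0.26))  # ~7.6, close to the ~8 years for half-precision
```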



<h3 class="wp-block-heading">GPU price / half-precision FLOPS</h3>



<p>Figure 3 shows GPU price / half-precision FLOPS for all the GPUs in our search above for which we could find half-precision theoretical performance.<span id='easy-footnote-22-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-22-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for HP’ for the chart generation.'><sup>22</sup></a></span>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh4.googleusercontent.com/mAmmM2htFRBW13ObHMQSBBDbP0DiVsRqDZyvrvGXYS6iueDBNNPYABeJqDM2DWqi1o3S21IcMXbLCQeeWUR1xFSYDKapZW2vzr4_pFaXvpjN98DvZAwmLCx_3dQm21NWa7xlWKrh" alt="" width="588" height="363"/><figcaption class="wp-element-caption"><strong>Figure 3: Real GPU price / half-precision FLOPS over time. The vertical axis is log-scale. Price is measured in 2019 dollars.</strong></figcaption></figure>



<p>Again, we looked at the running minimums of this graph every 10 days, shown in Figure 4 below.<span id='easy-footnote-23-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-23-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for HP Minimums’ for the plotting. We used &lt;a href=&quot;https://drive.google.com/open?id=1JP98EP8nYA0KqofLm24vF2vwNL0PalcB&quot;&gt;this script&lt;/a&gt; on the data from the ‘Cleaned GPU Data for HP’ to calculate the minimums and then import them into a new sheet of the spreadsheet.'><sup>23</sup></a></span>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh6.googleusercontent.com/ZoVY4R_fEQVgrS4n-v8EkOorER7M-Wup_nAFUujdcRTwRrj3MyK5ADkcKgPremWa0uF_gMHUZYoZ6uq735_XyZa6d_voJMhOF7wyOEq23goNuCzjoFaKc-pHYhsXgMc0inmXHUNs" alt="" width="589" height="364"/><figcaption class="wp-element-caption"><strong>Figure 4: Minimums in real GPU price / half-precision FLOPS over time. The vertical axis is log-scale. Price is measured in 2019 dollars.</strong></figcaption></figure>



<p>If we assume an exponential trend with noise,<span id='easy-footnote-24-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-24-2316' title='Where ambiguous, we assume these trends are exponential rather than linear, because our understanding is that that is much more common historically in computing hardware price trends.'><sup>24</sup></a></span> the cheapest GPU price / half-precision FLOPS fell by around 26% per year, which would yield a factor of ten after ~8 years.<span id='easy-footnote-25-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-25-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for HP Minimums’ for the calculation.'><sup>25</sup></a></span></p>



<h3 class="wp-block-heading">GPU price / half-precision FMA FLOPS</h3>



<p>Figure 5 shows GPU price / half-precision FMA FLOPS for all the GPUs in our search above for which we could find half-precision FMA theoretical performance.<span id='easy-footnote-26-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-26-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for HP + Tensor Cores’ for the chart generation.'><sup>26</sup></a></span> (Note that this includes all of our half-precision data above, since those FLOPS could be used for fused multiply-adds in particular). GPUs with TensorCores are marked in red.</p>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh6.googleusercontent.com/Reu_O__PF1WmEYmT6RD8JIkotqoO-dO9HTqmoQBVf_lgj0CtTvl4TNm6E1kPJ5AirX0T5j3g1QOv6hwQLjpWct__by_g1lj3LFpSgd1emAKg3FyyDWP-gRyW5sTs4Ostp9JGkcRA" alt="" width="588" height="363"/><figcaption class="wp-element-caption"><strong>Figure 5: Real GPU price / half-precision FMA FLOPS over time. Price is measured in 2019 dollars.</strong></figcaption></figure>



<p>Figure 6 shows the running minimums of GPU price / HP FMA FLOPS.<span id='easy-footnote-27-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-27-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for HP + Tensor Cores Minimums’ for the plotting. We used &lt;a href=&quot;https://drive.google.com/open?id=1JP98EP8nYA0KqofLm24vF2vwNL0PalcB&quot;&gt;this script&lt;/a&gt; on the data from the ‘Cleaned GPU Data for HP + Tensor Cores’ to calculate the minimums and then import them into a new sheet of the spreadsheet.'><sup>27</sup></a></span>



<figure class="wp-block-image is-resized"><img loading="lazy" decoding="async" src="https://lh6.googleusercontent.com/SyYEb2HGDZCw8fPF15iIUwmU3U5r9UTKpYlaRXBFCMcrTBJhCf7VTeqwC6B82G2hmANUKYTiTLibBmsYm5XGAiV655z7GCdFX0jDeilXyX2cyHlcHe9wWwXRfQrPE2_gA3iTxzf3" alt="" width="586" height="362"/><figcaption class="wp-element-caption"><strong>Figure 6: Minimums in real GPU price / half-precision FMA FLOPS over time. Price is measured in 2019 dollars.</strong></figcaption></figure>



<p>GPU price / half-precision FMA FLOPS appears to have followed an exponential trend over the last four years, falling by around 46% per year, for a factor of ten in ~4 years.<span id='easy-footnote-28-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-28-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;, tab ‘Cleaned GPU Data for HP + Tensor Cores Minimums’ for the calculation.'><sup>28</sup></a></span></p>



<h2 class="wp-block-heading">Active Prices</h2>



<p>GPU prices often go down from the time of release, and some popular GPUs are older ones that have gone down in price.<span id='easy-footnote-29-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-29-2316' title='For example, one of the GPUs recommended for deep learning &lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/b95182/d_which_gpus_to_get_for_deep_learning_my/&quot;&gt;in this Reddit thread&lt;/a&gt; is the GTX 1060 (6GB), which has been around &lt;a href=&quot;https://www.techpowerup.com/gpu-specs/geforce-gtx-1060-6-gb.c2862&quot;&gt;since 2016&lt;/a&gt;.'><sup>29</sup></a></span> Given this, it makes sense to look at active price data for the same GPU over time.</p>



<h3 class="wp-block-heading">Data Sources</h3>



<p>We collected data on peak theoretical performance in FLOPS from <a href="https://www.techpowerup.com/">TechPowerUp</a><span id='easy-footnote-30-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-30-2316' title='We scraped data from &lt;a href=&quot;https://www.techpowerup.com/gpu-specs/radeon-rx-480.c2848&quot;&gt;individual TechPowerUp pages&lt;/a&gt; using &lt;a href=&quot;https://drive.google.com/open?id=1msy977jSLcJspULMWlWOfIIFRWo_vsds&quot;&gt;this script&lt;/a&gt;. Our full scraped TechPowerUp dataset can be found &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1pXTyUJ2AvpkhYtphGn8UnAl4gI8j2jsx1AaGrOahl7o/edit?usp=sharing&quot;&gt;here&lt;/a&gt;.'><sup>30</sup></a></span> and combined it with active GPU price data to get GPU price / FLOPS over time.<span id='easy-footnote-31-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-31-2316' title='We chose to automatically scrape theoretical peak performance numbers from TechPowerUp instead of using the ones we manually collected above because there were several GPUs in the active pricing datasets that we hadn’t collected data for manually, and it was easier to scrape the entire site than just the subset of GPUs we needed.'><sup>31</sup></a></span> Our primary source of historical pricing data was Passmark, though we also found a less trustworthy dataset on Kaggle, which we used to check our analysis. We adjusted prices for inflation based on the consumer price index.<span id='easy-footnote-32-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-32-2316' title='“CPI Home.” U.S. Bureau of Labor Statistics. U.S. Bureau of Labor Statistics. Accessed May 2, 2020. https://www.bls.gov/cpi/.'><sup>32</sup></a></span></p>
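<p>The inflation adjustment is a rescaling by the ratio of consumer price index values. A sketch, using placeholder CPI figures rather than the actual BLS numbers:</p>

```python
# Convert a nominal price to 2019 dollars. These CPI values are
# illustrative placeholders, not the BLS figures we used.
CPI = {2011: 224.9, 2015: 237.0, 2019: 255.7}

def to_2019_dollars(nominal_price, year):
    return nominal_price * CPI[2019] / CPI[year]

real_price = to_2019_dollars(199.0, 2011)  # a $199 card in 2019 dollars
```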



<h4 class="wp-block-heading">Passmark</h4>



<p>We scraped pricing data<span id='easy-footnote-33-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-33-2316' title='We used &lt;a href=&quot;https://drive.google.com/open?id=1nd7111hOb-eCMBCOe1qocXYuAJys0pLk&quot;&gt;this script&lt;/a&gt;.'><sup>33</sup></a></span> on GPUs between 2011 and early 2020 from Passmark.<span id='easy-footnote-34-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-34-2316' title='“PassMark &amp;#8211; GeForce GTX 660 &amp;#8211; Price Performance Comparison.” Accessed March 24, 2020. &lt;a href=&quot;https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+GTX+660&amp;amp;id=2152&quot;&gt;https://www.videocardbenchmark.net/gpu.php?gpu=GeForce+GTX+660&amp;amp;id=2152&lt;/a&gt;.'><sup>34</sup></a></span> Where necessary, we renamed GPUs from Passmark to be consistent with TechPowerUp.<span id='easy-footnote-35-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-35-2316' title='In most cases where renaming was necessary, the same GPU had multiple clear names, e.g. the “Radeon HD 7970 / R9 280X” in PassMark was just called the “Radeon HD 7970” in TechPowerUp. In a few cases, Passmark listed some GPUs which TechPowerUp listed separately as one GPU, e.g. “Radeon R9 290X / 390X” seemed to ambiguously refer to the Radeon R9 290X or Radeon R9 390X. In these cases, we conservatively assume that the GPU refers to the less powerful / earlier GPU. In one exceptional case, we assumed that the “Radeon R9 Fury + Fury X” referred to the Radeon Fury X in PassMark. 
The ambiguously named GPUs were not in the minimum data we calculated, so probably did not have a strong effect on the final result.'><sup>35</sup></a></span> The Passmark data consists of 38,138 price points for 352 GPUs. We guess that these represent most of the popular GPUs.</p>



<p>Judging by the ‘current prices’ listed on individual Passmark GPU pages, prices appear to be sourced from Amazon, Newegg, and Ebay. Passmark’s price points do not fall at regular intervals; we don’t know whether Passmark pulls prices at irregular intervals, or pulls them regularly and only lists major changes as price points. We therefore treat each price point as the GPU’s price at that moment only, not indefinitely into the future.</p>



<p>The data contains several blips where a GPU is briefly sold unusually cheaply. Spot-checking some of these suggests that they correspond to single GPUs or small batches for sale. We are not interested in tracking these, because we are trying to predict AI progress, which presumably isn’t influenced by temporary discounts on tiny batches of GPUs.</p>



<h4 class="wp-block-heading">Kaggle</h4>



<p><a href="https://www.kaggle.com/raczeq/ethereum-effect-pc-parts">This Kaggle dataset</a> contains scraped data of GPU prices from price comparison sites PriceSpy.co.uk, PCPartPicker.com, Geizhals.eu from the years 2013 &#8211; 2018. The Kaggle dataset has 319,147 price points for 284 GPUs. Unfortunately, at least some of the data is clearly wrong, potentially because price comparison sites include pricing data from untrustworthy merchants.<span id='easy-footnote-36-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-36-2316' title='For example, the Kaggle dataset includes extremely cheap FirePro S7150s sold in 2014, even though the FirePro S7150 only came out in 2016. One of the sellers of these cheap GPUs were ‘Club 3D’, which also appeared to sell several other erroneously cheap GPUs.'><sup>36</sup></a></span> As such, we don’t use the Kaggle data directly in our analysis, but do use it as a check on our Passmark data. 
The data that we get from Passmark roughly appears to be a subset of the Kaggle data from 2013 &#8211; 2018,<span id='easy-footnote-37-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-37-2316' title='See &lt;a href=&quot;https://drive.google.com/open?id=1uvO-gNpAGh9qMzs5MnNtOg1X-R7JzdAq&quot;&gt;this plot of Passmark single-precision GPU price / FLOPS&lt;/a&gt; compared to the &lt;a href=&quot;https://drive.google.com/open?id=194Tqrcix2XdytbT-WbgcRbFpeHIHqsao&quot;&gt;combined Passmark and Kaggle single-precision GPU price / FLOPS&lt;/a&gt;, and &lt;a href=&quot;https://drive.google.com/open?id=1hk-1rHqOkWUwh3BVNbwTEu8dEfjDhsLT&quot;&gt;this plot of Passmark half-precision GPU price / FLOPS&lt;/a&gt; compared to the &lt;a href=&quot;https://drive.google.com/open?id=1mcpQigPJs9CRqebY-Uvm0dJ1si1jeTQv&quot;&gt;combined Passmark and Kaggle half-precision $ / FLOPS&lt;/a&gt;. In both cases the 2013 &amp;#8211; 2018 Passmark data appears to roughly be a subset of the Kaggle data.'><sup>37</sup></a></span> which is what we would expect if the price comparison engines picked up prices from the merchants Passmark looks at.</p>
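<p>One way to sanity-check this rough subset relationship is to ask, for each Passmark price point, whether the Kaggle data contains a point for the same GPU at a similar price on a nearby date. A sketch (not the comparison we actually ran; the data below is invented):</p>

```python
from datetime import date

passmark = [("GTX 1060", date(2017, 1, 15), 249.0)]   # hypothetical points
kaggle = [
    ("GTX 1060", date(2017, 1, 14), 249.99),
    ("GTX 1060", date(2017, 6, 1), 399.0),
]

def has_match(point, candidates, price_tol=0.05, day_tol=3):
    # True if some candidate has the same GPU, a date within day_tol
    # days, and a price within price_tol (fractional) of the point.
    gpu, day, price = point
    return any(
        g == gpu
        and abs((d - day).days) <= day_tol
        and abs(p - price) / price <= price_tol
        for g, d, p in candidates
    )

matched = sum(has_match(pt, kaggle) for pt in passmark)
```

<p>A high match fraction supports the subset reading; unmatched points flag places where the datasets disagree.</p>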



<h4 class="wp-block-heading">Limitations</h4>



<p>There are a number of reasons why this analysis may not accurately reflect true GPU price trends:</p>



<ul class="wp-block-list">
<li>We effectively have just one source of pricing data, Passmark.</li>



<li>Passmark appears to only look at Amazon, Newegg, and Ebay for pricing data.</li>



<li>We are not sure, but we suspect that Passmark only looks at the U.S. versions of Amazon, Newegg, and Ebay, and pricing may be significantly different in other parts of the world (though we guess it wouldn’t be different enough to change the general trend much).</li>



<li>As mentioned above, we are not sure if Passmark pulls price data regularly and only lists major price changes, or pulls price data irregularly. If the former is true, our data may be overrepresenting periods where the price changes dramatically.</li>



<li>None of the price data we found includes quantities of GPUs which were available at that price, which means some prices may be for only a very limited number of GPUs.</li>



<li>We don’t know how much the prices from these datasets reflect the prices that a company pays when buying GPUs in bulk, which we may be more interested in tracking.</li>
</ul>



<p>A better version of this analysis might start with more complete data from price comparison engines (along the lines of the Kaggle dataset) and then filter out clearly erroneous pricing information in some principled way.</p>



<h3 class="wp-block-heading">Data</h3>



<p>The original scraped datasets with cards renamed to match TechPowerUp can be found <a href="https://drive.google.com/drive/folders/1cCjG_sUUePxbh5fN9ViOPX6GW9D2GyPJ?usp=sharing">here</a>. GPU price / FLOPS data is graphed on a log scale in the figures below. Price points for the same GPU are marked in the same color. We adjusted prices for inflation using the consumer price index.<span id='easy-footnote-38-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-38-2316' title='“CPI Home.” U.S. Bureau of Labor Statistics. U.S. Bureau of Labor Statistics. Accessed May 2, 2020. https://www.bls.gov/cpi/.'><sup>38</sup></a></span> All points below are in 2019 dollars.</p>



<p>To filter out noisy prices that didn’t last or were only available in small quantities, we removed the cheapest 5% of data in each several-day period<span id='easy-footnote-39-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-39-2316' title='We set this period to be 10 days long when looking at single-precision data, and 30 days long when looking at half-precision data, since half-precision data was significantly more sparse.'><sup>39</sup></a></span> to get the 95th-percentile cheapest hardware. We then fit linear and exponential trendlines through the cheapest available hardware (by GPU price / FLOPS) in each period.<span id='easy-footnote-40-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-40-2316' title='This calculation can be found in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;.'><sup>40</sup></a></span></p>
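<p>This procedure can be sketched as follows (not our actual script; the data here is synthetic, and the exponential trendline is, as in our spreadsheets, a linear fit through the log of the data):</p>

```python
import math
from collections import defaultdict

def filtered_minimums(points, window_days=10, drop_frac=0.05):
    # points: (day, $ / FLOPS). Within each window, drop the cheapest
    # 5% of points, then keep the cheapest survivor.
    buckets = defaultdict(list)
    for day, price in points:
        buckets[day // window_days].append((day, price))
    mins = []
    for bucket in buckets.values():
        bucket.sort(key=lambda p: p[1])
        survivors = bucket[int(len(bucket) * drop_frac):]
        mins.append(min(survivors, key=lambda p: p[1]))
    return sorted(mins)

def exp_trend_slope(mins):
    # Exponential fit as least squares through log10(price):
    # price ~ 10 ** (a + b * day); returns b.
    xs = [d for d, _ in mins]
    ys = [math.log10(p) for _, p in mins]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

# Synthetic decade of data falling 10x per 3650 days:
points = [(d, 1e-10 * 10 ** (-d / 3650)) for d in range(0, 3650, 7)]
b = exp_trend_slope(filtered_minimums(points))
annual_decline = 1 - 10 ** (365 * b)   # recovers ~20.6% per year
```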



<h4 class="wp-block-heading">GPU price / single-precision FLOPS</h4>



<p>Figures 7-10 show the raw data, 95th percentile data, and trendlines for single-precision GPU price / FLOPS for the Passmark dataset. <a href="https://drive.google.com/open?id=1-PEl2kSORRH78Qa4huRF-t_g_m1QOTDs">This folder</a> contains plots of all our datasets, including the Kaggle dataset and combined Passmark + Kaggle dataset.<span id='easy-footnote-41-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-41-2316' title='We used a Python plotting library to generate our plots, the script can be found &lt;a href=&quot;https://drive.google.com/open?id=1u3qI9m9W6_9efIpsDBq1Hc-qixcKy8Sb&quot;&gt;here&lt;/a&gt;. All of our resulting plots can be found &lt;a href=&quot;https://drive.google.com/open?id=1afVbKn34pw5rj4fn1vI_qOhdZh5GLfwE&quot;&gt;here&lt;/a&gt;. ‘single’ vs. ‘half’ refers to whether its $ / FLOPS data for single or half-precision FLOPS, ‘passmark’, ‘kaggle’, and ‘combined’ refer to which dataset is being plotted and ‘raw’ vs. ‘95’ refer to whether we’re plotting all the data or the 95th percentile data.'><sup>41</sup></a></span>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_raw_crop-1-1024x525.png" alt="" class="wp-image-2509" width="589" height="302" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_raw_crop-1-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_raw_crop-1-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_raw_crop-1-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_raw_crop-1-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_raw_crop-1.png 1657w" sizes="auto, (max-width: 589px) 100vw, 589px" /><figcaption class="wp-element-caption"><br><strong>Figure 7: GPU price / single-precision FLOPS over time, taken from our Passmark dataset.<span id='easy-footnote-42-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-42-2316' title='The dataset we used for this plot can be found &lt;a href=&quot;https://drive.google.com/open?id=19FB-_MnQAtJErb8e_5DHTRVKAPg4g1YX&quot;&gt;here&lt;/a&gt;. This is a processed version of our scraped dataset, with prices / FLOPS adjusted for inflation. The script we used to process and plot can be found &lt;a href=&quot;https://drive.google.com/open?id=1u3qI9m9W6_9efIpsDBq1Hc-qixcKy8Sb&quot;&gt;here&lt;/a&gt;.'><sup>42</sup></a></span> Price is measured in 2019 dollars. <a href="https://drive.google.com/open?id=194Tqrcix2XdytbT-WbgcRbFpeHIHqsao">This picture</a> shows that the Kaggle data does appear to be a superset of the Passmark data from 2013 &#8211; 2018, giving us some evidence that the Passmark data is correct. The vertical axis is log-scale.</strong></figcaption></figure>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_crop-1024x525.png" alt="" class="wp-image-2510" width="587" height="301" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_crop-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_crop-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_crop-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_crop-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_crop.png 1657w" sizes="auto, (max-width: 587px) 100vw, 587px" /><figcaption class="wp-element-caption"><br><strong>Figure 8: The top 95% of data every 10 days for GPU price / single-precision FLOPS over time, taken from the Passmark dataset we plotted above. (Figure 7 with the cheapest 5% removed.) The vertical axis is log-scale.<span id='easy-footnote-43-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-43-2316' title='The script to calculate the 95th percentile and generate this plot can be found &lt;a href=&quot;https://drive.google.com/open?id=1u3qI9m9W6_9efIpsDBq1Hc-qixcKy8Sb&quot;&gt;here&lt;/a&gt;.'><sup>43</sup></a></span></strong></figcaption></figure>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_zoom_crop-1024x525.png" alt="" class="wp-image-2511" width="582" height="298" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_zoom_crop-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_zoom_crop-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_zoom_crop-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_zoom_crop-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/single_passmark_95_zoom_crop.png 1657w" sizes="auto, (max-width: 582px) 100vw, 582px" /><figcaption class="wp-element-caption"><br><strong>Figure 9: The same data as Figure 8, with the vertical axis zoomed-in.</strong></figcaption></figure>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/06/image-1024x633.png" alt="" class="wp-image-2595" width="579" height="357" srcset="http://aiimpacts.org/wp-content/uploads/2020/06/image-1024x633.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/06/image-300x186.png 300w, http://aiimpacts.org/wp-content/uploads/2020/06/image-768x475.png 768w, http://aiimpacts.org/wp-content/uploads/2020/06/image-1536x950.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/06/image.png 1772w" sizes="auto, (max-width: 579px) 100vw, 579px" /><figcaption class="wp-element-caption"><strong>Figure 10: The minimum data points from the top 95% of the Passmark dataset, taken every 10 days. We fit linear and exponential trendlines through the data. The vertical axis is log-scale.<span id='easy-footnote-44-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-44-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;here&lt;/a&gt;, tab ‘Passmark SP Minimums&amp;#8217; to see our calculation of the minimums over time. We used &lt;a href=&quot;https://drive.google.com/open?id=1yRTJwVQAwCqLSTyGXgGIHcRfJFP7C5D2&quot;&gt;this script&lt;/a&gt; to generate the minimums, then imported them into this spreadsheet.'><sup>44</sup></a></span></strong></figcaption></figure>



<h5 class="wp-block-heading" id="single-precision-analysis">Analysis</h5>



<p>The cheapest 95th-percentile data every 10 days fits relatively well to both a linear and an exponential trendline. However, we assume that progress will follow an exponential, because previous progress has <a href="https://aiimpacts.org/recent-trend-in-the-cost-of-computing/">followed an exponential</a>.</p>



<p>In the Passmark dataset, the exponential trendline suggested that from 2011 to 2020, 95th-percentile GPU price / single-precision FLOPS fell by around 13% per year, for a factor of ten in ~17 years,<span id='easy-footnote-45-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-45-2316' title='You can see our calculations for this &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;here&lt;/a&gt;, sheet ‘Passmark SP Minimums’. Each sheet has a cell ‘Rate to move an order of magnitude’ which has our calculation for how many years we need to move an order of magnitude. In the (untrustworthy) Kaggle dataset alone, its rate would yield an order of magnitude of decrease every ~12 years, and the rate in the combined dataset&amp;nbsp; would yield an order of magnitude of decrease every ~16 years.'><sup>45</sup></a></span> with a bootstrap<span id='easy-footnote-46-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-46-2316' title='Orloff, Jeremy, and Jonathan Bloom. “Bootstrap Confidence Intervals.” MIT OpenCourseWare, 2014. &lt;a href=&quot;https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf&quot;&gt;https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf&lt;/a&gt;.'><sup>46</sup></a></span> 95% confidence interval of 16.3 to 18.1 years.<span id='easy-footnote-47-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-47-2316' title='We used &lt;a href=&quot;https://drive.google.com/open?id=1XkA-8WruAMKM3y3cdNMPNJHabpMUE_MT&quot;&gt;this script&lt;/a&gt; to generate bootstrap confidence intervals for our datasets.'><sup>47</sup></a></span> We believe the rise in price / FLOPS in 2017 corresponds to a rise in GPU prices due to increased demand from cryptocurrency miners.<span id='easy-footnote-48-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-48-2316' title='We think this is the case because we’ve observed this dip in other GPU analyses we’ve done, and because the timing lines up: &lt;a href=&quot;https://www.techspot.com/news/72854-nvidia-asking-graphics-card-retailers-prioritize-gamers-over.html&quot;&gt;the first table in this article&lt;/a&gt; shows how GPU prices were increasing starting 2017 and continued to increase through 2018, and &lt;a href=&quot;https://www.kaggle.com/raczeq/impact-of-cryptocurrencies-rates-on-pc-market/data&quot;&gt;the chart here&lt;/a&gt; shows how GPU prices increased in 2017.'><sup>48</sup></a></span> If we instead look at the trend from 2011 through 2016, before the cryptocurrency-driven rise, we get that 95th-percentile GPU price / single-precision FLOPS fell by around 13% per year, for a factor of ten in ~16 years.<span id='easy-footnote-49-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-49-2316' title='You can see our calculations for this &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;here&lt;/a&gt;, sheet ‘Passmark SP Minimums’, next to ‘Exponential trendline from 2015 to 2016. The trendline calculated is technically the linear fit through the log of the data.'><sup>49</sup></a></span></p>
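<p>The bootstrap confidence intervals quoted here can be produced along the following lines (a sketch, not the script we used; the minimums below are synthetic): resample the fitted points with replacement, refit the trend each time, and take percentiles of the implied years per order of magnitude.</p>

```python
import random

def slope(points):
    # Least-squares slope through (year, log10 price) points.
    xs, ys = zip(*points)
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

def bootstrap_ci(points, n_boot=1000, seed=0):
    # Resample with replacement, refit, and take the 2.5th / 97.5th
    # percentiles of the implied years per factor-of-ten decline.
    rng = random.Random(seed)
    times = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]
        if len({x for x, _ in sample}) < 2:
            continue  # degenerate resample: slope undefined
        times.append(-1.0 / slope(sample))
    times.sort()
    return times[int(0.025 * len(times))], times[int(0.975 * len(times))]

# Synthetic minimums: log10 price falling 0.06 per year
# (about 13% per year, ~16.7 years per order of magnitude).
points = [(year, 2.0 - 0.06 * year) for year in range(10)]
lo, hi = bootstrap_ci(points, n_boot=200)
```

<p>Applied to the real minimums data, a procedure of this shape produces intervals like the 16.3 to 18.1 years quoted above.</p>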



<p>This is slower than the order of magnitude every ~12.5 years we found when looking at release prices. If we restrict the release price data to 2011 &#8211; 2019, we get an order of magnitude decrease every ~13.5 years instead,<span id='easy-footnote-50-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-50-2316' title='See our calculation &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;here&lt;/a&gt;, tab ‘Cleaned GPU Data for SP Minimums’, next to the cell marked “Exponential trendline from 2011 to 2019.”'><sup>50</sup></a></span> so part of the discrepancy can be explained by the different start times of the datasets. To get some assurance that our active price data wasn&#8217;t erroneous, we spot-checked the best active price at the start of 2011, which was somewhat lower than the best release price at the same time, and confirmed that it was consistent with surrounding pricing data.<span id='easy-footnote-51-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-51-2316' title='At the start of 2011, the &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;minimum release &lt;/a&gt;&lt;a href=&quot;https://docs.google.com/spreadsheets/d/1ZZm5Wgr3BDRtloTZGylWzYTaVr5VqjiwOiRNu5Pz_q8/edit?usp=sharing&quot;&gt;price / FLOPS&lt;/a&gt; (see tab, ‘Cleaned GPU Data for SP Minimums’)  is .000135 $ / FLOPS, whereas the &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;minimum active price / FLOPS&lt;/a&gt; (see tab, ‘Passmark SP Minimums’) is around .0001 $ / FLOPS. &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;The initial GPU price / FLOPS minimum&lt;/a&gt; (see sheet ‘Passmark SP Minimums’) corresponds to the Radeon HD 5850, which had a price of $184.9 in 3/2011 and a release price of $259. &lt;a href=&quot;https://www.videocardbenchmark.net/gpu.php?gpu=Radeon+HD+5850&amp;amp;id=47&quot;&gt;Looking at the general trend in Passmark&lt;/a&gt; suggests that the Radeon HD 5850 did indeed rapidly decline from its $259 release price to prices consistently below $200.'><sup>51</sup></a></span> We think active prices are likely to be closer to the prices at which people actually bought GPUs, so we guess that ~17 years per order of magnitude decrease is a more accurate estimate of the trend we care about.</p>



<h4 class="wp-block-heading">GPU price / half-precision FLOPS</h4>



<p>Figures 11-14 show the raw data, 95th percentile data, and trendlines for half-precision GPU price / FLOPS for the Passmark dataset. <a href="https://drive.google.com/open?id=1-PEl2kSORRH78Qa4huRF-t_g_m1QOTDs">This folder</a> contains plots of the Kaggle dataset and combined Passmark + Kaggle dataset.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_raw_crop-1024x525.png" alt="" class="wp-image-2512" width="580" height="297" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_raw_crop-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_raw_crop-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_raw_crop-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_raw_crop-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_raw_crop.png 1657w" sizes="auto, (max-width: 580px) 100vw, 580px" /><figcaption class="wp-element-caption"><br><strong> Figure 11: GPU price / half-precision FLOPS over time, taken from our Passmark dataset. Price is measured in 2019 dollars.<span id='easy-footnote-52-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-52-2316' title=' The dataset we used for this plot can be found here. This is a processed version of our scraped dataset, with prices / FLOPS adjusted for inflation. The script we used to process and plot can be found here.'><sup>52</sup></a></span>  This picture shows that the Kaggle data does appear to be a superset of the Passmark data from 2013 &#8211; 2018, giving us some evidence that the Passmark data is reasonable. The vertical axis is log-scale.</strong></figcaption></figure>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_crop-1024x525.png" alt="" class="wp-image-2513" width="581" height="298" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_crop-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_crop-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_crop-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_crop-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_crop.png 1657w" sizes="auto, (max-width: 581px) 100vw, 581px" /><figcaption class="wp-element-caption"><br><strong>Figure 12: The top 95% of data every 30 days for GPU price / half-precision FLOPS over time, taken from the Passmark dataset we plotted above. (Figure 11 with the cheapest 5% removed.) The vertical axis is log-scale.<span id='easy-footnote-53-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-53-2316' title='The script to calculate the 95th percentile and generate this plot can be found here.'><sup>53</sup></a></span></strong></figcaption></figure>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_zoom_crop-1024x525.png" alt="" class="wp-image-2514" width="583" height="299" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_zoom_crop-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_zoom_crop-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_zoom_crop-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_zoom_crop-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_95_zoom_crop.png 1657w" sizes="auto, (max-width: 583px) 100vw, 583px" /><figcaption class="wp-element-caption"><br><strong>Figure 13: The same data as Figure 12, with the vertical axis zoomed-in.</strong></figcaption></figure>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/06/image-1-1024x633.png" alt="" class="wp-image-2596" width="576" height="356" srcset="http://aiimpacts.org/wp-content/uploads/2020/06/image-1-1024x633.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/06/image-1-300x186.png 300w, http://aiimpacts.org/wp-content/uploads/2020/06/image-1-768x475.png 768w, http://aiimpacts.org/wp-content/uploads/2020/06/image-1-1536x950.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/06/image-1.png 1772w" sizes="auto, (max-width: 576px) 100vw, 576px" /><figcaption class="wp-element-caption"><strong>Figure 14: The minimum data points from the top 95% of the Passmark dataset, taken every 30 days. We fit linear and exponential trendlines through the data. The vertical axis is log-scale.<span id='easy-footnote-54-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-54-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;here&lt;/a&gt;, tab ‘Passmark HP Minimums&amp;#8217; to see our calculation of the minimums over time. We used &lt;a href=&quot;https://drive.google.com/open?id=1u3qI9m9W6_9efIpsDBq1Hc-qixcKy8Sb&quot;&gt;this script&lt;/a&gt; to generate the minimums, then imported them into this spreadsheet.'><sup>54</sup></a></span></strong></figcaption></figure>



<h5 class="wp-block-heading">Analysis</h5>



<p>If we assume the trend is exponential, the Passmark trend seems to suggest that from 2015 to 2020, 95th-percentile GPU price / half-precision FLOPS of GPUs has fallen by around 21% per year, for a factor of ten over ~10 years,<span id='easy-footnote-55-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-55-2316' title='See the sheet marked ‘Passmark HP minimums’ in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;. The trendline calculated is technically the linear fit through the log of the data.'><sup>55</sup></a></span> with a bootstrap<span id='easy-footnote-56-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-56-2316' title='Orloff, Jeremy, and Jonathan Bloom. “Bootstrap Confidence Intervals.” MIT OpenCourseWare, 2014. 
&lt;a href=&quot;https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf&quot;&gt;https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf&lt;/a&gt;.'><sup>56</sup></a></span> 95% confidence interval of 8.8 to 11 years.<span id='easy-footnote-57-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-57-2316' title='We used &lt;a href=&quot;https://drive.google.com/open?id=1XkA-8WruAMKM3y3cdNMPNJHabpMUE_MT&quot;&gt;this script&lt;/a&gt; to generate bootstrap confidence intervals for our datasets.'><sup>57</sup></a></span> This is fairly close to the ~8 years / order of magnitude decrease we found when looking at release price data, but we treat active prices as a more accurate estimate of the actual prices at which people bought GPUs. As in our previous dataset, there is a noticeable rise in 2017, which we think is due to GPU prices increasing as a result of cryptocurrency miners. If we look at the trend from 2015 through 2016, before this rise, we get that 95th-percentile GPU price / half-precision FLOPS has fallen by around 25% per year, which would yield a factor of ten over ~8 years.<span id='easy-footnote-58-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-58-2316' title='See the sheet marked ‘Passmark HP minimums’ in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;.'><sup>58</sup></a></span></p>
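<p>The trendline and confidence-interval method described in the footnotes (a linear fit through the log of the data, plus a bootstrap for the interval) can be sketched as follows. This is an illustrative reconstruction with synthetic data, not our actual script, which is linked above:</p>

```python
# Illustrative sketch (not the article's actual script) of fitting an
# exponential trend as a linear fit through log10(price / FLOPS), then
# bootstrapping a 95% confidence interval for "years per factor-of-ten drop".
import math
import random

def years_per_factor_of_ten(dates, prices):
    """Least-squares fit of log10(price) vs. date; returns years for a 10x fall."""
    n = len(dates)
    logs = [math.log10(p) for p in prices]
    mx = sum(dates) / n
    my = sum(logs) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(dates, logs))
             / sum((x - mx) ** 2 for x in dates))
    return -1.0 / slope  # slope is change in log10(price) per year (negative)

def bootstrap_ci_95(dates, prices, n_resamples=1000, seed=0):
    """Percentile bootstrap over data points for the statistic above."""
    rng = random.Random(seed)
    pairs = list(zip(dates, prices))
    stats = []
    while len(stats) < n_resamples:
        sample = [rng.choice(pairs) for _ in pairs]
        xs, ys = zip(*sample)
        if len(set(xs)) > 1:  # need two distinct dates to fit a slope
            stats.append(years_per_factor_of_ten(xs, ys))
    stats.sort()
    return stats[int(0.025 * n_resamples)], stats[int(0.975 * n_resamples)]

# Synthetic data: price falling 21% per year from 2015 to 2020, no noise.
dates = [2015 + 0.25 * i for i in range(21)]
prices = [100 * 0.79 ** (d - 2015) for d in dates]
print(round(years_per_factor_of_ten(dates, prices), 1))  # ~9.8 years per 10x
```

On real, noisy data the bootstrap interval widens, as in the 8.8 to 11 year interval reported above; on this noiseless synthetic series it collapses to a point.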



<h4 class="wp-block-heading">GPU price / half-precision FMA FLOPS</h4>



<p>Figures 15-18 show the raw data, 95th percentile data, and trendlines for half-precision GPU price / FMA FLOPS for the Passmark dataset. GPUs with Tensor Cores are marked in black. <a href="https://drive.google.com/open?id=1-PEl2kSORRH78Qa4huRF-t_g_m1QOTDs">This folder</a> contains plots of the Kaggle dataset and combined Passmark + Kaggle dataset.</p>
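<p>For reference, &#8220;FMA FLOPS&#8221; counts fused multiply-add throughput, with each FMA conventionally counted as two floating-point operations (one multiply plus one add). A common back-of-the-envelope way to derive such peak figures is sketched below; the core count, clock, and price are hypothetical, not drawn from our dataset:</p>

```python
# Common back-of-the-envelope peak-FLOPS estimate: cores * clock * 2, where
# the factor of 2 counts a fused multiply-add (FMA) as two FLOPs.
# All numbers here are hypothetical, for illustration only.
def peak_fma_flops(cores, clock_hz, flops_per_cycle=2):
    return cores * clock_hz * flops_per_cycle

# Hypothetical GPU: 3584 cores at 1.5 GHz, priced at $700.
flops = peak_fma_flops(3584, 1.5e9)
print(flops / 1e12)   # peak TFLOPS
print(700 / flops)    # dollars per FLOPS, the quantity plotted in this section
```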



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_raw_crop-1024x525.png" alt="" class="wp-image-2515" width="576" height="295" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_raw_crop-1024x525.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_raw_crop-300x154.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_raw_crop-768x394.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_raw_crop-1536x788.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_raw_crop.png 1657w" sizes="auto, (max-width: 576px) 100vw, 576px" /><figcaption class="wp-element-caption"><br><strong>Figure 15: GPU price / half-precision FMA FLOPS over time, taken from our Passmark dataset.<span id='easy-footnote-59-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-59-2316' title='The dataset we used for this plot can be found here. This is a processed version of our scraped dataset, with prices / FLOPS adjusted for inflation. The script we used to process and plot can be found here.'><sup>59</sup></a></span> Price is measured in 2019 dollars. This picture shows that the Kaggle data appears to be a superset of the Passmark data from 2013 &#8211; 2018, giving us some evidence that the Passmark data is correct. The vertical axis is log-scale.</strong></figcaption></figure>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_crop-1024x520.png" alt="" class="wp-image-2516" width="583" height="296" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_crop-1024x520.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_crop-300x152.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_crop-768x390.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_crop-1536x780.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_crop.png 1674w" sizes="auto, (max-width: 583px) 100vw, 583px" /><figcaption class="wp-element-caption"><br><strong>Figure 16: The top 95% of data every 30 days for GPU price / half-precision FMA FLOPS over time, taken from the Passmark dataset we plotted above.<span id='easy-footnote-60-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-60-2316' title='The script to calculate the 95th percentile and generate this plot can be found here.'><sup>60</sup></a></span> (Figure 15 with the cheapest 5% removed.)</strong></figcaption></figure>






<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_zoom_crop-1024x520.png" alt="" class="wp-image-2517" width="580" height="294" srcset="http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_zoom_crop-1024x520.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_zoom_crop-300x152.png 300w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_zoom_crop-768x390.png 768w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_zoom_crop-1536x780.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/04/half_passmark_tensor_95_zoom_crop.png 1674w" sizes="auto, (max-width: 580px) 100vw, 580px" /><figcaption class="wp-element-caption"><br><strong>Figure 17: The same data as Figure 16, with the vertical axis zoomed-in.</strong></figcaption></figure>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://aiimpacts.org/wp-content/uploads/2020/06/image-2-1024x633.png" alt="" class="wp-image-2597" width="580" height="358" srcset="http://aiimpacts.org/wp-content/uploads/2020/06/image-2-1024x633.png 1024w, http://aiimpacts.org/wp-content/uploads/2020/06/image-2-300x186.png 300w, http://aiimpacts.org/wp-content/uploads/2020/06/image-2-768x475.png 768w, http://aiimpacts.org/wp-content/uploads/2020/06/image-2-1536x950.png 1536w, http://aiimpacts.org/wp-content/uploads/2020/06/image-2.png 1772w" sizes="auto, (max-width: 580px) 100vw, 580px" /><figcaption class="wp-element-caption"><strong>Figure 18: The minimum data points from the top 95% of the Passmark dataset, taken every 30 days. We fit linear and exponential trendlines through the data.<span id='easy-footnote-61-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-61-2316' title='See &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;here&lt;/a&gt;, tab ‘Passmark HP FMA Minimums&amp;#8217; to see our calculation of the minimums over time. We used &lt;a href=&quot;https://drive.google.com/open?id=1yRTJwVQAwCqLSTyGXgGIHcRfJFP7C5D2&quot;&gt;this script&lt;/a&gt; to generate the minimums, then imported them into this spreadsheet.'><sup>61</sup></a></span></strong></figcaption></figure>



<h5 class="wp-block-heading">Analysis</h5>



<p>If we assume the trend is exponential, the Passmark trend seems to suggest the 95th-percentile GPU price / half-precision FMA FLOPS of GPUs has fallen by around 40% per year, which would yield a factor of ten in ~4.5 years,<span id='easy-footnote-62-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-62-2316' title='See the sheet marked ‘Passmark HP FMA minimums’ in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/15pTVDml1j81HROZ3_UeHZ51aBoqq-94-eM8N80npUX0/edit?usp=sharing&quot;&gt;this spreadsheet&lt;/a&gt;. The trendline calculated is technically the linear fit through the log of the data.'><sup>62</sup></a></span> with a bootstrap<span id='easy-footnote-63-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-63-2316' title='Orloff, Jeremy, and Jonathan Bloom. “Bootstrap Confidence Intervals.” MIT OpenCourseWare, 2014. 
&lt;a href=&quot;https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf&quot;&gt;https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf&lt;/a&gt;.'><sup>63</sup></a></span> 95% confidence interval of 4 to 5.2 years.<span id='easy-footnote-64-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-64-2316' title='We used &lt;a href=&quot;https://drive.google.com/open?id=1XkA-8WruAMKM3y3cdNMPNJHabpMUE_MT&quot;&gt;this script&lt;/a&gt; to generate bootstrap confidence intervals for our datasets.'><sup>64</sup></a></span> This is fairly close to the ~4 years / order of magnitude decrease we found when looking at release price data, but we think active prices are a more accurate estimate of the actual prices at which people bought GPUs.</p>
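<p>The arithmetic relating an annual rate of decline to years per factor-of-ten fall, used throughout these estimates, can be restated as a one-line formula (an illustrative sketch, not taken from our analysis scripts):</p>

```python
import math

# Years for price to fall 10x, given a constant fractional decline per year.
# E.g. a 40% annual decline means price is multiplied by 0.6 each year.
def years_per_tenfold(annual_decline):
    return math.log(10) / -math.log(1 - annual_decline)

print(round(years_per_tenfold(0.40), 1))  # ~4.5 years (half-precision FMA)
print(round(years_per_tenfold(0.21), 1))  # ~9.8 years (half-precision)
```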



<p>The figures above suggest that certain GPUs with Tensor Cores represented a significant improvement (~half an order of magnitude) in GPU price / half-precision FMA FLOPS over existing GPUs.</p>



<h1 class="wp-block-heading">Conclusion</h1>



<p>We summarize our results in the table below. Each entry gives the approximate number of years for GPU price / FLOPS to fall by a factor of ten over the stated date range.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td></td><td><strong>Release Prices</strong></td><td><strong>95th-percentile Active Prices</strong></td><td><strong>95th-percentile Active Prices</strong> <strong>(pre-crypto price rise)</strong></td></tr><tr><td></td><td><em>11/2007 &#8211; 1/2020</em></td><td><em>3/2011 &#8211; 1/2020</em></td><td><em>3/2011 &#8211; 12/2016 </em></td></tr><tr><td><strong>$ / single-precision FLOPS</strong></td><td>12.5</td><td>17</td><td>16</td></tr><tr><td></td><td><em>9/2014 &#8211; 1/2020</em></td><td><em>1/2015 &#8211; 1/2020</em></td><td><em>1/2015 &#8211; 12/2016 </em></td></tr><tr><td><strong>$ / half-precision FLOPS</strong></td><td>8</td><td>10</td><td>8</td></tr><tr><td><strong>$ / half-precision FMA FLOPS</strong></td><td>4</td><td>4.5</td><td>&#8212;</td></tr></tbody></table></figure>



<p>Release price data generally seems to support the trends we found in active prices, with the notable exception of GPU price / single-precision FLOPS, where the difference cannot be explained solely by the different start dates.<span id='easy-footnote-65-2316' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/#easy-footnote-bottom-65-2316' title='See our analysis in &lt;a href=&quot;#single-precision-analysis&quot;&gt;this section&lt;/a&gt; above.'><sup>65</sup></a></span> We think the best estimate of the overall trend for prices at which people recently bought GPUs is the 95th-percentile active price data from 2011 &#8211; 2020, since release price data does not account for existing GPUs becoming cheaper over time. The pre-crypto trends are similar to the overall trends, suggesting that what we are seeing is not an anomaly caused by the cryptocurrency price rise.</p>



<p>Given that, we guess that GPU prices as a whole have fallen at rates that would yield an order of magnitude over roughly:</p>



<ul class="wp-block-list">
<li>17 years for single-precision FLOPS</li>



<li>10 years for half-precision FLOPS</li>



<li>5 years for half-precision fused multiply-add FLOPS</li>
</ul>



<p>Half-precision FLOPS seem to have become cheaper substantially faster than single-precision in recent years. This may be a “catching up” effect as more of the space on GPUs was allocated to half-precision computing, rather than reflecting more fundamental technological progress.</p>



<p><em>Primary author: Asya Bergal</em></p>



<h1 class="wp-block-heading">Notes</h1>
]]></content:encoded>
					
					<wfw:commentRss>http://aiimpacts.org/2019-recent-trends-in-gpu-price-per-flops/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
