<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI Impacts</title>
	<atom:link href="http://aiimpacts.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://aiimpacts.org</link>
	<description></description>
	<lastBuildDate>Tue, 17 Dec 2024 22:46:54 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>
	<item>
		<title>How should we analyse survey forecasts of AI timelines?</title>
		<link>http://aiimpacts.org/how-should-we-analyse-survey-forecasts-of-ai-timelines/</link>
		
		<dc:creator><![CDATA[aiimpacts]]></dc:creator>
		<pubDate>Mon, 16 Dec 2024 05:39:34 +0000</pubDate>
				<category><![CDATA[Reports]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3647</guid>

					<description><![CDATA[Tom Adamczewski, 2024 The Expert Survey on Progress in AI (ESPAI) is a large survey of AI researchers about the future of AI, conducted in 2016, 2022, and 2023. One main focus of the survey <a class="mh-excerpt-more" href="http://aiimpacts.org/how-should-we-analyse-survey-forecasts-of-ai-timelines/" title="How should we analyse survey forecasts of AI timelines?"></a>]]></description>
										<content:encoded><![CDATA[<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.16.2/dist/katex.min.css">


<p><em>Tom Adamczewski, 2024</em></p>



<p>The Expert Survey on Progress in AI (ESPAI) is a large survey of AI researchers about the future of AI, conducted in <a href="https://arxiv.org/abs/1705.08807">2016</a>, <a href="https://wiki.aiimpacts.org/doku.php?id=ai_timelines:predictions_of_human-level_ai_timelines:ai_timeline_surveys:2022_expert_survey_on_progress_in_ai">2022</a>, and <a href="https://arxiv.org/abs/2401.02843">2023</a>. One main focus of the survey is the timing of progress in AI.<sup data-fn="23f54f49-9fa2-4345-a108-0f9d2acd5259" class="fn"><a href="#23f54f49-9fa2-4345-a108-0f9d2acd5259" id="23f54f49-9fa2-4345-a108-0f9d2acd5259-link">1</a></sup></p>



<p>The timing-related results of the survey are usually presented as a cumulative distribution function (CDF) showing probabilities as a function of years, in the aggregated opinion of respondents. Each respondent gave three (year, probability) pairs for various AI milestones. Starting from these responses, two key steps of processing are required to obtain such a CDF:</p>



<ul class="wp-block-list">
<li>Fitting a continuous probability distribution to each response</li>



<li>Aggregating these distributions</li>
</ul>



<p>These two steps require a number of judgement calls. In addition, summarising and presenting the results involves many other implicit choices.</p>



<p>In this report, I investigate these choices and their impact on the results of the survey (for the 2023 iteration). I provide recommendations for how the survey results should be analysed and presented in the future.</p>



<p>This plot represents a summary of my best guesses as to how the ESPAI data should be analysed and presented.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/headline_result.png"><img fetchpriority="high" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/headline_result-1024x768.png" alt="" class="wp-image-3652" style="width:632px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-2048x1536.png 2048w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result-80x60.png 80w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>See the version in the paper</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3.png"><img decoding="async" width="788" height="1024" src="https://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3-788x1024.png" alt="" class="wp-image-3653" style="width:459px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3-788x1024.png 788w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3-231x300.png 231w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3-768x998.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3-1181x1536.png 1181w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2024_figure_3.png 1493w" sizes="(max-width: 788px) 100vw, 788px" /></a></figure>



<p><a href="https://arxiv.org/abs/2401.02843">Thousands of AI Authors on the Future of AI</a>, Figure 3. I added annotations to the 20%, 50%, and 80% points, for comparison with my plot.</p>
</details>



<p></p>



<p>I differ from previous authors in four main ways:</p>



<ul class="wp-block-list">
<li><strong>Show distribution of responses</strong>. Previous summary plots showed a random subset of responses, rather than quantifying the range of opinion among experts. I show a shaded area representing the central 50% of individual-level CDFs (25th to 75th percentile). <a href="#Displaying_the_distribution_of_responses">More</a></li>



<li><strong>Aggregate task and occupation questions</strong>. Previous analyses only showed task (HLMI) and occupation (FAOL) results separately, whereas I provide a single estimate combining both. By not providing a single headline result, previous approaches made summarization more difficult, and left room for selective interpretations. I find evidence that task automation (HLMI) numbers have been far more widely reported than occupation automation (FAOL). <a href="#Aggregating_across_the_task_and_occupation_framings">More</a></li>



<li><strong>Median aggregation</strong>. I’m quite uncertain as to which method is most appropriate in this context for aggregating the individual distributions into a single distribution. The arithmetic mean of probabilities, used by previous authors, is a reasonable option. I choose the median merely because it has the convenient property that we get the same result whether we take the median in the vertical direction (probabilities) or the horizontal (years). <a href="#Aggregation">More</a></li>



<li><strong>Flexible distributions</strong>: I fit individual-level CDF data to “flexible” interpolation-based distributions that can match the input data exactly. The original authors use the Gamma distribution. This change (and distribution fitting in general) makes only a small difference to the aggregate results. <a href="#Distribution_fitting">More</a></li>
</ul>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>See effects of changes, compared to the results in the paper</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline.png"><img decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-1024x768.png" alt="" class="wp-image-3657" style="width:539px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/combined_effect_headline.png 1920w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>The combined effect of (3 of the 4 elements of) our approach, compared with previous results. For legibility, this does not show the range of responses, although I consider this one of the most important innovations over previous analyses.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>CDF</strong></td><td><strong>Framing of automation</strong></td><td><strong>Distribution family</strong></td><td><strong>Loss function</strong></td><td><strong>Aggregation</strong></td><td><strong>p20</strong></td><td><strong>p50</strong></td><td><strong>p80</strong></td></tr><tr><td><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-cyan-blue-color">Blue (ours)</mark></td><td>Aggregate of tasks (HLMI) and occupations (FAOL)</td><td>Flexible</td><td>Not applicable</td><td>Median</td><td>2048</td><td>2073</td><td>2103</td></tr><tr><td><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-luminous-vivid-orange-color">Orange (previous)</mark></td><td>Tasks (HLMI)<br></td><td>Gamma</td><td>MSE of probabilities</td><td>Arithmetic mean of probabilities</td><td>2031</td><td>2047</td><td>2110</td></tr><tr><td><mark style="background-color:rgba(0, 0, 0, 0)" class="has-inline-color has-vivid-green-cyan-color">Green (previous)</mark></td><td>Occupations (FAOL)</td><td>Gamma</td><td>MSE of probabilities</td><td>Arithmetic mean of probabilities</td><td>2051</td><td>2110</td><td>2843</td></tr></tbody></table></figure>



<p>Note: Although previous authors give equal prominence to the orange (tasks, HLMI) and green (occupations, FAOL) results, I find evidence that the orange (tasks, HLMI) curve has been far more widely reported (<a href="#Aggregating_across_the_task_and_occupation_framings">More</a>).</p>
</details>



<p></p>



<p>The last two points (aggregation and distribution fitting) directly affect the numerical results. The first two are about how the headline result of the survey should be conceived of and communicated.</p>



<p>These four choices vary in both their <em>impact</em>, and in my <em>confidence</em> that they represent an improvement over previous analyses. The two tables below summarise my views on the topic.</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Choice</th><th>Impact on understanding and communication of main results</th><th>Confidence it’s an improvement</th></tr></thead><tbody><tr><td>Show range of responses (<a href="#Displaying_the_distribution_of_responses">More</a>)</td><td>High</td><td>Very high</td></tr><tr><td>Aggregate FAOL and HLMI (<a href="#Aggregating_across_the_task_and_occupation_framings">More</a>)</td><td>High</td><td>Moderate</td></tr></tbody></table></figure>



<figure class="wp-block-table"><table class="has-fixed-layout"><thead><tr><th>Choice</th><th>Numerical impact on aggregate CDF</th><th>Confidence it’s an improvement</th></tr></thead><tbody><tr><td>Median aggregation (<a href="#Aggregation">More</a>)</td><td>High</td><td>Very low</td></tr><tr><td>Flexible distributions (<a href="#Distribution_fitting">More</a>)</td><td>Minimal</td><td>High</td></tr></tbody></table></figure>



<p>Even if you disagree with these choices, you can still benefit from my work! The <a href="#Codebase">code</a> used to implement these new variations is open source. It provides user-friendly configuration objects that make it easy to run your own analysis and produce your own plots. The source data is included in version control. AI Impacts plans to use this code when analysing future iterations of ESPAI. I also welcome engagement from the wider research community.</p>



<p><strong>Suggested textual description</strong></p>



<p>If you need a textual description of the results in the plot, I would recommend:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Experts were asked when it will be feasible to automate all tasks or occupations. The median expert thinks this is 20% likely by 2048, and 80% likely by 2103. There was substantial disagreement among experts. For automation by 2048, the middle half of experts assigned it a probability between 1% and 60% (meaning ¼ assigned it a chance lower than 1%, and ¼ gave a chance higher than 60%). For automation by 2103, the central half of experts’ forecasts ranged from a 25% chance to a 100% chance.<sup data-fn="514e6709-2685-453c-b148-94cd12b91b67" class="fn"><a href="#514e6709-2685-453c-b148-94cd12b91b67" id="514e6709-2685-453c-b148-94cd12b91b67-link">2</a></sup></p>
</blockquote>



<p>This description still contains big simplifications (e.g. using “the median expert thinks” even though no expert directly answered questions about 2048 or 2103). However, it communicates both:</p>



<ul class="wp-block-list">
<li>The uncertainty represented by the aggregated CDF (using the 60% belief interval from 20% to 80%)</li>



<li>The range of disagreement among experts (using the central 50% of responses)</li>
</ul>



<p>In some cases, this may be too much information. I recommend if at all possible that the results should not be reduced to the single number of the year by which experts expect a 50% chance of advanced AI. Instead, emphasise that we have a probability distribution over years by giving two points on the distribution. So if a very concise summary is required, you could use:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Surveyed experts think it’s unlikely (20%) it will become feasible to automate all tasks or occupations by 2048, but it probably will (80%) by 2103.</p>
</blockquote>



<p>If even greater simplicity is required, I would urge something like the following, over just using the median year:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>AI experts think full automation is most likely to become feasible between 2048 and 2103.</p>
</blockquote>



<h1 class="wp-block-heading" id="the-distribution-of-raw-responses">The distribution of raw responses</h1>



<p>Even readers who are familiar with ESPAI may only have seen the results after processing. It can be helpful to look at the raw data, i.e. respondents’ answers to questions before any processing, to remind ourselves how the survey was conducted.</p>



<p>All questions about how soon a milestone would be reached were framed in two ways: fixed-years and fixed-probabilities. Half of respondents were asked to estimate the probability that a milestone would be reached by a given year (“fixed-years framing”), while the other half were asked to estimate the year by which the milestone would be feasible with a given probability (“fixed-probabilities framing”).</p>
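
<p>To make the two framings concrete, here is a minimal sketch in Python (with invented numbers, not survey data) of what one respondent’s answers look like under each framing; both reduce to three (year, probability) pairs once the fixed horizons are converted to calendar years:</p>

<pre class="wp-block-code"><code>SURVEY_YEAR = 2023  # the 2023 iteration of ESPAI

# Fixed-probabilities framing: "in how many years will the milestone be 10% / 50% / 90% likely?"
fixed_probabilities_response = {0.10: 10, 0.50: 25, 0.90: 60}   # probability to years-from-now

# Fixed-years framing: "how likely is the milestone within 10 / 20 / 50 years?"
fixed_years_response = {10: 0.05, 20: 0.20, 50: 0.60}           # years-from-now to probability

def to_year_probability_pairs(response, fixed="probabilities"):
    """Turn either framing into three (calendar year, cumulative probability) pairs."""
    if fixed == "probabilities":
        pairs = [(SURVEY_YEAR + horizon, p) for p, horizon in response.items()]
    else:
        pairs = [(SURVEY_YEAR + horizon, p) for horizon, p in response.items()]
    return sorted(pairs)

print(to_year_probability_pairs(fixed_probabilities_response, fixed="probabilities"))
print(to_year_probability_pairs(fixed_years_response, fixed="years"))
</code></pre>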



<h2 class="wp-block-heading" id="example-retail-salesperson-occupation">Example: Retail Salesperson occupation</h2>



<p>Responses about one such milestone (say, the occupation of retail salesperson in the example below), if shown as a scatterplot, form three horizontal lines for fixed probabilities, and three vertical lines for fixed years. These correspond to the six questions being asked about the milestone:</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-1024x768.png" alt="" class="wp-image-3668" style="width:546px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_scatter_retail_salesperson.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>The scatterplot is a helpful reminder of the data’s shape in its rawest form. However, all scatterplots will form three horizontal and three vertical lines. We can show more structured information about the distribution of responses for each of the six questions by using six box and whisker plots<sup data-fn="648cbfff-fbda-4131-a809-9dfea725a197" class="fn"><a href="#648cbfff-fbda-4131-a809-9dfea725a197" id="648cbfff-fbda-4131-a809-9dfea725a197-link">3</a></sup>, as shown below:</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-1024x768.png" alt="" class="wp-image-3669" style="width:603px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_retail_salesperson.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>We can see several useful things in this set of box plots:</p>



<ul class="wp-block-list">
<li><strong>There is a large framing effect, whereby the fixed-years framing produces later predictions</strong>. (This effect is familiar from previous analyses of ESPAI, where it has been shown to occur systematically).
<ul class="wp-block-list">
<li>For example, the prediction (2043, 50%) is the <em>85th percentile</em> of responses for the 50% question in the fixed-probabilities framing, while the same prediction is the <em>median</em> response for the 2043 question in the fixed-years framing.</li>



<li>When asked about 2073 in the fixed-years framing, the median response was 90%; this point (2073, 90%) is much later than even the 85th percentile response to the 90% question in the fixed-probabilities framing.</li>
</ul>
</li>



<li><strong>Responses follow a skewed distribution</strong>
<ul class="wp-block-list">
<li>For all three questions in the fixed-probabilities framing, the responses have a large right skew</li>



<li>In the fixed-years framing, the 2033 question produces a right skew (up skew in the boxplot), whereas the 2073 question produces a left-skew (down skew in the boxplot), with more than 25% of respondents giving a probability of 100%.</li>
</ul>
</li>



<li><strong>There is a wide range of responses, indicating substantial disagreement among respondents</strong>. For example, when asked about 2043, the interval (centred on the median) that contains half of responses ranged from a 30% chance to a 90% chance. The interval that contains 70% of responses ranged from a 10% chance to a 98% chance.</li>
</ul>



<p>We can now look at the distribution of raw responses for the timing of human-level performance.</p>



<h2 class="wp-block-heading" id="timing-of-human-level-performance">Timing of human-level performance</h2>



<p>When the survey investigated the timing of human-level performance, the question was framed in two ways, as tasks, and as occupations<sup data-fn="a8f1ee3f-3a3b-498f-ac7c-f25b9ea3dddd" class="fn"><a href="#a8f1ee3f-3a3b-498f-ac7c-f25b9ea3dddd" id="a8f1ee3f-3a3b-498f-ac7c-f25b9ea3dddd-link">4</a></sup>:</p>



<ul class="wp-block-list">
<li>“High-Level Machine Intelligence” (HLMI): when unaided machines can accomplish every <strong>task</strong> better and more cheaply than human workers.</li>



<li>“Full Automation of Labor” (FAOL): when for any <strong>occupation</strong>, machines could be built to carry it out better and more cheaply than human workers.</li>
</ul>



<p>We can now take each of these in turn (expand the collapsible sections below).</p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-1024x768.png" alt="" class="wp-image-3670" style="width:598px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>In the fixed probabilities framing, respondents were asked for the number of years until a 10%, 50%, and 90% probability of FAOL:<br></p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>Probability of FAOL</strong></td><td><strong>Mean response</strong></td><td><strong>15th percentile response</strong></td><td><strong>Median response</strong></td><td><strong>85th percentile response</strong></td></tr><tr><td>10%</td><td>5.08e+05</td><td>10</td><td>40</td><td>100</td></tr><tr><td>50%</td><td>7.84e+05</td><td>20</td><td>70</td><td>200</td></tr><tr><td>90%</td><td>1.01e+06</td><td>35</td><td>100</td><td>500</td></tr></tbody></table></figure>



<p>In the fixed years framing, respondents were asked for the probability of FAOL within 10, 20, and 50 years:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong><b>Y</b>ears until FAOL</strong></td><td><strong>Mean response</strong></td><td><strong>15th percentile response</strong></td><td><strong>Median response</strong></td><td><strong>85th percentile response</strong></td></tr><tr><td>10</td><td>6.02%</td><td>0.00%</td><td>0.00%</td><td>10.00%</td></tr><tr><td>20</td><td>12.30%</td><td>0.00%</td><td>2.00%</td><td>30.00%</td></tr><tr><td>50</td><td>24.66%</td><td>0.00%</td><td>10.00%</td><td>60.00%</td></tr></tbody></table></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-1024x768.png" alt="" class="wp-image-3672" style="width:573px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/raw_boxplot_HLMI.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>In the fixed probabilities framing, respondents were asked for the number of years until a 10%, 50%, and 90% probability of HLMI:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>Probability of HLMI</strong></td><td><strong>Mean response</strong></td><td><strong>15th percentile response</strong></td><td><strong>Median response</strong></td><td><strong>85th percentile response</strong></td></tr><tr><td>10%</td><td>41.2</td><td>2</td><td>5</td><td>20</td></tr><tr><td>50%</td><td>1310</td><td>7</td><td>20</td><td>50</td></tr><tr><td>90%</td><td>4.57e+05</td><td>15</td><td>50</td><td>100</td></tr></tbody></table></figure>



<p>In the fixed years framing, respondents were asked for the probability of HLMI within 10, 20, and 40 years:</p>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><strong>Years until HLMI</strong></td><td><strong>Mean response</strong></td><td><strong>15th percentile response</strong></td><td><strong>Median response</strong></td><td><strong>85th percentile response</strong></td></tr><tr><td>10</td><td>18.3%</td><td>0%</td><td>10%</td><td>50%</td></tr><tr><td>20</td><td>34.7%</td><td>4%</td><td>30%</td><td>75%</td></tr><tr><td>40</td><td>54.6%</td><td>10%</td><td>50%</td><td>95%</td></tr></tbody></table></figure>



<p></p>
</details>



<p></p>



<h1 class="wp-block-heading" id="aggregation">Aggregation</h1>



<h2 class="wp-block-heading" id="possible-methods">Possible methods</h2>



<p>All previous analyses produced the aggregate distribution by taking the average of CDF values, that is, by taking the mean of probability values at each year.</p>



<p>There are many other possible aggregation methods. We can put these into two categories:</p>



<ul class="wp-block-list">
<li>vertical methods, like the one above, aggregate probability values at each year</li>



<li>horizontal methods aggregate year values at each probability</li>
</ul>



<p>This figure illustrates both methods on a very simple example with two CDFs to aggregate.</p>



<figure class="wp-block-image size-large"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods.png"><img loading="lazy" decoding="async" width="1024" height="410" src="https://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods-1024x410.png" alt="" class="wp-image-3674" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods-1024x410.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods-300x120.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods-768x307.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods-1536x614.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/illustrate_agg_methods-2048x819.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>For both vertical and horizontal aggregation, we need not take the mean of values. In principle any aggregation function could be used, of which the mean and median are only the two most obvious examples.</p>
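
<p>As a minimal sketch (in Python; not the analysis codebase), here is how the vertical and horizontal versions of mean aggregation can be computed for two toy CDFs like those in the figure, each represented as a callable from calendar year to cumulative probability:</p>

<pre class="wp-block-code"><code>import numpy as np
from scipy import stats, optimize

# Two toy respondent CDFs over calendar years (invented, for illustration only).
cdf_a = lambda year: stats.norm.cdf(year, loc=2050, scale=10)
cdf_b = lambda year: stats.norm.cdf(year, loc=2090, scale=30)
cdfs = [cdf_a, cdf_b]

years = np.linspace(2024, 2200, 2000)

# Vertical aggregation: combine the probability values at each year.
vertical_mean = np.mean([cdf(years) for cdf in cdfs], axis=0)

# Horizontal aggregation: combine the year values (quantiles) at each probability.
def quantile(cdf, p):
    """The year at which a CDF reaches probability p, found numerically."""
    return optimize.brentq(lambda y: cdf(y) - p, 1900.0, 10000.0)

probabilities = np.linspace(0.01, 0.99, 99)
horizontal_mean_years = [np.mean([quantile(cdf, p) for cdf in cdfs]) for p in probabilities]

# The two aggregates generally differ; for example, the year of the 50% point:
print(round(float(np.interp(0.5, vertical_mean, years)), 1))                   # vertical mean
print(round(float(np.interp(0.5, probabilities, horizontal_mean_years)), 1))   # horizontal mean
</code></pre>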



<p>When it comes to aggregating probabilities (vertical aggregation), there are additional complications. The topic has been well studied, and many aggregation methods have been proposed.</p>



<p>A full assessment of the topic would take us far beyond the scope of this report, so I will only briefly mention one prominent recommendation: taking the geometric mean of odds. The core observation is that the arithmetic mean of probabilities ignores information from extreme predictions. This can be seen with a simple example. In scenario A, we aggregate the two predictions (1%, 10%), whereas in scenario B the two predictions are (0.1%, 10%). The arithmetic mean of probabilities is close to 5% in both cases (5.5% for A and 5.05% for B). It gives very little weight to the difference between 1% and 0.1%, which is after all a factor of 10. The geometric mean of odds reacts much more strongly to the more extreme prediction: it is about 3.2% in scenario A, but 1.0% in scenario B.</p>
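
<p>The numbers above can be checked with a few lines of Python (a standalone sketch, not part of the analysis codebase):</p>

<pre class="wp-block-code"><code>import numpy as np

def geometric_mean_of_odds(ps):
    """Convert probabilities to odds, take the geometric mean, convert back."""
    ps = np.asarray(ps, dtype=float)
    odds = ps / (1.0 - ps)
    geo = np.exp(np.mean(np.log(odds)))
    return geo / (1.0 + geo)

scenario_a = [0.01, 0.10]    # predictions of 1% and 10%
scenario_b = [0.001, 0.10]   # predictions of 0.1% and 10%

for name, ps in [("A", scenario_a), ("B", scenario_b)]:
    print(name,
          round(float(np.mean(ps)), 4),                 # arithmetic mean: 0.055 (A), 0.0505 (B)
          round(float(geometric_mean_of_odds(ps)), 4))  # geometric mean of odds: ~0.032 (A), ~0.010 (B)
</code></pre>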



<p>This behaviour of the geometric mean of odds is theoretically appealing, but it is only advisable if extreme predictions are really to be taken at face value. We might instead worry that such extreme predictions are overconfident.</p>



<p>As a further complication, in the case of ESPAI we cannot in practice apply the geometric mean of odds. This is because for nearly every year (every vertical line) we might consider, many of the respondents’ fitted CDFs take values indistinguishable from 0 or 1. This causes the geometric mean of odds to immediately become 0 or 1.<sup data-fn="40b92e73-777b-4d16-ac7f-66174261ef2a" class="fn"><a href="#40b92e73-777b-4d16-ac7f-66174261ef2a" id="40b92e73-777b-4d16-ac7f-66174261ef2a-link">5</a></sup></p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-1024x768.png" alt="" class="wp-image-3677" style="width:654px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/geomean_odds_bad.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>Aggregating years is also problematic. Because the input is bounded on the left but not the right, the arithmetic mean of responses is inevitably dominated by extremely large values. This method would produce a CDF where any probability of the event is essentially infinitely many years away. We might hope to address this problem by using the geometric mean of years, but this in turn suffers from numerical and conceptual issues similar to those of the geometric mean of odds. Ultimately, taking the median of years is the only method of aggregating years I was able to apply.</p>



<p>The median of years and median of probabilities give the same answer. This makes intuitive sense since CDFs are strictly increasing.<sup data-fn="36fdf480-d735-4785-9483-7a68aff2be0b" class="fn"><a href="#36fdf480-d735-4785-9483-7a68aff2be0b" id="36fdf480-d735-4785-9483-7a68aff2be0b-link">6</a></sup> So I simply call this the “median”.</p>
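
<p>A minimal numerical check of this equivalence (again with toy CDFs rather than survey data): the median probability taken at each year and the median year taken at each probability trace out the same curve, which is not true of the corresponding means:</p>

<pre class="wp-block-code"><code>import numpy as np
from scipy import stats, optimize

# Three toy respondent CDFs (invented); an odd number keeps the median unambiguous.
cdfs = [lambda y, m=m, s=s: stats.norm.cdf(y, loc=m, scale=s)
        for m, s in [(2045, 8), (2070, 20), (2120, 40)]]

def quantile(cdf, p):
    """The year at which a CDF reaches probability p, found numerically."""
    return optimize.brentq(lambda y: cdf(y) - p, 1900.0, 10000.0)

years = np.linspace(2024, 2300, 4000)
probabilities = np.linspace(0.05, 0.95, 19)

# Vertical median: the median probability at each year, then read off its quantiles.
vertical_median = np.median([cdf(years) for cdf in cdfs], axis=0)
years_from_vertical = np.interp(probabilities, vertical_median, years)

# Horizontal median: the median year at each probability.
years_from_horizontal = np.array(
    [np.median([quantile(cdf, p) for cdf in cdfs]) for p in probabilities])

# The two curves coincide (up to the resolution of the year grid).
print(np.max(np.abs(years_from_vertical - years_from_horizontal)))
</code></pre>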



<h2 class="wp-block-heading" id="mean-vs-median-aggregation">Mean vs Median aggregation</h2>



<p>As a result of these difficulties, I will present only the following aggregation methods:</p>



<ul class="wp-block-list">
<li>(Arithmetic) mean of probabilities</li>



<li>Median of probabilities</li>
</ul>



<p>These plots use the Gamma distribution with the mean square error (MSE) of probabilities as the loss function, so the mean aggregation line corresponds to the results of previous analyses.</p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-1024x768.png" alt="" class="wp-image-3678" style="width:588px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-1024x768.png" alt="" class="wp-image-3679" style="width:591px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_faol_2800.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-1024x768.png" alt="" class="wp-image-3681" style="width:521px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_hlmi-1.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Truck Driver</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-1024x768.png" alt="" class="wp-image-3683" style="width:556px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_TRUCK_DRIVER-1.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Surgeon</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-1024x768.png" alt="" class="wp-image-3684" style="width:577px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_SURGEON.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Retail Salesperson</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-1024x768.png" alt="" class="wp-image-3685" style="width:562px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_RETAIL_SALESPERSON.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>AI Researcher</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-1024x768.png" alt="" class="wp-image-3686" style="width:551px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_agg_methods_AI_RESEARCHER.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p>We see a notable pattern in each of these cases. As we go from left to right, the median always starts below the mean (i.e. the median initially gives later predictions), but eventually overtakes the mean (i.e. the median eventually gives earlier predictions). Median aggregation also always gives rise to a more confident probability distribution: one whose probability mass is more concentrated.</p>



<h2 class="wp-block-heading" id="why-the-mean-and-median-differ">Why the mean and median differ</h2>



<p>When the mean is very different from the median, the distribution of responses is highly skewed. We can illustrate this by displaying a histogram of CDF values for a given year.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="683" src="https://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL-1024x683.png" alt="" class="wp-image-3687" style="width:575px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL-1024x683.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL-300x200.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL-768x512.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL-1536x1024.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_FAOL-2048x1365.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>The results for automation of all occupations (FAOL) are quite interesting. For the years 2040 and 2060, the results are extremely skewed. A large majority assigns very low probabilities, but there is a right tail of high probabilities, which causes the mean to greatly exceed the median. For the years 2080 and 2100, a bimodal distribution emerges. We have a big cluster with probabilities near zero and a big cluster with probabilities near 1. Opinion is extremely polarised. By 2200 the median exceeds the mean. When we reach 2500, a majority think FAOL is near-certain, but a significant left tail causes the mean to lag far behind the median.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="683" src="https://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI-1024x683.png" alt="" class="wp-image-3688" style="width:569px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI-1024x683.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI-300x200.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI-768x512.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI-1536x1024.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/mean_vs_median_cdf_slice_histogram_HLMI-2048x1365.png 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>With task automation (HLMI), we see the same basic pattern again: the median at first trails behind the mean, and for later years and higher probabilities, the median overtakes the mean. However, skewness is less extreme than for FAOL, and we do not see a strongly bimodal histogram (extreme polarisation) at any point.</p>
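
<p>For readers who want to reproduce this kind of slice, here is a minimal sketch (assuming a list of fitted per-respondent CDF callables, which is not constructed here):</p>

<pre class="wp-block-code"><code>import numpy as np

def describe_cdf_slice(cdfs, year):
    """Evaluate every respondent's fitted CDF at one year and summarise the spread."""
    values = np.array([cdf(year) for cdf in cdfs])
    return {
        "year": year,
        "mean": float(np.mean(values)),      # the aggregation used by previous analyses
        "median": float(np.median(values)),  # the aggregation used in this report
        "histogram": np.histogram(values, bins=20, range=(0.0, 1.0))[0],
    }

# Example usage, with `fitted_cdfs` standing in for the per-respondent fitted CDFs:
# for year in [2040, 2060, 2080, 2100, 2200, 2500]:
#     print(describe_cdf_slice(fitted_cdfs, year))
</code></pre>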



<h2 class="wp-block-heading" id="aside-the-winsorized-geometric-mean-of-odds">Aside: the winsorized geometric mean of odds</h2>



<p>There is one way to use the geometric mean of odds that avoids the problem of zeroes and ones. This is to winsorize the data: to replace the most extreme values with less extreme values. For example, we could replace all values less than 0.1 with 0.1, and all values greater than 0.9 with 0.9.</p>



<p>Of course, this introduces a highly subjective choice that massively affects the results. We could replace all values less than 0.01 with 0.01, or all values less than 0.001 with 0.001. Therefore, I do not consider this technique suitable for producing a headline result.</p>



<p>However, it lets us do some potentially interesting explorations. Winsorizing essentially means we do not trust the most extreme predictions. We can now explore what the geometric mean of odds would look like under various degrees of winsorization. For example, what does it look like if we ignore all predictions more extreme than 1:100? What about 1:1000?</p>
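
<p>A minimal sketch of this exploration (assuming a list of per-respondent CDF values at a given year): clip the probabilities to a chosen band before converting to odds, which also avoids the exact zeros and ones that break the geometric mean of odds:</p>

<pre class="wp-block-code"><code>import numpy as np

def winsorized_geometric_mean_of_odds(ps, lower=0.1, upper=0.9):
    """Clip probabilities to [lower, upper], then return the probability
    corresponding to the geometric mean of the clipped odds."""
    clipped = np.clip(np.asarray(ps, dtype=float), lower, upper)
    odds = clipped / (1.0 - clipped)
    geo = np.exp(np.mean(np.log(odds)))
    return geo / (1.0 + geo)

# Invented example probabilities, including an exact 0, which on its own would force
# the un-winsorized geometric mean of odds to 0:
ps = [0.0, 0.02, 0.1, 0.3, 0.7, 0.95]
print(winsorized_geometric_mean_of_odds(ps, lower=0.1, upper=0.9))      # strong winsorization
print(winsorized_geometric_mean_of_odds(ps, lower=0.01, upper=0.99))    # weaker
print(winsorized_geometric_mean_of_odds(ps, lower=0.001, upper=0.999))  # weaker still
</code></pre>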



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-1024x768.png" alt="" class="wp-image-3689" style="width:579px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-1024x768.png" alt="" class="wp-image-3690" style="width:589px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/winsorized_geomean_HLMI.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>Very informally, we can see that for HLMI (tasks) and FAOL (occupations), the arithmetic mean of probabilities roughly corresponds to the geometric mean of odds with a winsorization level of about 1:10. We’ve already discussed the well-known effect that the arithmetic mean of probabilities ignores more extreme predictions, compared to the geometric mean of odds. For this particular dataset, we can roughly quantify this effect, and see that it is equivalent to ignoring all predictions &lt;10% and &gt;90%. I find this to be quite an extreme level of winsorization. Consider, for example, that the fixed-probabilities framing explicitly asked for predictions at the 10% and 90% levels; it would be odd to simultaneously consider these probabilities too extreme to be trusted.</p>



<h1 class="wp-block-heading" id="distribution-fitting">Distribution fitting</h1>



<p>All previous analyses of the ESPAI data fitted each respondent’s CDF data (triples of (year, probability)) to a Gamma distribution before aggregating these distributions.</p>



<h2 class="wp-block-heading" id="why-fit-a-distribution">Why fit a distribution?</h2>



<p>Creating a full continuous distribution from three CDF points necessarily imposes some assumptions that were not present in the data. And recall, the respondents just gave numbers in text fields, and never saw the distribution that was later fitted to their CDF data.</p>



<p>So to begin with, it’s worth asking: why fit a distribution at all?</p>



<p>If we are only looking at a particular framing and question, for example FAOL with fixed years, it may indeed be preferable to look directly at the raw data. This allows us to talk strictly about what respondents said, without any additional assumptions. Even in this restricted setting, however, we might want to be able to get predictions for other years or probabilities than those respondents were asked about; this requires a full CDF. A simple example where this is needed is making comparisons across different iterations of the ESPAI survey in the fixed-years setting. Each survey asks for predictions about a fixed number of years <em>from the date of the survey</em>, so the fixed-years questions refer to different calendar years each time the survey is run.</p>



<p>A more fundamental problem for the raw data approach is that we wish to aggregate the results of different framings into a single estimate. We can only aggregate across the fixed-years and fixed-probabilities framings by aggregating full distributions. In addition, even within the fixed-years framing, we cannot aggregate the occupations (FAOL) and tasks (HLMI) framings, because different years were used (10, 20, and 40 years for HLMI and 10, 20, and 50 years for FAOL).</p>



<h2 class="wp-block-heading" id="limitations-of-previous-analyses">Limitations of previous analyses</h2>



<h3 class="wp-block-heading" id="constraints-of-gamma-distribution">Constraints of Gamma distribution</h3>



<p>While creating a full CDF from three points inevitably imposes assumptions not present in the data, we might think that, at a minimum, it would be desirable to have this CDF pass through the three points.</p>



<p>Previous analyses used a Gamma distribution. The Gamma is a two-parameter distribution, so it can exactly match two points of CDF data, but not three. It (like any two-parameter distribution) therefore loses information and distorts a respondent’s stated beliefs even at the points where we know exactly what they believe.</p>
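
<p>To make this constraint concrete, here is a minimal sketch (not the original analysis code) that fits a Gamma CDF to one invented fixed-probabilities response by minimising the squared error of probabilities, as in previous analyses, and compares it with a monotone interpolant that passes through all three points exactly (an illustration of the “flexible” idea, not necessarily this report’s exact construction):</p>

<pre class="wp-block-code"><code>import numpy as np
from scipy import stats, optimize, interpolate

# One invented fixed-probabilities response: years from the survey until 10%, 50%, 90%.
years = np.array([5.0, 20.0, 100.0])
probs = np.array([0.1, 0.5, 0.9])

# Previous approach: fit a two-parameter Gamma by minimising the MSE of probabilities.
def mse_loss(log_params):
    shape, scale = np.exp(log_params)   # optimise in log space to keep both parameters positive
    return np.mean((stats.gamma.cdf(years, a=shape, scale=scale) - probs) ** 2)

result = optimize.minimize(mse_loss, x0=np.log([1.0, 30.0]), method="Nelder-Mead")
shape, scale = np.exp(result.x)
# The best two-parameter fit generally cannot reproduce all three stated probabilities:
print(np.round(stats.gamma.cdf(years, a=shape, scale=scale), 3))

# A "flexible" alternative: a monotone interpolant through the stated points, anchored at
# probability 0 at the survey date. It matches the data exactly, though the tail beyond the
# last point still needs a separate modelling choice.
flexible_cdf = interpolate.PchipInterpolator([0.0, 5.0, 20.0, 100.0], [0.0, 0.1, 0.5, 0.9])
print(np.round(flexible_cdf(years), 3))   # exactly [0.1, 0.5, 0.9]
</code></pre>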



<p>Here are 9 examples from the fixed-probabilities framing<sup data-fn="e052f82e-e51d-4c49-be81-e33a7efda5dc" class="fn"><a href="#e052f82e-e51d-4c49-be81-e33a7efda5dc" id="e052f82e-e51d-4c49-be81-e33a7efda5dc-link">7</a></sup>. They are representative of the bottom half of fits (from the 50th to 90th percentile). Each subplot shows an example where the fitted gamma CDF (shown as a gray curve) attempts to match three points from a respondent’s data (shown as red crosses).</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False.png"><img loading="lazy" decoding="async" width="987" height="1024" src="https://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False-987x1024.png" alt="" class="wp-image-3693" style="width:690px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False-987x1024.png 987w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False-289x300.png 289w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False-768x796.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False-1481x1536.png 1481w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_False-1975x2048.png 1975w" sizes="auto, (max-width: 987px) 100vw, 987px" /></a></figure>



<p>First, we can see that the gamma is not flexible enough to match the three points. While the median fit (A1) is acceptable, some fits are poor.</p>



<h3 class="wp-block-heading" id="inappropriate-loss-function">Inappropriate loss function</h3>



<p>In addition, looking carefully reveals an interesting systematic pattern in the poor fits: when the Gamma has trouble fitting the data, it prefers to fit two points well, even at the expense of a very poor fit on the third point, rather than choosing a middle ground with an acceptable fit on all three points. This begins to be visible in row B (67th to 87th percentile), and becomes blatantly clear in row C (84th to 95th percentile). In fact, in C2 and C3, the Gamma fits two points exactly and completely ignores the third. When this happens, the worst-fit point is always the 0.9 or 0.1 point, never the 0.5 point.</p>



<p>The errors at the 0.1 and 0.9 points can completely change the nature of the prediction. This becomes clear if we go to odds space, and express the odds ratio between the data and the gamma CDF.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True.png"><img loading="lazy" decoding="async" width="984" height="1024" src="https://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True-984x1024.png" alt="" class="wp-image-3694" style="width:673px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True-984x1024.png 984w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True-288x300.png 288w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True-768x799.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True-1477x1536.png 1477w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_probabilities_annotations_True-1969x2048.png 1969w" sizes="auto, (max-width: 984px) 100vw, 984px" /></a></figure>



<p>While a 2x odds ratio (e.g. in B2) is already substantial, when we move to the worst 15% of the fits, the odds ratio for the worst of the three points becomes astronomical.</p>



<p>The reason this happens is that the loss function used in previous work is not the appropriate one.</p>



<p>Previous analyses used mean squared error (MSE) of probabilities as their loss function: L<sub>MSE</sub> = Σ<sub>i</sub> (p<sub>i</sub> − p̂<sub>i</sub>)², where p<sub>i</sub> are the probabilities from the respondent’s data and p̂<sub>i</sub> are the probabilities from the fitted CDF. This loss function treats all probability differences equally, regardless of where they occur in the distribution. For instance, it considers a deviation of 0.05 to be equally bad whether it occurs at p = 0.5 or at p = 0.99.</p>
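


<p>As a concrete illustration (this is a minimal sketch, not the code used in previous analyses), the following Python snippet fits a two-parameter Gamma CDF to an illustrative set of three (year, probability) points by minimising this MSE loss:</p>



<pre class="wp-block-code"><code># Minimal sketch: fit a two-parameter Gamma CDF to three (year, probability)
# points by minimising the MSE of probabilities. The example response below
# is illustrative, not taken from the survey data.
import numpy as np
from scipy import optimize, stats

years = np.array([10.0, 25.0, 50.0])   # years from the survey date
probs = np.array([0.1, 0.5, 0.9])      # respondent's cumulative probabilities

def mse_loss(params):
    shape, scale = np.exp(params)       # optimise in log-space to keep both positive
    fitted = stats.gamma.cdf(years, a=shape, scale=scale)
    return np.sum((probs - fitted) ** 2)

result = optimize.minimize(mse_loss, x0=np.log([2.0, 20.0]), method="Nelder-Mead")
shape, scale = np.exp(result.x)
print(stats.gamma.cdf(years, a=shape, scale=scale))  # fitted probabilities at the three years
</code></pre>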



<p>This is inappropriate when fitting CDF data. Consider the case depicted in C2, where the respondent thinks the event in question is 90% likely by 150 years from the date of the survey. Meanwhile, the Gamma CDF fitted by MSE gives a probability of 99.98% at 150 years. This dramatic departure from the respondent’s beliefs is reflected in the 777x odds ratio. A 777x odds ratio at p = 0.5 would mean changing from even odds (1:1) to odds of 777:1, or a probability of &gt;99.8%. (A 13x odds ratio, as seen for the 0.1 point in C1 (84th percentile), would mean changing from even odds to odds of 13:1, or a probability of 93%.)</p>
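


<p>For readers who want to check these conversions, this purely illustrative snippet turns an odds ratio applied at even odds into a probability, reproducing the 777:1 and 13:1 figures above:</p>



<pre class="wp-block-code"><code># Converting an odds ratio applied at even odds (1:1) into a probability,
# reproducing the figures quoted above. Purely illustrative.
def prob_from_odds(odds):
    return odds / (1.0 + odds)

print(prob_from_odds(777))  # 0.99871..., i.e. over 99.8%
print(prob_from_odds(13))   # 0.92857..., i.e. about 93%
</code></pre>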



<p>The appropriate loss function for CDF data is the log loss, also known as the cross-entropy loss: L<sub>log</sub> = −Σ<sub>i</sub> [p<sub>i</sub> log(p̂<sub>i</sub>) + (1 − p<sub>i</sub>) log(1 − p̂<sub>i</sub>)]. This loss function naturally accounts for the fact that probability differences near 0 and 1 represent much larger differences in beliefs than the same probability differences near 0.5.<sup data-fn="03cf3499-e10c-403e-b780-3732fae8d50d" class="fn"><a href="#03cf3499-e10c-403e-b780-3732fae8d50d" id="03cf3499-e10c-403e-b780-3732fae8d50d-link">8</a></sup></p>



<p>As expected from this theoretical argument, we can see that the log loss, unlike the MSE of probabilities, does not display the pathological behaviour of ignoring the 0.1 or 0.9 point, and so avoids extreme odds ratios (see especially C1-C3):</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities.png"><img loading="lazy" decoding="async" width="987" height="1024" src="https://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities-987x1024.png" alt="" class="wp-image-3695" style="width:663px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities-987x1024.png 987w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities-289x300.png 289w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities-768x796.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities-1481x1536.png 1481w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_loss_Fixed_probabilities-1975x2048.png 1975w" sizes="auto, (max-width: 987px) 100vw, 987px" /></a></figure>



<p>As an informal analysis, this plot suggests that the MSE leads to extremely poor fits on &gt;15% of the data, but also that most of the MSE fits are close to the log loss fits.</p>



<p>When we create the aggregate CDF, we see hardly any impact of the loss function:</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-1024x768.png" alt="" class="wp-image-3696" style="width:572px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_effect_of_loss_aggregate_HLMI_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<h2 class="wp-block-heading" id="flexible-distributions">Flexible distributions</h2>



<p>Regardless of the loss function used, we know that the Gamma distribution cannot, in general, exactly match the three points of CDF data given by the expert<sup data-fn="a39f4f4a-6a4d-4c66-abc3-9900e1d00e9d" class="fn"><a href="#a39f4f4a-6a4d-4c66-abc3-9900e1d00e9d" id="a39f4f4a-6a4d-4c66-abc3-9900e1d00e9d-link">9</a></sup>. When we use any such distribution, our results do not merely reflect the expert’s beliefs; they also reflect the mathematical constraint we have imposed upon those beliefs.</p>



<p>Overriding expert responses in this way may be appropriate when we have a strong theoretical justification to impose a particular distribution family. For example, if we have a strong reason to believe that experts think (or ought to think) of a variable as a sum of many small independent contributions, we may wish to impose a normal distribution, even if the responses they gave us are incompatible with a normal distribution.</p>



<p>However, the authors of previous analyses did not justify the choice of the gamma distribution at any point. In addition, I am not aware of any strong argument to impose a particular distribution family in this case.</p>



<p>While creating a full CDF from three points inevitably imposes assumptions not present in the data, at a minimum, it would be desirable to have this CDF pass through the three points.</p>



<p>To achieve this, I used proprietary probability distributions that I call ‘flexible distributions’. I developed these over the last several years for precisely the class of use cases faced by ESPAI. These distributions have the following properties:</p>



<ul class="wp-block-list">
<li>Always exactly match three CDF points (or indeed an arbitrary number of them)…</li>



<li>…while taking a simple and smooth shape</li>



<li>Can be unbounded, or given an upper or lower bound, or both</li>
</ul>



<p>The distributions I used in this analysis are based on <a href="https://en.wikipedia.org/wiki/Interpolation">interpolation</a> theory. While the full mathematical and algorithmic details are proprietary, you can see how these distributions behave with the free interactive web UI at <a href="https://makedistribution.com/">MakeDistribution</a> (select interpolation-based families under expert settings). In addition, to make this work reproducible, the specific fitted CDFs used in the ESPAI analysis are open source<sup data-fn="0e1c4d58-978b-464d-80f9-f164c7337514" class="fn"><a href="#0e1c4d58-978b-464d-80f9-f164c7337514" id="0e1c4d58-978b-464d-80f9-f164c7337514-link">10</a></sup>.</p>
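


<p>To be clear, the flexible distributions themselves are proprietary and are not reproduced here. Purely as an illustration of the general idea, the sketch below uses monotone (PCHIP) interpolation from SciPy, which is one standard way to build a smooth CDF that passes exactly through a set of elicited points; it is not the method used in this report, and the bounding points at years 0 and 500 are arbitrary assumptions of the sketch:</p>



<pre class="wp-block-code"><code># Illustration only: NOT the report's flexible distributions. It merely shows
# that a smooth, monotone CDF passing exactly through three elicited points can
# be built with standard interpolation tools. The bounding points (probability 0
# at year 0, probability 0.999 at year 500) are arbitrary assumptions.
import numpy as np
from scipy.interpolate import PchipInterpolator

years = np.array([0.0, 10.0, 25.0, 50.0, 500.0])
probs = np.array([0.0, 0.10, 0.50, 0.90, 0.999])

cdf = PchipInterpolator(years, probs)          # monotone because the data are monotone
grid = np.linspace(0, 100, 5)
print(cdf(np.array([10.0, 25.0, 50.0])))       # exactly 0.1, 0.5, 0.9
print(cdf(grid))                               # smooth, non-decreasing in between
</code></pre>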



<p>This plot compares Gamma distribution fits with flexible distribution fits for the fixed-probabilities framing, displaying the respondents’ points alongside both sets of fitted CDFs.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities.png"><img loading="lazy" decoding="async" width="987" height="1024" src="https://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities-987x1024.png" alt="" class="wp-image-3697" style="width:741px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities-987x1024.png 987w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities-289x300.png 289w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities-768x796.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities-1481x1536.png 1481w, http://aiimpacts.org/wp-content/uploads/2024/12/9_cdfs_gamma_vs_flexible_Fixed_probabilities-1975x2048.png 1975w" sizes="auto, (max-width: 987px) 100vw, 987px" /></a></figure>



<p>When we aggregate the individual distributions, however, we find that the choice of distribution has a very limited impact, barely any more than the impact of the loss function.</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-1024x768.png" alt="" class="wp-image-3698" style="width:614px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/gamma_vs_flexible_aggregate_HLMI_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>It may be somewhat surprising to see so little difference in aggregate, when we consider that there appeared to be systematic patterns in the poor gamma fits<sup data-fn="b147a578-bb00-4eb5-b834-6b7e806152dd" class="fn"><a href="#b147a578-bb00-4eb5-b834-6b7e806152dd" id="b147a578-bb00-4eb5-b834-6b7e806152dd-link">11</a></sup>. However, this might be explained by the fact that the majority of fits were of acceptable quality.</p>



<p>I ran many variations of this analysis (and so can you, using the open-source codebase). None showed a dramatic effect of the distribution family.</p>



<h2 class="wp-block-heading" id="other-distributions">Other distributions</h2>



<p>In addition to flexible distributions, I also investigated alternative ‘traditional’ distributions, such as the Weibull or generalised Gamma. I fitted each family both with the MSE of probabilities loss used by previous authors and with the log loss. These alternatives had little impact on the aggregate CDF, which might be considered unsurprising, since even the flexible distributions did not have large effects on aggregate results.</p>
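


<p>As a rough sketch of how such a sweep can be organised (not the report’s actual codebase), the snippet below fits the Gamma, Weibull and generalised Gamma families from SciPy to an illustrative response, under both loss functions:</p>



<pre class="wp-block-code"><code># Sketch of sweeping several 'traditional' families under both loss functions.
# The data points are illustrative; this is not the report's actual codebase.
import numpy as np
from scipy import optimize, stats

years = np.array([10.0, 25.0, 50.0])
probs = np.array([0.1, 0.5, 0.9])

def fit(dist, n_shape, loss):
    def objective(log_params):
        *shape, scale = np.exp(log_params)
        cdf = np.clip(dist.cdf(years, *shape, scale=scale), 1e-12, 1 - 1e-12)
        if loss == "mse":
            return np.sum((probs - cdf) ** 2)
        return -np.sum(probs * np.log(cdf) + (1 - probs) * np.log(1 - cdf))
    x0 = np.concatenate([np.zeros(n_shape), [np.log(25.0)]])  # shapes near 1, scale near the middle year
    res = optimize.minimize(objective, x0=x0, method="Nelder-Mead", options={"maxiter": 5000})
    return np.exp(res.x)

families = [("gamma", stats.gamma, 1), ("weibull", stats.weibull_min, 1), ("gen. gamma", stats.gengamma, 2)]
for name, dist, n_shape in families:
    for loss in ("mse", "log"):
        params = fit(dist, n_shape, loss)
        print(name, loss, np.round(dist.cdf(years, *params[:-1], scale=params[-1]), 3))
</code></pre>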



<h1 class="wp-block-heading" id="range-of-responses">Displaying the distribution of responses</h1>



<p>What is the range of opinion<sup data-fn="b7ab8693-8e6a-485f-a2d9-de48fdc21c2a" class="fn"><a href="#b7ab8693-8e6a-485f-a2d9-de48fdc21c2a" id="b7ab8693-8e6a-485f-a2d9-de48fdc21c2a-link">12</a></sup> among experts? Previous analyses gave only an informal sense of this by displaying a few dozen randomly selected CDFs:</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1.png"><img loading="lazy" decoding="async" width="1024" height="653" src="https://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1-1024x653.png" alt="" class="wp-image-3701" style="width:620px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1-1024x653.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1-300x191.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1-768x490.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1-1536x980.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/grace_2018_figure_1.png 1658w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p><em><a href="https://arxiv.org/pdf/1705.08807">When Will AI Exceed Human Performance? Evidence from AI Experts</a></em>, Figure 1.</p>



<p>Their plots also included a 95% bootstrap confidence interval for the mean CDF. This is a measure of statistical variability in the estimate of the mean due to the finite sample size, not a measure of the dispersion of responses. Since ESPAI sample sizes are quite large, and the mean hence quite precisely estimated, I believe this bootstrap confidence interval is of secondary importance.</p>



<p>I dispense with the bootstrap CI and instead use the shaded area around the aggregate CDF to show the distribution of responses, specifically the central half of CDFs, from the 25th to the 75th percentile. This is a more systematic and quantitative alternative to displaying a random subset of individual CDFs.</p>
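


<p>Concretely, assuming each respondent’s fitted CDF has been evaluated on a common grid of years, the band can be computed as a pointwise percentile band, as in this sketch (with placeholder data, not the survey responses):</p>



<pre class="wp-block-code"><code># Sketch: the shaded band is a pointwise percentile band across individual
# fitted CDFs, not a bootstrap CI of the mean. `cdf_values` stands in for an
# (n_respondents, n_years) array of each respondent's fitted CDF evaluated on
# a common grid of years; the random placeholder below is NOT survey data.
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_years = 500, 101
cdf_values = np.sort(rng.uniform(0, 1, size=(n_respondents, n_years)), axis=1)  # placeholder curves

lower, median, upper = np.percentile(cdf_values, [25, 50, 75], axis=0)
# `lower` and `upper` delimit the central half of responses (25th to 75th percentile),
# which is what the shaded band shows; `median` is a pointwise median CDF.
</code></pre>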



<p>It is clear that the authors of previous ESPAI analyses are well aware of what the bootstrap CI measures and interpret it correctly. However, it’s possible that some casual readers did not fully appreciate this. For the avoidance of doubt, the 95% bootstrap CI is radically different from (and radically narrower than) the interval containing 95% of individual CDFs. The latter would cover almost the entire plot:</p>



<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1024x768.png" alt="" class="wp-image-3702" style="width:587px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>



<p>The degree of disagreement among respondents is such that instead of 95%, I show the central 50% in my plots. This is the widest interval that I found sufficiently visually informative. More typical intervals like the central 80% or 70% would cover such a wide range of predictions as to be less informative.</p>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL): central 95% (2.5th to 97.5th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-1024x768.png" alt="" class="wp-image-3704" style="width:559px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL): central 80% (10th to 90th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-1024x768.png" alt="" class="wp-image-3705" style="width:533px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL): central 70% (15th to 85th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-1024x768.png" alt="" class="wp-image-3706" style="width:525px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL): central 50% (25th to 75th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-1024x768.png" alt="" class="wp-image-3707" style="width:543px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_FAOL.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI): central 95% (2.5th to 97.5th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-1024x768.png" alt="" class="wp-image-3708" style="width:539px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_95_HLMI-1.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI): central 80% (10th to 90th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-1024x768.png" alt="" class="wp-image-3709" style="width:559px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_80_HLMI.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI): central 70% (15th to 85th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-1024x768.png" alt="" class="wp-image-3710" style="width:576px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_70_HLMI.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI): central 50% (25th to 75th percentile)</summary>
<figure class="wp-block-image size-large is-resized"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-1024x768.png" alt="" class="wp-image-3711" style="width:530px;height:auto" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/range_of_responses_50_HLMI.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<h1 class="wp-block-heading" id="aggregating-hlmi-faol">Aggregating across the task and occupation framings</h1>



<p>Before being asked for their forecasts, respondents were shown the following definitions for HLMI (High-Level Machine Intelligence) and FAOL (Full Automation of Labor):</p>



<p>HLMI (tasks):</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>High-level machine intelligence (HLMI) is achieved when unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g., being accepted as a jury member. Think feasibility, not adoption.</p>
</blockquote>



<p>FAOL (occupations):</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Say an occupation becomes fully automatable when unaided machines can accomplish it better and more cheaply than human workers. Ignore aspects of occupations for which being a human is intrinsically advantageous, e.g., being accepted as a jury member. Think feasibility, not adoption. Say we have reached ‘full automation of labor’ when all occupations are fully automatable. That is, when for any occupation, machines could be built to carry out the task better and more cheaply than human workers.</p>
</blockquote>



<p>The two questions are very similar. The main difference is that HLMI is phrased in terms of tasks, while FAOL asks about occupations. In principle, we should expect the same prediction on both questions. As noted by the authors,</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>since occupations might naturally be understood either as complex tasks, composed of tasks, or closely connected with one of these, achieving HLMI seems to either imply having already achieved FAOL, or suggest being close.</p>
</blockquote>



<p>So it is legitimate to think of these as two different framings of the same question.</p>



<p>Despite their similarity, these framings yield very different predictions. The figures below show the result of using my preferred settings (median aggregation, flexible distributions), except that HLMI and FAOL are shown separately instead of aggregated:</p>



<h2 class="wp-block-heading has-medium-font-size">HLMI vs FAOL</h2>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Full Automation of Labor (FAOL)</summary>
<figure class="wp-block-image size-large"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-1024x768.png" alt="" class="wp-image-3713" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-2048x1536.png 2048w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_FAOL-80x60.png 80w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>High-Level Machine Intelligence (HLMI)</summary>
<figure class="wp-block-image size-large"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-1024x768.png" alt="" class="wp-image-3714" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-2048x1536.png 2048w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/headline_result_HLMI-80x60.png 80w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary>Comparison of HLMI and FAOL</summary>
<figure class="wp-block-image size-large"><a href="https://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol.png"><img loading="lazy" decoding="async" width="1024" height="768" src="https://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-1024x768.png" alt="" class="wp-image-3715" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-1024x768.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-300x225.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-768x576.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-1536x1152.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-678x509.png 678w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-326x245.png 326w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol-80x60.png 80w, http://aiimpacts.org/wp-content/uploads/2024/12/compare_hlmi_faol.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a></figure>
</details>



<p></p>



<p>Previous analyses never aggregated the task and occupation results, presenting them only separately. Recall that, using their methodology<sup data-fn="79fe81fe-96c1-4ed8-b302-574fa1ba2b76" class="fn"><a href="#79fe81fe-96c1-4ed8-b302-574fa1ba2b76" id="79fe81fe-96c1-4ed8-b302-574fa1ba2b76-link">13</a></sup>, the authors reported a median year of 2047 for all human tasks, versus 2116 for all occupations (a difference of 69 years!).</p>



<p>Presenting results separately allows patient and sophisticated readers to reach a deeper understanding. However, we must be realistic: it is very likely that a single “headline” result will be the most widely spread and remembered. Attempting to prevent this by <em>only</em> presenting HLMI (tasks) and FAOL (occupations) separately is, in my opinion, largely futile. While it may sometimes encourage nuance, more often it will simply make it easier for readers to choose whichever of the two results best fits their preconceptions.</p>



<p>Indeed, my brief investigation suggests that citations of the 2023 survey results are strongly biased towards tasks (HLMI) over occupations (FAOL). Out of the 20 articles<sup data-fn="a52b53bc-1346-4411-b073-2ffb89d84b32" class="fn"><a href="#a52b53bc-1346-4411-b073-2ffb89d84b32" id="a52b53bc-1346-4411-b073-2ffb89d84b32-link">14</a></sup> on the first two pages of Google Scholar citations of the 2024 preprint, 7 reported at least one of HLMI or FAOL. Among these:</p>



<ul class="wp-block-list">
<li>6 out of 7 (86%) reported tasks (HLMI) only</li>



<li>1 out of 7 (14%) reported both tasks and occupations</li>



<li>None (0%) reported occupations (FAOL) only</li>
</ul>



<p>Therefore, I consider it preferable, when providing headline results, to aggregate across HLMI and FAOL, yielding a single estimate of when all tasks or occupations will be automatable.</p>



<p>I achieve this simply by including answers to both questions prior to aggregation, i.e. no special form of aggregation is used to combine tasks (HLMI) and occupations (FAOL). Since more respondents were asked about tasks than about occupations, I give the two framings equal weight by resampling from the occupations (FAOL) responses, as sketched below.</p>
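


<p>A minimal sketch of this equal-weighting step, using placeholder arrays rather than the survey data, might look as follows:</p>



<pre class="wp-block-code"><code># Sketch: give tasks (HLMI) and occupations (FAOL) equal weight by resampling
# the smaller FAOL group up to the size of the HLMI group before pooling.
# The array shapes and resampling with replacement are assumptions of this
# sketch; rows stand in for fitted CDF values on a common grid of years.
import numpy as np

rng = np.random.default_rng(0)
n_years = 101
hlmi_cdfs = np.sort(rng.uniform(0, 1, size=(800, n_years)), axis=1)  # placeholder
faol_cdfs = np.sort(rng.uniform(0, 1, size=(250, n_years)), axis=1)  # placeholder

idx = rng.integers(0, faol_cdfs.shape[0], size=hlmi_cdfs.shape[0])
pooled = np.vstack([hlmi_cdfs, faol_cdfs[idx]])      # both framings now equally represented
aggregate = np.percentile(pooled, 50, axis=0)        # pointwise median across pooled responses
</code></pre>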



<h1 class="wp-block-heading" id="codebase">Codebase</h1>



<p>For this analysis, I wrote a <a href="https://github.com/tadamcz/espai/" data-type="link" data-id="https://github.com/tadamcz/espai/">fully new codebase</a>. This was necessary because the system used for previous analyses relied on a collection of Jupyter notebooks that required manually running cells in a specific, undocumented order to achieve results.</p>



<p>This new codebase, written in Python, makes our analyses reproducible for the first time. The codebase includes a robust test suite.</p>



<p>We are open sourcing the codebase, and invite scrutiny and contributions from other researchers. It provides user-friendly configuration objects that we hope will make it easy for you to run your own variations of the analysis and produce your own plots.</p>



<h2 class="wp-block-heading has-large-font-size">Footnotes</h2>


<ol class="wp-block-footnotes"><li id="23f54f49-9fa2-4345-a108-0f9d2acd5259">Timing will be my sole focus. I ignore ESPAI’s questions about whether the overall impact of AI will be positive or negative, the preferred rate of progress, etc. <a href="#23f54f49-9fa2-4345-a108-0f9d2acd5259-link" aria-label="Jump to footnote reference 1"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="514e6709-2685-453c-b148-94cd12b91b67">This uses plain language as much as possible. Depending on your audience, you may wish to replace “central half” with “interquartile range”, or use phrases like “75th percentile”. Also, you can round 2048 to 2050 and 2103 to 2100 without losing anything of value. <a href="#514e6709-2685-453c-b148-94cd12b91b67-link" aria-label="Jump to footnote reference 2"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="648cbfff-fbda-4131-a809-9dfea725a197">Note that the ‘whiskers’ of our box plot are slightly nonstandard: they show the 15th and 85th percentile responses. Whiskers are more commonly used to represent the 1.5 IQR value: from above the upper quartile (75th percentile), a distance of 1.5 times the interquartile range (IQR) is measured out and a whisker is drawn <em>up to</em> the largest observed data point from the dataset that falls within this distance. <a href="#648cbfff-fbda-4131-a809-9dfea725a197-link" aria-label="Jump to footnote reference 3"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="a8f1ee3f-3a3b-498f-ac7c-f25b9ea3dddd">As a further subtlety, “the question sets do differ beyond definitions: only the HLMI questions are preceded by the instruction to “assume that human scientific activity continues without major negative disruption,” and the FAOL block asks a sequence of questions about the automation of specific occupations before asking about full automation of labor” (<a href="https://arxiv.org/abs/2401.02843">Thousands of AI Authors on the Future of AI</a>) <a href="#a8f1ee3f-3a3b-498f-ac7c-f25b9ea3dddd-link" aria-label="Jump to footnote reference 4"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="40b92e73-777b-4d16-ac7f-66174261ef2a">In reality there are even more complications that I elide in the main text. If a set of probabilities contains both values of exactly 1, and values of exactly 0, the geometric mean of odds is undefined. If a one is present and there are no zeroes, the aggregate is one; and a zero is present and there are no ones, the aggregate is zero. However, floating point numbers by design have much more precision near zero than near one. For example, we can represent extremely small numbers like <code>1e-18</code>, but <code>1 - 1e-18</code> just gets represented as <code>1.0</code>. This means that very high probabilities get represented as 1 when equally extreme low probabilities do not get represented as zero. As a result, high probabilities get an “unfair advantage”. It should be possible to circumvent some of these problems by using alternative representations of the probabilities. However, many respondents directly give probabilities of 0% or 100% (as opposed to their fitted CDFs merely reaching these values). 
This poses a more fundamental problem for the geometric mean of odds. <a href="#40b92e73-777b-4d16-ac7f-66174261ef2a-link" aria-label="Jump to footnote reference 5"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="36fdf480-d735-4785-9483-7a68aff2be0b">I believe this is probably a theorem (with the possible exception of some degenerate cases), but I am not entirely sure since I have not attempted to actually write down or locate a proof. If you’ve got a proof or counter-example please contact me. <a href="#36fdf480-d735-4785-9483-7a68aff2be0b-link" aria-label="Jump to footnote reference 6"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="e052f82e-e51d-4c49-be81-e33a7efda5dc">I give examples only for the fixed-probabilities framing in the main text because it’s easier to explain in the context of the loss functions we are using, which all use probabilities. However, we can see similar phenomena when looking at the fixed-years data. These are 9 plots representative of the bottom half of fixed-years Gamma fits.<br><img loading="lazy" decoding="async" width="500" height="518" class="wp-image-3691" style="width: 500px;" src="http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_years_annotations_True.png" alt="" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_years_annotations_True.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_years_annotations_True-290x300.png 290w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_years_annotations_True-988x1024.png 988w, http://aiimpacts.org/wp-content/uploads/2024/12/9_prev_gamma_fits_Fixed_years_annotations_True-768x796.png 768w" sizes="auto, (max-width: 500px) 100vw, 500px" /><br>Since I am in this section aiming for expository clarity rather than the greatest rigour, I also elided the following complication in the main text. All distributions shown are Gammas fitted by previous authors, using the MSE of probabilities as the loss function. However, to produce the ranking of fits used to select which examples to plot, I used a different loss function. This was the MSE of years (horizontal direction) for the fixed-years plot, and the log loss for the fixed-probabilities plot. These loss functions make my examples more intuitive, while still being somewhat systematic (instead of cherry-picking examples). <a href="#e052f82e-e51d-4c49-be81-e33a7efda5dc-link" aria-label="Jump to footnote reference 7"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="03cf3499-e10c-403e-b780-3732fae8d50d">The log loss can be motivated by analogy to the Kullback-Leibler (KL) divergence between discrete distributions. For each point in a respondent’s CDF data, we can think of it as a binary probability distribution (p, 1-p). The fitted CDF gives us another binary distribution (q, 1-q) at that point. The KL divergence between these distributions would be<br><br>D<sub>KL</sub>(p||q) = p log(p/q) + (1-p) log((1-p)/(1-q)) = p log(p) &#8211; p log(q) + (1-p) log(1-p) &#8211; (1-p) log(1-q)<br><br>The log loss -[p log(q) + (1 &#8211; p) log(1 &#8211; q)] differs from this only by dropping the terms that don’t depend on q, and thus has the same minimum. 
However, this is merely an intuitive motivation: we are not actually comparing two discrete distributions, but rather measuring how well our continuous CDF matches specific points. <a href="#03cf3499-e10c-403e-b780-3732fae8d50d-link" aria-label="Jump to footnote reference 8"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="a39f4f4a-6a4d-4c66-abc3-9900e1d00e9d">Note that although the generalised gamma distribution has three parameters, as far as I can tell it does not have the flexibility to fit three arbitrary points of CDF data. I came to this conclusion by extensive empirical investigation, but I haven’t been able to locate or write a proof to conclusively establish this one way or another. Please write to me if you know the answer. By the way, I don’t know of any parametric 3-parameter distribution that has this property. I used flexible distributions for ESPAI because they are the only solution I am aware of. <a href="#a39f4f4a-6a4d-4c66-abc3-9900e1d00e9d-link" aria-label="Jump to footnote reference 9"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="0e1c4d58-978b-464d-80f9-f164c7337514">The code uses the paid MakeDistribution API, but a copy of all API responses needed to perform the analysis is stored in the repository. <a href="#0e1c4d58-978b-464d-80f9-f164c7337514-link" aria-label="Jump to footnote reference 10"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="b147a578-bb00-4eb5-b834-6b7e806152dd">I informally explored possible biases in the Gamma fits using the following histograms of residuals. While several of the residual distributions seem clearly biased, they also in most cases have 80% of the probability mass quite close to a residual of zero. I still do not fully understand why the effect of this data on aggregate CDFs is so muted, but I have not prioritised a more rigorous analysis.<br><img loading="lazy" decoding="async" width="800" height="533" class="wp-image-3699" style="width: 800px;" src="http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL.png" alt="" srcset="http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL.png 4500w, http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL-300x200.png 300w, http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL-1024x683.png 1024w, http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL-768x512.png 768w, http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL-1536x1024.png 1536w, http://aiimpacts.org/wp-content/uploads/2024/12/prev_fits_bias_hist_FAOL-2048x1365.png 2048w" sizes="auto, (max-width: 800px) 100vw, 800px" /> <a href="#b147a578-bb00-4eb5-b834-6b7e806152dd-link" aria-label="Jump to footnote reference 11"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="b7ab8693-8e6a-485f-a2d9-de48fdc21c2a">Due to the large framing effects of both tasks vs occupations, and fixed-years vs fixed-probabilities, which have been consistently observed, one may reasonably quarrel with describing this plot as showing “disagreement among respondents” or “the range of opinion among experts”. 
Part of why the range is so wide is that responses are highly sensitive to framing. Rather than saying experts <em>disagree</em> per se, purists might wish to say that expert opinion is undefined or unstable.<br><br>This is a rather philosophical point. The more practical version of it is to ask whether we should aggregate across these framings, or just present them separately.<br><br>My position (further discussed <a href="#Aggregating_across_the_task_and_occupation_framings">here</a>) is that while disaggregated results should also be available, aggregation is necessary to produce useful results. Aggregating things that have some commonalities and some differences is indeed inherent to science. While previous authors presented HLMI and FAOL separately, they did not present fixed-years and fixed-probabilities separately, which would be required if we take the anti-aggregation argument to its full conclusion. <a href="#b7ab8693-8e6a-485f-a2d9-de48fdc21c2a-link" aria-label="Jump to footnote reference 12"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="79fe81fe-96c1-4ed8-b302-574fa1ba2b76">Their methodology is different from what I used in the plots above, but yields very similar results for the median year. <a href="#79fe81fe-96c1-4ed8-b302-574fa1ba2b76-link" aria-label="Jump to footnote reference 13"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li><li id="a52b53bc-1346-4411-b073-2ffb89d84b32">Here is the full table: <a href="#a52b53bc-1346-4411-b073-2ffb89d84b32-link" aria-label="Jump to footnote reference 14"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></li></ol>


<p>
<table>
        <thead>
          <tr>
            <th>Title</th>
            <th>Year</th>
            <th>Link</th>
            <th>ESPAI results cited</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>Artificial intelligence: Arguments for catastrophic risk</td>
            <td>2024</td>
            <td><a href="https://compass.onlinelibrary.wiley.com/doi/abs/10.1111/phc3.12964">Link</a></td>
            <td>HLMI only</td>
          </tr>
          <tr>
            <td>Safety cases for frontier AI</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2410.21572">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Me, myself and AI: How gender, personality and emotions determine willingness to use Strong AI for self-improvement</td>
            <td>2024</td>
            <td><a href="https://www.sciencedirect.com/science/article/pii/S0040162524005584">Link</a></td>
            <td>HLMI only</td>
          </tr>
          <tr>
            <td>Theory Is All You Need: AI, Human Cognition, and Causal Reasoning</td>
            <td>2024</td>
            <td><a href="https://www.bu.edu/dbi/files/2024/08/FelinHolwegAug2024_SSRN.pdf">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Shared Awareness Across Domain‐Specific Artificial Intelligence</td>
            <td>2024</td>
            <td><a href="https://onlinelibrary.wiley.com/doi/abs/10.1002/aisy.202300740">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Existential risk from transformative AI: an economic perspective</td>
            <td>2024</td>
            <td><a href="https://journals.vilniustech.lt/index.php/TEDE/article/view/21525">Link</a></td>
            <td>HLMI only</td>
          </tr>
          <tr>
            <td>Theory is all you need: AI, human cognition, and decision making</td>
            <td>2024</td>
            <td><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4737265">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Generative artificial intelligence usage by researchers at work</td>
            <td>2024</td>
            <td><a href="https://www.sciencedirect.com/science/article/pii/S0736585324000911">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>AI Horizon Scanning, White Paper p3395, IEEE-SA. Part I: Areas of Attention</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2410.01808">Link</a></td>
            <td>Two tasks (build a payment processing site, fine-tune an LLM)</td>
          </tr>
          <tr>
            <td>Generative AI, Ingenuity, and Law</td>
            <td>2024</td>
            <td><a href="https://ieeexplore.ieee.org/abstract/document/10598190/">Link</a></td>
            <td>HLMI only</td>
          </tr>
          <tr>
            <td>AI Emergency Preparedness: Examining the federal government’s ability to detect and respond to AI-related national security threats</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2407.17347">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Transformative AI, existential risk, and real interest rates</td>
            <td>2024</td>
            <td><a href="https://basilhalperin.com/papers/agi_emh.pdf">Link</a></td>
            <td>HLMI only</td>
          </tr>
          <tr>
            <td>Misrepresented Technological Solutions in Imagined Futures: The Origins and Dangers of AI Hype in the Research Community</td>
            <td>2024</td>
            <td><a href="https://ojs.aaai.org/index.php/AIES/article/view/31737">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Eliciting the Priors of Large Language Models using Iterated In-Context Learning</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2406.01860">Link</a></td>
            <td>HLMI only</td>
          </tr>
          <tr>
            <td>Strategic Insights from Simulation Gaming of AI Race Dynamics</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2410.03092">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Evolutionary debunking and value alignment</td>
            <td>2024</td>
            <td><a href="https://globalprioritiesinstitute.org/wp-content/uploads/Michael-T.-Dale-and-Bradford-Saad-Evolutionary-debunking-and-value-alignment.pdf">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Robust Technology Regulation</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2408.17398">Link</a></td>
            <td>Extinction risk only</td>
          </tr>
          <tr>
            <td>Interpreting Affine Recurrence Learning in GPT-style Transformers</td>
            <td>2024</td>
            <td><a href="https://arxiv.org/abs/2410.17438">Link</a></td>
            <td>No numbers</td>
          </tr>
          <tr>
            <td>Malicious use of AI and challenges to psychological security: Future risks</td>
            <td>2024</td>
            <td><a href="https://russiancouncil.ru/en/analytics-and-comments/analytics/malicious-use-of-ai-and-challenges-to-psychological-security-future-risks/">Link</a></td>
            <td>HLMI and FAOL</td>
          </tr>
          <tr>
            <td>Grow Your Artificial Intelligence Competence</td>
            <td>2024</td>
            <td><a href="https://ieeexplore.ieee.org/abstract/document/10685842/">Link</a></td>
            <td>No numbers</td>
          </tr>
        </tbody>
      </table>
</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The purpose of philosophical AI will be: To orient ourselves in thinking</title>
		<link>http://aiimpacts.org/the-purpose-of-philosophical-ai-will-be-to-orient-ourselves-in-thinking/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Mon, 28 Oct 2024 16:46:53 +0000</pubDate>
				<category><![CDATA[Essay Competition on the Automation of Wisdom and Philosophy]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3639</guid>

					<description><![CDATA[Max Noichl 1 This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy. Summary In this essay I will suggest a lower bound for the impact that artificial intelligence <a class="mh-excerpt-more" href="http://aiimpacts.org/the-purpose-of-philosophical-ai-will-be-to-orient-ourselves-in-thinking/" title="The purpose of philosophical AI will be: To orient ourselves in thinking"></a>]]></description>
										<content:encoded><![CDATA[
<p></p>



<p><em>Max Noichl</em> <a href="#fn1" class="footnote-ref" id="fnref1"
role="doc-noteref"><sup>1</sup></a></p>
<p><em>This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy.</em></p>

<p><h3>Summary</h3> <em>In this essay I will suggest a lower
bound for the impact that artificial intelligence systems can have on
the automation of philosophy. Specifically I will argue that skepticism
is warranted about whether LLM-based systems similar to the best ones
available right now will be able to independently produce philosophy at
a level of quality and creativity that is interesting to us. But they
are clearly already able to solve medium-complexity language tasks in a
way that makes them useful to structure and consolidate the contemporary
philosophical landscape, allowing for novel and interesting ways to
orient ourselves in thinking.</em></p>

<p><h3>Introduction</h3></p>
<p>The purpose of philosophical AI will be: To orient ourselves in
thinking. This position is opposed to the view that the LLM-based
artificial intelligence systems which are at this point foreseeable
will autonomously produce philosophy that is of a high enough quality
and novelty to be interesting to us. In this essay I will briefly try to
make this position plausible. I will then sketch the alternative
direction in which I suspect the most impactful practical interaction of
philosophy and AI will go and present a pilot study of what this may
look like. Finally, I will argue that this direction can integrate well
into contemporary philosophical practice and solve some previously
unresolved desiderata.</p>

<p><h3>Autonomous production of philosophy</h3></p>
<p>The first idea that we might have when thinking about how artificial
intelligence might serve to automate philosophy is that the AI system is
going to philosophize for—<em>instead of</em>—us. And indeed the
currently best publicly available systems seem to show some basic
promise. They<a href="#fn2" class="footnote-ref" id="fnref2"
role="doc-noteref"><sup>2</sup></a> are able to recapitulate classic
philosophical arguments and thought experiments with reasonable,
although somewhat spotty, quality, and when vaguely prompted to opine on
topics of philosophical impact they are also able to identify classical
lines of argument.</p>
<p>But these abilities are very much in line with an understanding of
LLMs that sees them largely as sophisticated mechanisms for the
reproduction and adaptation of already present textual material, which
would of course be in stark contrast to the capabilities that are
arguably necessary for the production of truly novel and logically
coherent philosophy, namely strong abstract reasoning capabilities.</p>

<p><h3>Some grounds for scepticism</h3></p>
<p>To me it seems like the capability profile of the language models we
have seen so far is distinctly <em>weird</em>. They play chess, to some
degree convincingly, although not well, but they are abysmal at
tic-tac-toe. They can explain simulated annealing perfectly well, but
can’t tell me reliably which countries in Europe start with a ‘Q’.
Generally speaking, it seems to be hard to predict or intuit whether one
of the current systems we have available will be good at a task without
just trying it out. And of course, much harder still to predict what
they will be good at in the future.</p>
<p>But I do believe that we have reasonable grounds for at least a
certain amount of skepticism about whether really strong reasoning
capabilities are around the corner. First, when trying to get LLMs to
produce philosophical reasoning, it is common that they struggle to
transfer argument schemes to novel contexts and to generalize them to
domains that are not commonly used as examples in the literature. It
also seems hard to keep them arguing a coherent point and to maintain
truth and consistency through prolonged arguments. Finally, when
simulating philosophical debates between multiple LLM agents, I have
found them to be extremely stereotypical, repeating mostly stale
commonplaces, and failing to come up with novel argumentative
patterns—experiences which are in line with at least some lines of
research that question current systems’ abstract reasoning
capabilities.<a href="#fn3" class="footnote-ref" id="fnref3"
role="doc-noteref"><sup>3</sup></a></p>
<p><h3>A lower bound</h3></p>
<p>But publicly predicting that contemporary AI systems are unable to
ever achieve this or that specific task has been a good method to force
oneself into a public correction a few months later, or to keep
stubbornly denying the obvious in increasingly ridiculous fashion.<a
href="#fn4" class="footnote-ref" id="fnref4"
role="doc-noteref"><sup>4</sup></a></p>
<p>Therefore, instead of making any strong claims about the abilities of
current or future AI systems, I suggest that the most
<em>productive</em> way forward to consider the potential of automation
of philosophy is to articulate what is reliably achievable with the
systems that we have available now, and what plays into their current
abilities and sidesteps their faults. As I have mentioned, a general
formulation for the abilities of large language models will likely not
be forthcoming. But a relatively uncontroversial provisional formulation
might instead be something like: These systems excel at
medium-complexity language tasks, which are similar to tasks solved in
everyday language or well represented in public code bases. And of
course, as computer systems, they excel in doing these tasks over and
over again, many thousands of times.</p>
<p>The question we need to answer is thus, how philosophy might profit
from a process of automation that plays into these precise strengths.
Answering this question will provide us with a firm, unspeculative lower
bound of what is possible in the automation of philosophy.</p>
<p><h3>Making it more concrete</h3></p>
<p>I have given some reason to think that the most likely short-term
role for artificial intelligence within philosophy is not going to be
through independently reasoning non-human intelligences that directly
produce philosophy in a way that is on par, or superior to what we are
able to do now. Rather, I argued, philosophy will be altered by the
ability of artificial intelligence to integrate and structure large
quantities of thought, which might drastically increase the cohesion of
our collective philosophical enterprise. But this proposal might seem
somewhat abstract. To give a more concrete idea of what I am thinking
about, I have conducted a little pilot study.</p>
<p>For this pilot study, I have scraped the whole open-access
bibliography of “philosophy of artificial intelligence” available on the
PhilArchive.<a href="#fn5" class="footnote-ref" id="fnref5"
role="doc-noteref"><sup>5</sup></a> I have also searched for all
articles containing ‘artificial intelligence’, ‘machine learning’, or
‘deep learning’ in the last 20 years among ten highly reputable
Anglophone philosophy journals and integrated them into my dataset.</p>
<p>I then filtered out unusable texts—texts that were obviously not
philosophy<a href="#fn6" class="footnote-ref" id="fnref6"
role="doc-noteref"><sup>6</sup></a>, texts that were badly OCRed, and
texts that were not in the English language, leaving me with a sample of
1,025 full-text articles.</p>
<p>In the first pass, I used GLiNER-large<a href="#fn7"
class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>, a
flexible LLM-based Named Entity Recognition system, to search for
entities that conformed to the working definition of “a philosophical
theory or philosophical position, a view that attempts to explain or
account for a particular problem in philosophy, or a named
argument.”</p>
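<p>For concreteness, here is a minimal sketch of what such a first pass could look like in Python. The package interface and the checkpoint name are illustrative assumptions on my part, not a record of the exact code used for the pilot.</p>
<pre><code># Illustrative sketch only: the gliner package and the checkpoint id are assumptions.
from gliner import GLiNER

# The working definition quoted above, used as the entity label.
LABEL = ("a philosophical theory or philosophical position, a view that "
         "attempts to explain or account for a particular problem in "
         "philosophy, or a named argument")

model = GLiNER.from_pretrained("urchade/gliner_large-v2")  # assumed checkpoint id

def extract_candidate_positions(article_text, threshold=0.5):
    """Return the candidate position strings found in one article."""
    entities = model.predict_entities(article_text, [LABEL], threshold=threshold)
    return sorted({entity["text"] for entity in entities})
</code></pre>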
<p>This first pass extracted from each article a number of relatively
low-quality but passable candidate positions, things like ‘naturalized
moral psychology’, ‘naturalistic framework’, ‘moderate defense’, ‘human
nature is bad’, ‘schools of thought’, ‘Confucian tradition’, etc.</p>
<p>In a second pass, these articles were fed to GPT-4o, which searched
them for philosophical positions and parsed them into a structured data
format, which contained a label for the position, a definition of the
position that had to be drawn from the text, a number between -1 and 1
indicating whether the author was arguing in favor or against the
position, with 0 indicating neutrality, and the exact passage at which
this stance became apparent. The candidate positions extracted by
GLiNER in the first pass were used to flag potentially relevant
positions, and these were then fed to GPT-4o to keep the naming in the
dataset consistent.</p>
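<p>Concretely, each detected position can be pictured as a small record along the following lines. The field names are illustrative and simply mirror the description above, rather than reproducing the exact schema used.</p>
<pre><code># Illustrative record for one extracted position; field names are assumptions.
from dataclasses import dataclass

@dataclass
class PositionMention:
    label: str        # e.g. "functionalism" or "the Chinese room argument"
    definition: str   # a definition drawn from the article text itself
    stance: float     # -1 (argued against) to 1 (argued for), 0 = neutral
    passage: str      # the exact passage at which the stance becomes apparent
</code></pre>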
<p>At the end of this process, for our 1,025 papers, I had gathered a
total of 6,059 distinct positions, which contained named positions and
arguments like ‘functionalism’, ‘computationalism’, ‘the Chinese room
argument’, ‘connectionism’, etc.</p>
<p>There are a number of potentially interesting analyses that are now
possible on this dataset. But for this pilot project, I conducted an
overview mapping, in which I combined two nearest-neighbor graphs, one
that linked articles to articles with similar position profiles
(articles arguing for, and denying the same things), and one based on
semantic similarity, which was determined via embeddings produced
through the all-mpnet-base-v2<a href="#fn8" class="footnote-ref"
id="fnref8" role="doc-noteref"><sup>8</sup></a> language model. These
graphs were combined, resulting in a new graph in which thematically
similar texts are moved close together in the global picture, while
groups of texts that are thematically similar but argue for different
position profiles are locally split apart. This combined graph was then
reweighted and laid out using uniform manifold approximation and
projection in two dimensions.<a href="#fn9" class="footnote-ref"
id="fnref9" role="doc-noteref"><sup>9</sup></a></p>
<p>I then applied HDBSCAN, a clustering algorithm, to this layout and
marked the most relevant positions for each cluster on the
two-dimensional layout. The results can be explored below:</p>
<figure>
<a href="http://aiimpacts.org/wp-content/uploads/2024/10/ai_phil_positions.png"><img decoding="async" src="http://aiimpacts.org/wp-content/uploads/2024/10/ai_phil_positions.png"
alt="Position-clusters in the philosophical literature on AI (Pilot Study)." /></a>
<figcaption aria-hidden="true"><strong>Position-clusters in the philosophical
literature on AI (Pilot Study).</strong></figcaption>
</figure>
<br>
<p>The clusters are marked with dashed lines, the grey points represent
individual papers, and the positions are marked on top of them, with
blue positions being those that are positively held by the authors in
the cluster and red positions being denied.</p>
<p>We note that the map reproduces a sensible structure of the whole
field with questions that relate artificial intelligence to the
philosophy of mind towards the upper left, questions that relate to the
moral status of artificial intelligence towards the center right, and
questions about the societal impact of artificial intelligence and the
connected ethical questions towards the bottom.</p>
<p>We also note that in quite a few instances we find clearly
oppositional local structures, for example with clusters denying or
endorsing the Chinese room argument together with the appropriate
associated stances on computationalism or universal realizability
(middle-top). Similar things are true for functionalism (left-middle),
as well as utilitarianism and virtue ethics (right-lower middle).</p>
<p><h3>Philosophical relevance</h3></p>
<p>I think that this pilot shows that, while quite a bit of additional
work is evidently needed, contemporary LLM systems can reliably parse
large amounts of philosophical text into structured representations that
can be used to map out the argumentative landscapes. And while this is
not automated philosophical reasoning, this is certainly not nothing.<a
href="#fn10" class="footnote-ref" id="fnref10"
role="doc-noteref"><sup>10</sup></a> We commonly think about philosophy
as a large, intractable net of interlinked arguments, where each single
premise, if denied or accepted, has numerous implications for others,
opening and closing paths to various positions—with philosophy arguably
being the collective task of maintaining and refining this
structure.</p>
<p>But this structure is never made explicit, and each philosopher,
working somewhat in isolation, tends to produce philosophy in essays, which add to
this whole structure only in a very local and convoluted fashion. The
largest promise of automated philosophy as we can foresee it at this
point, is thus to make this process explicit, to draw out the collective
structure into the open, make it accessible, and, to borrow a phrase
from Kant, to find a novel way <em>to orient ourselves in
thinking.</em><a href="#fn11" class="footnote-ref" id="fnref11"
role="doc-noteref"><sup>11</sup></a></p>
<p><em>I want to thank Christopher Zosh, Scott Page, Johannes Marx, John
Miller, Melanie Mitchell, Arseny Moskvichev, and Robert Ward as well as
my advisors Dominik Klein and Erik Stei for helpful discussions during
the preparation of this essay. This project is part of my PhD at the
department of theoretical philosophy at Utrecht University, and will be
available soon in article form, alongside the code. Feel free to get in
touch or learn more about my work via <a
href="https://www.maxnoichl.eu/">https://www.maxnoichl.eu/</a></em></p>
<p><h3>Footnotes</h3></p>
<section id="footnotes" class="footnotes footnotes-end-of-document"
role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Disclosure of AI usage: OpenAI’s GPT-4o was used as a
coding assistant. OpenAI’s Whisper model was used to partially dictate
this essay. OpenAI’s GPT-4o API was used for the presented analysis.<a
href="#fnref1" class="footnote-back" role="doc-backlink"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p></li>
<li id="fn2"><p>All the (informal) tests I have made in the process of
writing this essay have been conducted on Anthropic’s Claude Opus model,
OpenAI’s GPT-4o &amp; GPT-4 Turbo, and Meta’s Llama 3 70B.<a
href="#fnref2" class="footnote-back" role="doc-backlink"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p></li>
<li id="fn3"><p>Lewis and Mitchell. <em>Using Counterfactual Tasks to
Evaluate the Generality of Analogical Reasoning in Large Language
Models.</em> 2024. arXiv: <a
href="https://arxiv.org/abs/2402.08955">2402.08955</a>; Moskvichev,
Odouard, and Mitchell. <em>The ConceptARC Benchmark: Evaluating
Understanding and Generalization in the ARC Domain.</em> 2023. arXiv: <a
href="https://arxiv.org/abs/2305.07141">2305.07141</a><a href="#fnref3"
class="footnote-back" role="doc-backlink">&#x21a9;︎</a></p></li>
<li id="fn4"><p>E. g. OpenAI’s o1-preview model was released a few weeks
after the writing of this piece, and apparently achieved a marked
increase in reasoning capabilities. When I gave it the final draft of
this article to look over for typos, it did flag the “European countries
with Q”-example I gave earlier as misleading, as there are, as
o1-preview correctly noted, no such countries. I nonetheless currently
believe that the arguments in this essay still hold.<a href="#fnref4"
class="footnote-back" role="doc-backlink"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p></li>
<li id="fn5"><p>Philosophy of Artificial Intelligence &#8211; Bibliography
edited by Eric Dietrich &#8211; <a
href="https://philarchive.org/browse/philosophy-of-artificial-intelligence">PhilArchive</a>
<em>(accessed: 8.7.2024)</em><a href="#fnref5" class="footnote-back"
role="doc-backlink">&#x21a9;︎</a></p></li>
<li id="fn6"><p>Many material machine-learning and computer-interaction
articles somehow end up on PhilPapers, as well as a large collection of
random things.<a href="#fnref6" class="footnote-back"
role="doc-backlink"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p></li>
<li id="fn7"><p>Zaratiana et al. *GLiNER: Generalist Model for Named
Entity Recognition Using Bidirectional Transformer. 2023. arXiv: <a
href="https://arxiv.org/abs/2311.08526">2311.08526</a><a href="#fnref7"
class="footnote-back" role="doc-backlink">&#x21a9;︎</a></p></li>
<li id="fn8"><p>Using the accessible implementation provided by Reimers
and Gurevych. <em>Sentence-BERT: Sentence Embeddings Using Siamese
BERT-Networks.</em> 2019. arXiv: <a
href="https://arxiv.org/abs/1908.10084">1908.10084</a><a href="#fnref8"
class="footnote-back" role="doc-backlink">&#x21a9;︎</a></p></li>
<li id="fn9"><p>McInnes, Healy, and Melville. <em>UMAP: Uniform Manifold
Approximation and Projection for Dimension Reduction.</em> 2018. arXiv:
<a href="https://arxiv.org/abs/1802.03426">1802.03426</a><a
href="#fnref9" class="footnote-back" role="doc-backlink">&#x21a9;︎</a></p></li>
<li id="fn10"><p>And as a potential high-level interface between AI
generated material and philosophers, it might also be crucial for the
development of computer assisted philosophy, if the reasoning
capabilities of the AI systems were to drastically improve.<a
href="#fnref10" class="footnote-back" role="doc-backlink"><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p></li>
<li id="fn11"><p>Kant. Was Heißt: Sich Im Denken Orientiren? 1786. <a
href="https://korpora.org/Kant/aa08/131.html">Source</a><a
href="#fnref11" class="footnote-back" role="doc-backlink">&#x21a9;︎</a></p></li>
</ol>
</section>

]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Machines and Moral Judgment</title>
		<link>http://aiimpacts.org/machines-and-moral-judgment/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Sun, 27 Oct 2024 17:24:20 +0000</pubDate>
				<category><![CDATA[Essay Competition on the Automation of Wisdom and Philosophy]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3635</guid>

					<description><![CDATA[By Jacob Sparks This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy. §1 Good AGI The explicit goal of most major AI labs is to create artificial general <a class="mh-excerpt-more" href="http://aiimpacts.org/machines-and-moral-judgment/" title="Machines and Moral Judgment"></a>]]></description>
										<content:encoded><![CDATA[
<p>By Jacob Sparks</p>



<p><em>This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy.</em></p>



<h3 class="wp-block-heading">§1 Good AGI</h3>



<p>The explicit goal of most major AI labs is to create artificial general intelligence (AGI): machines that can assist us across a wide range of tasks. Additionally, they all want to build systems that are safe, fair and beneficial to their users – machines that are <em>good</em>. But, building machines that are both generally intelligent and good requires building machines that can “think” about what’s good, that make their own moral judgments. And this raises both philosophical and technical questions that we have barely started to address.</p>



<h3 class="wp-block-heading">§2 What is a Moral Judgment?</h3>



<p>Moral judgments, in the sense I intend, are judgments with <em>moral content</em>. They are about what is right or good in a non-reductive sense. Judgments of this kind are philosophically puzzling. They are where thought becomes practical, where the cognitive and conative aspects of intelligence come together. They raise difficult questions: how are they related to motivation and action? Can they be said to be true or false? If so, is their truth objective, or is it determined ultimately by our attitudes? What is the proper method for resolving disputes about them? What are they even about?</p>



<p>In machine ethics, “moral judgment” often refers to any kind of judgment that is morally significant. In this sense, we can speak of the “moral judgments” current AI systems make when determining risk scores, diagnosing disease, or driving a car. Or we could talk more speculatively about the “moral judgments” machines would need to make to determine a criminal sentence, treat a disease, or buy a car on your behalf. Asking if machines can make these kinds of “moral judgments” is really just asking about their trustworthiness performing these morally significant tasks. But nothing in these debates touches on the question of how machines can make moral judgments in the sense I intend.</p>



<p>When some speak of building “moral machines” or about “putting ethical principles into machines” they are thinking about building systems that act in accordance with some particular ethical theory – Utilitarian, Rossian, Kantian, Contractualist, etc. The major debate here is whether these theories can be expressed in sufficiently precise ways to govern the behavior of an AI. But machines built along these lines would not be making moral judgments, in my sense, even if their behavior was “moral” according to one or more of these theories. If someone slavishly interpreted every moral question as being about utility maximization, prima facie duty satisfaction, maxim universalizability, hypothetical consent, etc., and failed to see that, whatever their merits, none of these theories captures the <em>meaning</em> of “good” or “right,” they would fail to make a moral judgment. One must be able to wonder if, after all, it would be right or good to maximize, satisfy, etc.<span id='easy-footnote-1-3635' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/machines-and-moral-judgment/#easy-footnote-bottom-1-3635' title='Some have also looked at particularist approaches to building “moral machines.” According to particularism, there are no useful general moral principles. So on these views, we would need to find ways for machines to learn what is right or good that didn’t involve the use of such principles. The point I’m making in this paragraph, however, would still remain: one could build these kinds of particularist “moral machines,” without building machines that make judgments with moral content.'><sup>1</sup></a></span>



<p>Even if we grant that one of these traditional moral theories is correct, I’m not making the trivial claim that good AGI requires machines getting it right about moral questions. There is no guarantee that when you make moral judgments you get it right. What’s important about moral reasoning is that it allows you to hold your own motivations at a distance, as presenting possibilities that you can choose to act on or not. Moral judgments are a way for the cognitive aspects of intelligence to shape the conative ones in a process that overcomes this reflective distance. If machines are going to behave well across the range of use cases intended for AGI, they’ll need to make these kinds of fallible and philosophically puzzling moral judgments. And to build such machines, we’ll have to learn much more about what moral judgments are and how they work.</p>



<h3 class="wp-block-heading">§3 Moral Judgments Are Strange</h3>



<p>Moral judgments are philosophically puzzling for two main reasons. The first has to do with their <em>form</em>. Like beliefs, they attempt to represent an independent reality. We’re trying to get it right when we make moral judgments. We want our moral judgments to be <em>true</em> and this gives them a “mind to world” direction of fit. But moral judgments are also like desires. They aim to change reality, and this gives them a “world to mind” direction of fit. We want to do what we judge to be right or good. We often act in accordance with and <em>because of</em> our moral judgments. They are, as some philosophers put it, “intrinsically motivating.” But, according to a widely accepted doctrine called “The Humean Theory of Motivation,” nothing could be both a belief and a desire, since each has a different direction of fit. According to the Humean Theory, beliefs and desires have a necessary and distinct role to play in the explanation of action. But moral judgments seem to muddy that distinction.</p>



<p>The second puzzle has to do with the <em>content</em> of moral judgments. Moral facts – the things we are judging about – seem to be both fully grounded in and yet somehow independent of natural facts. On the one hand, if something is good or right, it is good or right in virtue of other natural properties that it has. Every action that’s right is right <em>because</em> it keeps a promise, makes someone happy, relieves suffering, etc. That’s why constructing moral theories, where we attempt to characterize moral properties in terms of natural properties, is a project that makes sense. On the other hand, being good or right seems to be something above and beyond any natural property. However you explain why some action is right, you always mean something more by “right” than what you cite in your explanation. Even if some right act is right because it keeps a promise, when you call it “right,” you don’t just mean “keeps a promise.”&nbsp; Otherwise you’d just be repeating yourself. Moreover, we all recognize that sometimes it isn’t right to keep a promise. So, how could anything have the content that moral judgments purport to have, something that is both grounded in and independent of non-moral facts?</p>



<p>There are many who attempt to resolve these puzzles and their attempts comprise most of what philosophers call “metaethics.” Some metaethicists think moral judgments really are just desires (with no objective correctness conditions), or really are just beliefs (with no intrinsic motivation), or both (denying the Humean Theory). Some think the contents of moral judgments really are just natural facts (and so not independent of natural facts) or that some of them are not dependent on any natural fact (and so not grounded in natural facts). But even if we accept one or another of these solutions, we shouldn’t lose our appreciation of the initial puzzles. These puzzles show us that moral judgments are theoretically strange, but they also show us how and why moral judgments are practically important.</p>



<p>The capacity to make moral judgments involves a kind of active reflection. When we think about what’s good or right, we are stepping back and taking stock of our inclinations. Whatever we might want or intend to do, we can ask, “yes, but would it be good?” No matter how we describe our action we can ask, “yes, but would it be right?” And, importantly, how we answer those questions matters to us and to what we do. Moral judgments allow us to ask potent questions about any motivation or any description under which we might act. They give us both a kind of freedom from our inclinations and an external standard for our actions to live up to. Without the capacity to think in this way, we’d be like animals.</p>



<p>If machines could make moral judgments, they too would have a kind of freedom. Some might find that problematic. They would prefer generally intelligent machines to only pursue the goals we give them or to be otherwise bound to human needs, desires, and aims. But machines that made moral judgments would also hold themselves to a standard that is independent of any of their (or our) motivations. And that is precisely what a machine needs to do in order to be a good AGI.</p>



<h3 class="wp-block-heading">§4 Good AGI Requires Moral Judgment</h3>



<p>The basic argument that good AGI requires the capacity to make moral judgments involves a generalization of what Stuart Russell calls “The King Midas Problem.” Midas came to regret his wish that everything he touched turn to gold when his food, drink and daughter were turned to gold as well.&nbsp;</p>



<p>In the context of AGI, Russell uses this allegory to illustrate the idea that “the achievement of … any fixed objective can result in arbitrarily bad outcomes.” Tell an intelligent machine to cure cancer, and it might induce tumors in every human to be able to conduct more experiments; tell the machine to get you from A to B as quickly as possible, and it might jostle you catastrophically, etc. Russell’s solution to this problem is to build what he calls “beneficial AI.” These are machines designed to achieve, not some fixed objective, but our objectives. According to Russell, the machine’s only<em> </em>goal should be to satisfy our preferences, it should be uncertain about what those preferences are and should learn about our preferences by observing our behavior.</p>



<p>Russell’s approach is promising. Machines designed along these lines partially avoid the King Midas Problem, since we don’t need to specify any objective for them. But it is only partial avoidance. Humans can have preferences for all manner of terrible things, and optimizing on any objective, even one that remains unspecified and must be learned, can have disastrous results. Even when we take our aggregate or collective preferences, optimizing for their satisfaction can lead to very bad outcomes. At various times in history, the collective preferred to put some people in subservient roles on the basis of their gender or race. Today we collectively prefer to treat animals in horrific ways.</p>



<p>Russell is aware of this issue. He asks, “what should machines learn from humans who enjoy the suffering of others?” His answer is that, since these kinds of evil preferences would involve the frustration of other human preferences, there will naturally be some discount rate on their satisfaction. The only real question Russell sees here is about the balance between loyal AI that focuses exclusively on the preferences of some person or set of persons, and utilitarian AI that tries to maximize everyone’s utility.</p>



<p>This response (as well as Russell’s choice to call his approach “Beneficial AI”) indicates a failure to appreciate the difference between the non-moral question, “Does it satisfy a preference?” and the moral question, “Is it good?” This distinction is essential. Evil preferences should count for nothing, even if everyone shares them. All objectives, even ones machines learn from humans, should be subject to the kind of reflective scrutiny inherent to moral thought.</p>



<p>When machines operate in narrow contexts, the meaning of a term like “good” can be given a sufficiently reductive analysis. Playing chess, assuming we’re trying to win, a good chess move just is a move that makes winning more likely. But AGI does not operate in a narrow context. A good move for a generally intelligent machine cannot be specified – that is Russell’s insight. But neither can a good move for a generally intelligent machine simply be read off human preferences. When we’re talking about the wide context of AGI, the only move that is always good is a good move. If an AGI can’t work with some non-reductive sense of “good,” it won’t be a good AGI.</p>



<h3 class="wp-block-heading">§5 But How?</h3>



<p>Unfortunately, it isn’t at all clear what we’re doing when we think something is good and it isn’t clear how to build machines that can do the same. I’ve said moral judgment involves a kind of active reflection. But what can bring reflection to an end? And how can any reflection affect<em> </em>what it reflects? Importantly, in answering these questions and characterizing moral judgment, we can’t be content with the kinds of answers philosophers usually give. To hear that a moral judgment is a certain type of belief or a certain type of desire does not help us design artificial agents that can make such judgments. We need to speak the language of the people building AGI. However, since metaethicists tend to disagree about the details and since expressing philosophical theories of moral judgment in the precise terms required by computer science is exceptionally difficult, what I say here will be highly speculative.</p>



<p>One potentially promising paradigm comes from reinforcement learning (RL). Reinforcement agents learn to maximize a reward by interacting with their environment. They have an ability to sense the state of their environment and to take actions to affect that environment. Their goals are represented by a reward function that returns some value for each possible &lt;state, action&gt; pair. The central assumption of reinforcement learning – sometimes called the reward hypothesis – is that any goal can be represented as an attempt to maximize some suitably chosen reward function. Doing what’s right might be thought of as the ultimate goal of any agent capable of moral judgment. So, if the reward hypothesis is correct, there should be some RL agent who succeeds in making moral judgments.</p>



<p>There are many different variations on the basic learning problem faced by reinforcement agents. The environment may be deterministic or stochastic. The agent may or may not have a model of the environment that predicts what transitions will take place given various actions. The agent may balance present and future reward in different ways. The policy that agents use to select an action may be deterministic, selecting a specific action for each state of the environment, or stochastic, selecting a probability distribution over actions for each state. The reward agents receive may come with greater or lesser frequency. The agent may or may not have a value function that predicts future reward, given a specific policy. Which kind of RL agent, operating in which kind of environment, would succeed in making moral judgments? Where in these formalisms can we locate the moral judgment?&nbsp;</p>
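<p>To keep these moving parts in view, the basic objects can be written down as follows. This is purely illustrative notation for the setup just described, not a proposal about how a morally competent agent should be built.</p>
<pre><code># Illustrative notation for the pieces of the standard RL setup described above.
from typing import Callable, Dict, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")

# Reward hypothesis: any goal is an attempt to maximize some reward function
# defined over (state, action) pairs.
RewardFn = Callable[[State, Action], float]

# A deterministic policy selects one action per state...
DeterministicPolicy = Callable[[State], Action]
# ...while a stochastic policy returns a probability distribution over actions.
StochasticPolicy = Callable[[State], Dict[Action, float]]

# A value function predicts future reward from a state, given a fixed policy.
ValueFn = Callable[[State], float]
</code></pre>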



<p>A reinforcement agent’s reward function is something that is both “objective” in the sense that it isn’t determined by the agent and “intrinsically motivating” in that it determines the policy the agent learns and the actions they ultimately take. However, an agent who has a specified reward function seems to lack the kind of agency required to make moral judgments, since they lack reflective distance from the goal of maximizing their specified reward.</p>



<p>This is similar to the problem of trying to build “moral machines” by using supervised learning to predict the moral judgments of humans. Systems designed along these lines would not be holding their own motivations at arm’s length in the way moral judgment requires. Moreover, this approach risks calcifying moral thought, since machines would be aping the moral judgments of imperfect humans at a particular time and place. True moral reasoning is more dynamic and adaptive.</p>



<p>More promising would be RL agents who were uncertain about their reward function and had to learn about it through their actions. This is what Russell proposes. But the nature of this uncertainty is critical. On Russell&#8217;s view, machines should be <em>initially</em> uncertain about their reward function and should learn about it by observing human behavior. He admits that, with enough observation, an RL agent may become completely confident about the human reward it aims to maximize. However, these kinds of agents would lack the kind of reflective distance characteristic of moral judgment. Even if a machine is certain that some course of action would maximize human reward, it should still be able to ask if it is right to pursue it.</p>
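<p>One way to picture the worry: imagine the machine maintaining a probability over a handful of candidate reward functions and updating it on observed human choices. The sketch below uses a standard Boltzmann-rational likelihood purely as an illustration; it is not Russell's own formulation. With enough observations the posterior typically concentrates on a single candidate, which is exactly the kind of convergence at issue here.</p>
<pre><code># Illustration: a posterior over candidate reward functions concentrating
# as human choices are observed (Boltzmann-rational likelihood assumed).
import math

def update_posterior(prior, candidates, observations, beta=5.0):
    """prior: {name: probability}; candidates: {name: reward_fn(state, action)};
    observations: list of (state, chosen_action, available_actions)."""
    posterior = dict(prior)
    for state, chosen, available in observations:
        for name, reward in candidates.items():
            weights = [math.exp(beta * reward(state, a)) for a in available]
            likelihood = math.exp(beta * reward(state, chosen)) / sum(weights)
            posterior[name] *= likelihood
        total = sum(posterior.values())
        posterior = {name: p / total for name, p in posterior.items()}
    return posterior
</code></pre>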



<p>We could imagine agents who are always uncertain about the reward they are trying to maximize. But what kind of uncertainty is needed? Is it the kind of uncertainty we can express as a probability distribution over different reward functions or is it a deeper kind of uncertainty that resists such characterization? What mechanism can assure that some degree of uncertainty persists? How should machines choose a policy given the persistent kind of uncertainty that moral concepts seem to engender?</p>



<p>Even if we had satisfactory answers to these questions, other complications remain. Unlike the contents of our moral judgments, an RL agent’s reward is not something that is both grounded in, and also independent of, the environment. Likewise, while an RL agent’s prediction of future reward – its value function – shares some features with moral judgment in being both belief-like and desire-like, it doesn’t seem to achieve the reflective distance indicative of moral judgment. An agent’s value function is not a way for them to hold a mirror up to their own motivations and decide which to endorse and which to reject.</p>



<p>Finally, in applications of RL, actions are usually individuated in simple ways – a move in a chess game, selecting the next word, or the next piece of content, etc. But when humans act, our actions are individuated by the knowledge, motives and intentions we bring to them. One and the same move in the chess game might be a blunder, a way to keep the game interesting, a kindness shown to a child, or an attempt to hustle an opponent. If we want to build machines that make moral judgments, we will need to think about their actions in more sophisticated ways.</p>



<h3 class="wp-block-heading">§6 The Path Forward</h3>



<p>Despite the concerns I’ve raised, I see no reason to think building machines that make moral judgments is impossible. We may be able to find computationally useful notions of agent, action, reward, value and uncertainty that will allow us to build machines that have reflective distance from their own motivations and that hold themselves to an external standard that resists specification. If we are going to progress along the path to good AGI, we need to confront the philosophical puzzles raised by moral judgment in the unfamiliar context of machine learning. This project is just beginning.</p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Towards the Operationalization of Philosophy &#038; Wisdom</title>
		<link>http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Sun, 27 Oct 2024 17:21:02 +0000</pubDate>
				<category><![CDATA[Essay Competition on the Automation of Wisdom and Philosophy]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3634</guid>

					<description><![CDATA[By Thane Ruthenis This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy. Summary Philosophy and wisdom, and the processes underlying them, currently lack a proper operationalization: a set <a class="mh-excerpt-more" href="http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/" title="Towards the Operationalization of Philosophy &#38; Wisdom"></a>]]></description>
										<content:encoded><![CDATA[
<p>By Thane Ruthenis</p>



<p><em>This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy.</em></p>



<h2 class="wp-block-heading">Summary</h2>



<p>Philosophy and wisdom, and the processes underlying them, currently lack a proper <em>operationalization</em>: a set of robust formal or semi-formal definitions. If such definitions were found, they could be used as the foundation for a strong methodological framework. Such a framework would provide clear guidelines for how to engage in high-quality philosophical/wise reasoning and how to evaluate whether a given attempt at philosophy or wisdom was a success or a failure.</p>



<p>To address that, I provide candidate definitions for philosophy and wisdom, relate them to intuitive examples of philosophical and wise reasoning, and offer a tentative formalization of both concepts. The motivation for this is my belief that the lack of proper operationalization is the main obstacle to both (1) scaling up the work done in these domains (i. e., creating a bigger ecosystem that would naturally attract funding), and (2) automating them.</p>



<p>The discussion of philosophy focuses on the tentative formalization of a specific <em>algorithm</em> that I believe is central to philosophical thinking: the algorithm that allows humans to derive novel ontologies (conceptual schemes). Defined in a more fine-grained manner, the function of that algorithm is “deriving a set of assumptions using which a domain of reality could be decomposed into subdomains that could be studied separately”.</p>



<p>I point out the similarity of this definition to <a href="https://www.lesswrong.com/posts/gvzW46Z3BsaZsLc25/natural-abstractions-key-claims-theorems-and-critiques-1">John Wentworth’s operationalization of natural abstractions</a>, from which I build the formal model.</p>



<p>From this foundation, I discuss the discipline of philosophy more broadly. I point out instances where humans seem to employ the “algorithm of philosophical reasoning”, but which don’t fall under the standard definition of “philosophy”. In particular, I discuss the category of research tasks variously called “qualitative” or “non-paradigmatic” research, arguing that the core cognitive processes underlying them are implemented using “philosophical reasoning” as well.</p>



<p>Counterweighting that, I define philosophy-as-a-discipline as a special case of such research. While “qualitative research” within a specific field of study focuses on decomposing the domain of reality <em>within</em> that field’s remit, “philosophy” focuses on decomposing reality-as-a-whole (which, in turn, produces the previously mentioned “specific fields of study”).</p>



<p>Separately, I operationalize wisdom as meta-level cognitive heuristics that take object-level heuristics for planning/inference <em>as inputs</em>, and output predictions about the real-world consequences of an agent which makes use of said object-level heuristics. I provide a framework of agency in which that is well-specified as “inversions of inversions of environmental causality”.</p>



<p>I close things off with a discussion of whether “human-level” and “superhuman” AIs would be wise/philosophical (arguing yes), and what options my frameworks offer regarding scaling up or automating both types of reasoning.</p>



<h2 class="wp-block-heading">1. Philosophical Reasoning</h2>



<p>One way to define philosophy is “the study of confusing questions”. Typical philosophical reasoning happens when you notice that you have some intuitions or nagging questions about a domain of reality which hasn’t already been transformed into a formal field of study, and you follow them, attempting to gain clarity. If successful, this often results in the creation of a new field of study focused solely on that domain, and the relevant inquiries stop being part of philosophy.</p>



<p>Notable examples include:</p>



<ul class="wp-block-list">
<li>Physics, which started as “natural philosophy”.</li>



<li>Chemistry, which was closely related to a much more philosophical “alchemy”.</li>



<li>Economics, rooted in moral philosophy.</li>



<li>Psychology, from philosophy of mind.</li>
</ul>



<p>Another field that serves as a good example is <a href="https://intelligence.org/files/TechnicalAgenda.pdf">agent foundations</a><span id='easy-footnote-1-3634' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/#easy-footnote-bottom-1-3634' title='A niche field closely tied to AI research, which attempts to formalize the notion of generally intelligent agents capable of pursuing coherent goals across different contexts, domains, and time scales.'><sup>1</sup></a></span>, for those readers familiar with it.</p>



<p>One notable feature of this process is that the new fields, once operationalized, become decoupled from the rest of reality by certain assumptions. A focus on laws that apply to all matter (physics); or on physical interactions of specific high-level structures that are only possible under non-extreme temperatures and otherwise-constrained environmental conditions (chemistry); or on the behavior of human minds; and so on.</p>



<p>This isolation allows each of these disciplines to be studied <em>separately</em>. A physicist doesn’t need training in psychology or economics, and vice versa. By the same token, a physicist mostly doesn’t need to engage in interdisciplinary philosophical ponderings: the philosophical work that created the field has already laid down the conceptual boundaries beyond which physicists mostly don’t <em>need</em> to go.</p>



<p>The core feature underlying this overarching process of philosophy is the aforementioned “philosophical reasoning”: the cognitive algorithms that implement our ability to <em>generate</em> valid decompositions of systems or datasets. Formalizing these algorithms should serve as the starting point for operationalizing philosophy in a more general sense.</p>



<h3 class="wp-block-heading">1A. What Is an Ontology?</h3>



<p>In the context of this text, an “ontology” is a decomposition of some domain of study into a set of higher-level concepts, which characterize the domain in a way that is compact and comprehensive, and that can be used to produce models with high predictive accuracy.</p>



<p>In more detail:</p>



<ul class="wp-block-list">
<li><strong>“Compactness”:</strong> The ontology has fewer “moving parts” (concepts, variables) than a full description of the corresponding domain. Using models based on the ontology for making predictions requires a dramatically lower amount of computational or cognitive resources, compared to a “fully detailed” model.</li>



<li><strong>“Accuracy”</strong>: An ontology-based model produces predictions about the domain that are fairly accurate at a high level, or have a good upper bound on error.</li>



<li><strong>“Comprehensiveness”</strong>: The ontology is valid for all or almost all systems that we would classify as belonging to the domain in question, and characterizes them according to a known, finite family of concepts.</li>
</ul>



<p>Chemistry talks about atoms, molecules, and reactions between them; economics talks about agents, utility functions, resources, and trades; psychology recognizes minds, beliefs, memories, and emotions. An ontology answers the question of <em>what</em> you study when you study some domain, characterizes the joints along which the domain can be carved and which questions about it are meaningful to focus on. (In this sense, it’s similar to the philosophical notion of a “conceptual scheme”, although I don’t think it’s an exact match.)</p>



<p>Under this view, deriving the “highest-level ontology” – the ontology for reality-as-a-whole – decomposes reality into a set of concepts such as “physics”, “chemistry”, or “psychology”. These concepts explicitly classify which parts of reality could be viewed as their instances, thereby decomposing reality into domains that could be studied separately (and from which the <em>disciplines</em> of physics, chemistry, and psychology could spring).</p>



<p>By contrast, on a lower level, arriving at the ontology of some specific field of study allows you to decompose it into specific sub-fields. These subfields can, likewise, be studied mostly separately. (The study of gasses vs. quantum particles, or inorganic vs. organic compounds, or emotional responses vs. memory formation.)</p>



<p>One specific consequence of the above desiderata is that <a href="https://www.lesswrong.com/posts/nLhHY2c8MWFcuWRLx/good-ontologies-induce-commutative-diagrams">good ontologies commute</a>.</p>



<p>That is, suppose you have some already-defined domain of reality, such as “chemistry”. You’d like to further decompose it into sub-domains. You take some system from this domain, such as a specific chemical process, and derive a prospective ontology for it. The ontology purports to decompose the system into a set of high-level variables plus compactly specified interactions between them, producing a predictive model of it.</p>



<p>If you then take a different system from the same domain, <em>the same ontology</em> should work for it. If you talked about “spirits” and “ether” in the first case, but you need to discuss “molecules” and “chemical reactions” to model the second one, then the spirits-and-ether ontology doesn’t suffice to capture the entire domain. And if there are <em>no</em> extant domains of reality which are well-characterized by your ontology – if the ontology of spirits and ether was derived by “overfitting” to the behavior of the first system, and it fails to robustly generalize to other examples – then this ontology is a bad one.</p>



<p>The go-to historical example comes from the field of chemistry: <a href="https://en.wikipedia.org/wiki/Phlogiston_theory">the phlogiston theory</a>. The theory aimed to explain combustion, modeling it as the release of some substance called “phlogiston”. However, the theory’s explanations for different experiments implied contradictory underlying dynamics. In some materials, phlogiston was supposed to have positive mass (and its release decreased the materials’ weight); in others, negative mass (in metals, to explain why they <em>gained</em> weight after being burned). The explanations for its interactions with air were likewise ad-hoc, often invented <em>post factum</em> to rationalize an experimental result, and essentially never to predict it. That is, they were overfit.</p>



<p>Another field worth examining here is agent foundations. The process of deriving a suitable ontology for it hasn’t yet finished. Accordingly, it is plagued by questions of what concepts / features it should be founded upon. Should we define agency from idealized utility-maximizers, or should we define it <a href="https://www.lesswrong.com/posts/moi3cFY2wpeKGu9TT/clarifying-the-agent-like-structure-problem">structurally</a>? Is consequentialism-like goal-directed behavior even <a href="https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX">the right thing to focus on</a>, when studying real-world agent-like systems? <a href="https://www.lesswrong.com/posts/gQY6LrTWJNkTv8YJR/the-pointers-problem-human-values-are-a-function-of-humans">What formal definition do the “values” of realistic agents have?</a></p>



<p>In other words: what is the set of variables which serve to compactly and comprehensively characterize and model <em>any</em> system we intuitively associate with “agents”, the same way chemistry can characterize any chemical interaction in terms of molecules and atoms?</p>



<p>Another telling example is mechanistic interpretability. Despite being a very concrete and empirics-based field of study, it likewise involves attempts to derive a novel ontology for studying neural networks. Can individual neurons be studied separately? The evidence suggests otherwise. If not, what <em>are</em> the basic “building blocks” of neural networks? We can <a href="https://www.lesswrong.com/posts/sxhfSBej6gdAwcn7X/coordinate-free-interpretability-theory">always</a> decompose a given set of activations into sparse components, but what decompositions would be robust, i. e., <a href="https://www.lesswrong.com/posts/TTTHwLpcewGjQHWzh/what-is-the-true-name-of-modularity">applicable to all forward passes</a> of a given ML model? (<a href="https://transformer-circuits.pub/2023/monosemantic-features/index.html">Sparse autoencoders</a> represent some progress along this line of inquiry.)</p>



<p>At this point, it should be noted that the process of deriving ontologies, which was previously linked to “philosophical reasoning”, seems to show up in contexts that are far from the traditional ideas of what “philosophy” is. I argue that this is not an error: we are attempting to investigate a cognitive algorithm that is core to philosophy-as-a-discipline, yet it’s not a given that this algorithm would show up <em>only</em> in the context of philosophy. (An extended discussion of this point follows in 1C and 1E.)</p>



<p>To summarize: Philosophical reasoning involves focusing on some domain of reality<span id='easy-footnote-2-3634' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/#easy-footnote-bottom-2-3634' title='Which was, itself, made independent from the rest of reality by assumptions produced by a higher-level instance of philosophical reasoning.'><sup>2</sup></a></span> to derive an ontology for it. That ontology could then be used to produce a “high-level summary” of any system from the domain, in terms of specific high-level variables and compactly specifiable interactions between them. This, in turn, makes it possible to decompose this domain into further sub-domains.</p>



<h3 class="wp-block-heading">1B. Tentative Formalization</h3>



<p>Put this way, the definition could be linked to <a href="https://www.lesswrong.com/posts/gvzW46Z3BsaZsLc25/natural-abstractions-key-claims-theorems-and-critiques-1">John Wentworth’s definition of natural abstractions</a>.</p>



<p>The Natural Abstraction Hypothesis states that the real-world data are distributed such that, for any set of “low-level” variables <em>L</em> representing some specific system or set of systems, we can derive the (set of) high-level variable(s) <em>H</em>, such that they would serve as “natural latents” for <em>L</em>. That is: conditional on the high-level variables <em>H</em>, the low-level variables <em>L</em> would become (approximately) independent<span id='easy-footnote-3-3634' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/#easy-footnote-bottom-3-3634' title='The generalization of the framework explicitly able to handle approximation could be found &lt;a href=&quot;https://www.lesswrong.com/posts/dWQWzGCSFj6GTZHz7/natural-latents-the-math&quot;&gt;through this link&lt;/a&gt;.'><sup>3</sup></a></span>:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXevrp9trzrz54f6t43y_hsLuE7fde7AsQ_fq0rApsLxMstJBJVYhvO-4z4GpVY_Hdvawq7EkFnhEf3mz6mJfiBof-MDV9H6_W50aZW22Kt_KQbvIECzj_YYNKI0k_pu66q2AmP6isozIMRpFwIRYhagoxw?key=ogmKpL6fBJgn1Y18NUTnww" alt=""/></figure>



<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXed6ztK0ywsCC_Wfe4Hrx8WMFfPwJFaRz2saB9FBGf4D6xgRRqPJDAiIpY3ufH780JkKXZAo_-0gVfFl7hHoyzRDcJmiCO93RSQwIvQbXdodj9fPErnjt-4sy6cXArV1dna4UfS_eXdFZ-vxgZ-IWnPmlFv?key=ogmKpL6fBJgn1Y18NUTnww" alt=""/></figure>



<p>(Where “\” denotes set subtraction, meaning <em>L </em>\ <em>L_i</em> is the set of all <em>L_k</em> except <em>L_i</em>.)</p>



<p>There are two valid ways to interpret <em>L</em> and <em>H</em>.</p>



<ul class="wp-block-list">
<li><a href="https://www.lesswrong.com/posts/vvEebH5jEvxnJEvBC/abstractions-as-redundant-information">The “bottom-up” interpretation</a>: <em>L_i</em> could be different parts of a specific complex system, such as small fragments of a spinning gear. <em>H</em> would then correspond to a set of high-level properties of the gear, such as its rotation speed, the mechanical and molecular properties of its material, and so on. Conditional on <em>H</em>, the individual <em>L_i</em> become independent: once we’ve accounted for the shared material, for example, the only material properties by which they vary are e. g. small molecular defects, individual to each patch.</li>



<li><a href="https://www.lesswrong.com/posts/N2JcFZ3LCCsnK2Fep/the-minimal-latents-approach-to-natural-abstractions">The “top-down” interpretation</a>: <em>L_i</em> could be different examples of systems belonging to some reference class of systems, such as individual examples of trees. <em>H</em> would then correspond to the general “tree” abstraction, capturing the (distribution over the) shapes of trees, the materials they tend to be made of, and so on. Conditional on <em>H</em>, the individual <em>L_i</em> become independent: the “leftover” variance are various contingent details such as “how many leaves this particular tree happens to have”.</li>
</ul>
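


<p>As a toy numerical illustration of the “bottom-up” gear case above – a sketch of my own in Python, not part of the natural-latents formalism itself – two low-level fragments share a latent material property <em>H</em>, are strongly correlated unconditionally, and become approximately independent once <em>H</em> is accounted for:</p>



<pre class="wp-block-code"><code># Toy sketch (my own, invented numbers): two "fragments" of a gear share a
# latent material property H; conditional on H, their remaining variation is
# independent noise, so H acts as a (trivial) natural latent for them.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

H = rng.normal(size=n)                # shared high-level latent (e.g. material hardness)
L1 = H + 0.3 * rng.normal(size=n)     # fragment 1: latent plus individual defects
L2 = H + 0.3 * rng.normal(size=n)     # fragment 2: latent plus individual defects

# Unconditionally, the fragments are strongly correlated...
print("corr(L1, L2)     =", round(np.corrcoef(L1, L2)[0, 1], 3))

# ...but conditional on H (here: after subtracting the part explained by H),
# the leftover variation in each fragment is approximately independent.
r1, r2 = L1 - H, L2 - H
print("corr(L1, L2 | H) =", round(np.corrcoef(r1, r2)[0, 1], 3))</code></pre>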



<p>Per the hypothesis, the high-level variables <em>H</em> would tend to correspond to intuitive human abstractions. In addition, they would be “in the territory” and convergent – in the sense that <em>any</em> efficient agent (or agent-like system) that wants to model some chunk of the world would arrive at approximately <em>the same</em> abstractions for this chunk, regardless of the agent’s goals and quirks of its architecture. What information is shared between individual fragments of a gear, or different examples of trees, is some <em>ground-truth</em> fact about the systems in question, rather than something subject to the agent’s choice.<span id='easy-footnote-4-3634' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/#easy-footnote-bottom-4-3634' title='In theory, there might be some free parameters regarding the &lt;em&gt;exact&lt;/em&gt; representation an agent would choose, if there are several possible representations with the same size and predictive accuracy. Efforts to show that any two such representations would be importantly isomorphic to each other &lt;a href=&quot;https://www.lesswrong.com/posts/fJb8ryrMW5XfJaq7m/approximately-deterministic-natural-latents#Minimality__Maximality__and_Isomorphism_of_Deterministic_Natural_Latents&quot;&gt;are ongoing&lt;/a&gt;.'><sup>4</sup></a></span>



<p><a href="https://distill.pub/2020/circuits/zoom-in/#three-speculative-claims">The Universality Hypothesis</a> in machine-learning interpretability is a <a href="https://transformer-circuits.pub/2023/monosemantic-features/index.html#phenomenology-universality">well-supported</a> empirical complement of the NAH. While it doesn’t shed much light on what exact mathematical framework for abstractions we should use, it supplies strong evidence in favor of the NAH’s basic premise: that there’s <em>some</em> notion of abstraction which is convergently learned by agents and agent-like systems.</p>



<p>A natural question, in this formalism, is how to <em>pick</em> the initial set of low-level variables <em>L</em> whose ontology we’d be searching for: how we know to draw the boundary around the gear, or to put only examples of trees into the set <em>L</em>. That question is currently open, although one simple way to handle it might be to search for a set that has a nontrivial natural latent <em>H</em>.</p>



<p>The NAH framework captures the analysis in the preceding sections well. <em>H</em> constitutes the ontology of <em>L</em>, creating conditional independence between the individual variables. Once <em>H</em> is derived, we can study each of <em>L_i</em> separately. (More specifically, we’d be studying <em>L_i</em> <em>conditioned on</em> <em>H</em>: the individual properties of a specific tree <em>in the context of</em> it being a tree; the properties of a physical system in the context of viewing it as a physical system.)</p>



<p>If there is disagreement over the shape of <em>H</em> – if researchers or philosophers have yet to converge on the same <em>H</em> – that’s a sign that <em>no</em> proposed <em>H</em> is correct, that each fails to robustly induce independence between the <em>L_i</em>. (Psychology is an illustrative example here: there are many extant ontologies purporting to characterize the human mind. But while most of them explain <em>some</em> phenomena, none of them explain <em>everything</em>, which leads to different specialists favoring different ontologies – and which is evidence that the correct framework is yet to be found.)</p>



<p>This definition could be applied iteratively: an ontology <em>H</em> would usually consist of a set of variables as well, and there could be a set of even-higher-level variables inducing independence between them. We could move from the description of reality in terms of “all elementary particles in existence” to “all atoms in existence”, and then, for example, to “all cells”, to “all organisms”, to “all species”. Or: “all humans” to “all cities” to “all countries”. Or: starting from a representation of a book in terms of individual sentences, we can compress it to the summary of its plot and themes; starting from the plots and themes of a set of books, we can derive common literary genres. Or: starting from a set of sensory experiences, we can discover some commonalities between these experiences, and conclude that there is some latent “object” depicted in all of them (such as compressing the visual experiences of seeing a tree from multiple angles into a “tree” abstraction). And so on.</p>



<p>In this formalism, we have two notable operations:</p>



<ol class="wp-block-list">
<li>Deriving <em>H</em> given some <em>L</em>.</li>



<li>Given some <em>L</em>, <em>H</em>, and the relationship <em>P(L | H)</em>, propagating some target state “up” or “down” the hierarchy of abstractions.
<ul class="wp-block-list">
<li>That is, if <em>H</em> = <em>H*</em>, what’s <em>P(L | H = H*)</em>? Given some high-level state (macrostate), what’s the (distribution over) low-level states (microstates)?</li>



<li>On the flip side, if <em>L</em> = <em>L*</em>, what’s <em>P(H | L = L*)</em>? Given some microstate, what macrostate does it correspond to?</li>
</ul>
</li>
</ol>



<p>It might be helpful to think of <em>P(L | H)</em> and <em>P(H | L)</em> as defining functions for abstracting down <em>H → L</em> and abstracting up <em>L → H</em>, respectively, rather than as probability distributions. Going forward, I will be using this convention.</p>
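


<p>A minimal sketch of operation (2) – propagating a state “down” (<em>H → L</em>) and “up” (<em>L → H</em>) through an already-derived ontology – on a tiny discrete example. The categories and probabilities here are invented purely for illustration:</p>



<pre class="wp-block-code"><code># Minimal sketch: abstracting down (H -> L) and up (L -> H) via a toy
# discrete Bayes model. All categories and numbers are invented.

prior_H = {"oak": 0.6, "pine": 0.4}            # P(H): distribution over tree species
likelihood = {                                 # P(L | H): leaf shape given species
    "oak":  {"broad": 0.9,  "needle": 0.1},
    "pine": {"broad": 0.05, "needle": 0.95},
}

def abstract_down(h):
    """H -> L: given a macrostate (species), the distribution over microstates (leaf shape)."""
    return likelihood[h]

def abstract_up(l):
    """L -> H: given a microstate (leaf shape), the posterior over macrostates (species)."""
    joint = {h: prior_H[h] * likelihood[h][l] for h in prior_H}
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}

print(abstract_down("pine"))     # {'broad': 0.05, 'needle': 0.95}
print(abstract_up("needle"))     # posterior puts most mass on 'pine'</code></pre>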



<p>I would argue that (2) represents the kinds of thinking, including highly intelligent and sophisticated thinking, which <em>do not correspond to philosophical reasoning</em>. In that case, we already have <em>H → L</em> pre-computed, the ontology defined. The operations involved in propagating the state up/down might be rather complex, but they’re ultimately “closed-form” in a certain sense.</p>



<p>Some prospective examples:</p>



<ul class="wp-block-list">
<li>Tracking the consequences of local political developments on the global economy (going “up”), or on the experiences of individual people (going “down”).</li>



<li>Evaluating the geopolitical impact of a politician ingesting a specific poisonous substance at a specific time (going “up”).</li>



<li>Modeling the global consequences of an asteroid impact while taking into account orbital dynamics, weather patterns, and chemical reactions (going “down” to physical details, then back “up”).</li>



<li>Translating a high-level project specification to build a nuclear reactor into specific instructions to be carried out by manufacturers (“down”).</li>



<li>Estimating the consequences of a specific fault in the reactor’s design on global policies towards nuclear power (“up”).</li>
</ul>



<p>As per the examples, this kind of thinking very much encompasses some domains of research and engineering.</p>



<p>(1), on the other hand, potentially represents <em>philosophical reasoning</em>. The question is: what specific cognitive algorithms are involved in that reasoning?</p>



<p>Intuitively, some sort of “babble-and-prune” brute-force approach seems to be at play. We need to semi-randomly test various possible decompositions, until ultimately arriving at one that is actually robust. Another feature is that this sort of thinking requires a wealth of <em>concrete examples</em>, a “training set” we have to study to derive the right abstractions. (Which makes sense: we need a representative sample of the set of random variables <em>L</em> in order to derive approximate conditional-independence relations between them.)</p>



<p>But given that philosophical problems are, empirically, computationally tractable at all, it would seem that some heuristics are at play here as well. Whatever algorithms underlie philosophical reasoning, they’re able to narrow down the hypothesis space of ontologies that we have to consider.</p>
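


<p>To make the “babble-and-prune” picture concrete, here is a crude sketch of what such a loop might look like on toy data: propose candidate one-dimensional summaries of <em>L</em>, score each by how much residual dependence it leaves between the low-level variables, and keep the best. The data-generating process, the candidate pool, and the scoring rule are all invented assumptions; this is not a claim about the actual algorithm humans run.</p>



<pre class="wp-block-code"><code># Crude babble-and-prune sketch of operation (1), deriving H given L.
# Everything here is a toy assumption made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50_000, 4
true_H = rng.normal(size=n)
L = true_H[:, None] + 0.5 * rng.normal(size=(n, k))   # four low-level variables sharing a latent

def residual_dependence(L, H):
    """Mean |correlation| left between the L_i after regressing out the candidate H."""
    H = (H - H.mean()) / H.std()
    betas = (L * H[:, None]).mean(axis=0)             # per-variable regression coefficients on H
    resid = L - np.outer(H, betas)
    c = np.corrcoef(resid, rowvar=False)
    return np.abs(c[~np.eye(k, dtype=bool)]).mean()

# "Babble": candidate ontologies = random linear summaries of L, plus the obvious one (the mean).
candidates = [L @ rng.normal(size=k) for _ in range(20)] + [L.mean(axis=1)]

# "Prune": keep the candidate that induces the most conditional independence.
best = min(candidates, key=lambda H: residual_dependence(L, H))
print("residual dependence of best candidate:", round(residual_dependence(L, best), 3))
print("its correlation with the true latent: ", round(abs(np.corrcoef(best, true_H)[0, 1]), 3))</code></pre>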



<p>Another relevant intuition: from a computational-complexity perspective, the philosophical reasoning of (1), in general, seems to be more demanding than the more formal non-philosophical thinking of (2). Philosophical reasoning seems to involve some iterative search-like procedures, whereas the “non-philosophical” thinking of (2) involves only “simpler” closed-form deterministic functions.</p>



<p>This fits with the empirical evidence: deriving a new useful model for representing some domain of reality is usually a task for entire fields of science, whereas <em>applying</em> a model is something any individual competent researcher or engineer is capable of.</p>



<h3 class="wp-block-heading">1C. Qualitative Research</h3>



<p>Suppose that the core cognitive processes underlying philosophy are indeed about deriving novel ontologies. Is the converse true: are all situations in which we’re deriving some novel ontology “philosophy-like” undertakings, in some important sense?</p>



<p>I would suggest yes.</p>



<p>Let’s consider the mechanistic-interpretability example from 1A. Mechanistic interpretability is a very concrete, down-to-earth field of study, with tight empirical-testing loops. Nevertheless, things like the Universality Hypothesis, and speculations that the computations in neural networks could be decomposed into “computational circuits”, certainly have a <em>philosophical</em> flavor to them – even if the relevant reasoning happens far outside the field of academic philosophy.</p>



<p>Chris Olah, a prominent ML interpretability researcher, <a href="https://transformer-circuits.pub/2024/qualitative-essay/index.html">characterizes this as “qualitative research”</a>. He points out that one of the telltale signs that this type of research is proceeding productively is finding <em>surprising structure</em> in your empirical results. In other words: finding some way to look at the data which hints at the underlying ontology.</p>



<p>Another common term for this type of research is “pre-paradigmatic” research. The field of agent foundations, for example, is often called “pre-paradigmatic” in the sense that within its context, we don’t know how to correctly phrase even the <em>questions</em> we want answered, nor how to define the basic features we want to focus on.</p>



<p>Such research processes are common even in domains that have long been decoupled from philosophy, such as physics. Various attempts to derive the Theory of Everything often involve grappling with very philosophy-like questions regarding the ontology of a physical universe consistent with all our experimental results (e. g., string theory). The different interpretations of quantum mechanics are an even more obvious example.</p>



<p>Thomas Kuhn&#8217;s <a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions"><em>The Structure of Scientific Revolutions</em></a> naturally deserves a mention here. His decomposition of scientific research into “paradigm shifts” and “normal science” would correspond to the split between (1) and (2) types of reasoning as outlined in the previous section. The research that fuels paradigm shifts would be of the “qualitative”, non-paradigmatic, ontology-discovering type.</p>



<p>Things similar to qualitative/non-paradigmatic research also appear in the world of <em>business</em>. Peter Thiel’s <a href="https://en.wikipedia.org/wiki/Zero_to_One">characterization</a> of startups as engaging in “zero to one” creation of qualitatively new markets or goods would seem to correspond to deriving some novel business frameworks, i. e., <em>ontologies</em>, and succeeding by their terms. (“Standard”, non-startup businesses, in this framework’s view, rely on more “formulaic” practices – i. e., on making use of already pre-computed <em>H → L</em>. Consider opening a new steel mill, which would produce well-known products catering to well-known customers, vs. betting on a specific AI R&amp;D paradigm, whose exact place in the market is impossible to predict even if it succeeds.)</p>



<p>Nevertheless, intuitively, there still seems to be <em>some</em> important difference between these “thin slices” of philosophical reasoning scattered across more concrete fields, and “pure” philosophy.</p>



<p>Before diving into this, a short digression:</p>



<h3 class="wp-block-heading">1D. Qualitative Discoveries Are Often Counterfactual</h3>



<p>Since non-paradigmatic research seems more computationally demanding, requiring a greater amount of expertise than in-paradigm reasoning, its results are often highly <em>counterfactual</em>. While more well-operationalized frontier discoveries are often made by many people near-simultaneously, highly qualitative discoveries could often be attributed to a select few people.</p>



<p>As a relatively practical example, <a href="https://www.lesswrong.com/posts/csHstEPagqs8wChhh/examples-of-highly-counterfactual-discoveries">Shannon’s information theory</a> plausibly counts. The discussion through the link also offers some additional prospective examples.</p>



<p>The idea that zero-to-one startups are founded on novel philosophical reasoning also supports this view. If a novel startup fails due to some organizational issues before proving the profitability of its business plan, it’s not at all certain that it would be quickly replaced by someone trying the same idea, <em>even if</em> its plan were solid. Failures of the Efficient Market Hypothesis are common in this area.</p>



<h3 class="wp-block-heading">1E. What Is “Philosophy” As a Discipline?</h3>



<p>Suppose that the low-level system <em>L</em> represents some practical problem we study. A neural network that we have to interpret, or the readings yielded by a particle accelerator which narrow down the fundamental physical laws, or the behavior of some foreign culture that we want to trade with. Deriving the ontology <em>H</em> would be an instance of non-paradigmatic research, i. e., philosophical reasoning. But once <em>H</em> is derived, it would be relatively easily put to use solving practical problems. The relationship <em>H → L</em>, once nailed down, would quickly be handed off to engineers or businessmen, who could start employing it to optimize the natural world.</p>



<p>As an example, consider <a href="https://www.lesswrong.com/tag/anthropics">anthropics</a>, an emerging field studying <a href="https://en.wikipedia.org/wiki/Anthropic_principle">anthropic principles</a> and extended reasoning similar to <a href="https://en.wikipedia.org/wiki/Doomsday_argument">the doomsday argument</a>.<span id='easy-footnote-5-3634' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/#easy-footnote-bottom-5-3634' title='Given that we’re able to make a particular observation, what does that tell us about the structure of reality? For example, the fact that intelligent life exists at all already narrows down what laws of physics our universe must have. We can infer something about them purely from observing our own existence, without actually looking around and directly studying them.'><sup>5</sup></a></span> Anthropics doesn’t study a concrete practical problem. It’s a very high-level discipline, more or less abstracting over the-world-as-a-whole (or our experiences of it). Finding a proper formalization of anthropics, which satisfactorily handles all edge cases, would result in advancement in decision theory and probability theory. But there are no <em>immediate</em> practical applications.</p>



<p>They likely do exist. But you’d need to propagate the results <em>farther</em> down the hierarchy of abstractions, moving through these theories down to specific subfields and then to specific concrete applications. None of the needed <em>H → L</em> pathways have been derived; there’s a lot of multi-level philosophical work to be done. And there’s always the possibility that it would yield no meaningful results, or end up as a very circuitous way to justify common intuitions.</p>



<p>The philosophy of mind could serve as a more traditional example. Branches of it are focused on investigating the nature of consciousness and qualia. Similarly, it’s a very “high-level” direction of study, and the success of its efforts would have significant implications for numerous other disciplines. But it’s not known what the practical consequences of such a success, if any, would be.</p>



<p>Those features, I think, characterize “philosophy” as a separate discipline. Philosophy (1) involves attempts to derive wholly new multi-level disciplines, starting from very-high-level reasoning about the-world-as-a-whole (or, at least, drawing on several disciplines at once), and (2) it only cashes out in practical implementations after several iterations of concretization.</p>



<p>In other words, philosophy is the continuing effort to derive the complete <em>highest-level</em> ontology of our experiences/our world.</p>



<h3 class="wp-block-heading">1F. On Ethics</h3>



<p>An important branch of philosophy which hasn&#8217;t been discussed so far is moral philosophy. The previously outlined ideas generalize to it in a mostly straightforward manner, though with a specific “pre-processing” twist.</p>



<p>This is necessarily going to be a very compressed summary. For proper treatment of the question, I recommend Steven Byrnes’ series on <a href="https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8">the human brain</a> and <a href="https://www.lesswrong.com/s/6uDBPacS6zDipqbZ9">valence signals</a>, or my (admittedly fairly outdated) <a href="https://www.lesswrong.com/posts/kmpNkeqEGvFue7AvA/value-formation-an-overarching-model">essay on value formation</a>.</p>



<p>To start off, let’s assume that the historical <em>starting point</em> of moral philosophy is human moral intuitions and feelings. Which actions, goals, or people “feel like” good or bad things, what seems just or unfair, and so on. From this starting point, people developed the notions of morality and ethics, ethical systems, social norms, laws, and explicit value systems and ideologies.</p>



<p>The process of moral philosophy can then be characterized as follows:</p>



<p>As a premise, human brains contain learning algorithms plus a suite of reinforcement-learning training signals.</p>



<p>In the course of life, and especially in childhood, a human learns a vast repository of value functions. These functions take sensory perceptions and thoughts as inputs, and output “valence signals” in the form of real numbers. The valence assigned to a thought is based on learned predictive heuristics about whether a given type of thought has historically led to positive or negative reward (as historically scored by innate reinforcement-signal functions).</p>



<p>The valences are <em>perceived by</em> human minds as a type of sensory input. In particular, a subset of learned value functions could be characterized as “moral” value functions, and their outputs are perceived by humans as the aforementioned feelings of “good”, “bad”, “justice”, and so on.</p>



<p>Importantly, the learned value functions aren’t part of a human’s learned <em>world-model</em> (<a href="https://en.wikipedia.org/wiki/Explicit_knowledge">explicit knowledge</a>). As a result, their explicit definitions aren’t immediately available to our conscious inspection. They’re “black boxes”: we only perceive their outputs.</p>



<p>One aspect of moral philosophy, thus, is to <em>recover</em> these explicit definitions: what value functions you’ve learned and what abstract concepts they’re “attached to”. (For example: does “stealing” feel bad because you think it’s unfair, or because you fear being caught? You can investigate this by, for example, imagining situations in which you manage to steal something in circumstances where you feel confident you won’t get caught. This would allow you to remove the influence of “fear of punishment”, and thereby determine whether you have a fairness-related value function.)</p>



<p>That is a type of philosophical reasoning: an attempt to “abstract up” from a set of sensory experiences of a specific modality, to a function defined over high-level concepts. (Similar to recovering a “tree” abstraction by abstracting up from a set of observations of a tree from multiple angles.)</p>



<p>Building on that, once a human has recovered (some of) their learned value functions, they can keep abstracting up in the manner described in the preceding text. For example, a set of values like “I don’t like to steal”, “I don’t like to kill”, “I don’t like making people cry” could be abstracted up to “I don’t want to hurt people”.</p>



<p>Building up further, we can abstract over the set of value systems recovered by <em>different </em>people, and derive e. g. the values of a society…</p>



<p>… and, ultimately, “human values” as a whole.</p>



<p>Admittedly, there are some obvious complications here, such as the need to handle value conflicts / inconsistent values, and sometimes making the deliberate choice to discard various data points in the process of computing higher-level values (often on the basis of <em>meta-value</em> functions). For example, not accounting for violent criminals when computing the values a society wants to strive for, or discarding violent impulses when making decisions about what kind of person you want to be.</p>



<p>In other words: when it comes to values, there is an “ought” sneaking into the process of abstracting-up, whereas in all other cases, it’s a purely “is”-fueled process.</p>



<p>But the “ought” side of it can be viewed as simply making decisions about what data to put in the <em>L</em> set, which we’d then abstract over in the usual, purely descriptive fashion.</p>



<p>From this, I conclude that the basic algorithmic machinery, especially that underlying the <em>philosophical</em> (rather than the <em>political</em>) aspects of ethical reasoning, is still the same as with all other kinds of philosophical reasoning.</p>



<h3 class="wp-block-heading">1G. Why Do “Solved” Philosophical Problems Stop Being Philosophy?</h3>



<p>As per the formulations above:</p>



<ul class="wp-block-list">
<li>The endeavor we intuitively view as “philosophy” is a specific subset of general philosophical reasoning/non-paradigmatic research. It involves thinking about the world in a very general sense, <em>without</em> the philosophical assumptions that decompose it into separate domains of study.</li>



<li>“Solving” a philosophical problem involves deriving an ontology/paradigm for some domain of reality, which makes it possible to decouple that domain from the rest of the world and study it mostly separately.</li>
</ul>



<p>Put like this, it seems natural that philosophical successes move domains outside the remit of philosophy. Once a domain has been delineated, thinking about it <em>by definition</em> no longer requires the interdisciplinary reasoning characteristic of philosophy-as-a-discipline. Philosophical reasoning seeks to render itself unnecessary.</p>



<p>(As per the previous sections, working in the domains thus delineated could still involve qualitative research, i. e., philosophical reasoning. But not the specific <em>subtype</em> of philosophical reasoning characteristic of philosophy-as-a-discipline, involving reasoning about the-world-as-a-whole.)</p>



<p>In turn, this separation allows specialization. Newcomers could focus their research and education on the delineated domain, <em>without</em> having to become interdisciplinary specialists. This means a larger quantity of people could devote themselves to it, leading to faster progress.</p>



<p>That dynamic is also bolstered by greater funding. Once the practical implications of a domain become clear, more money pours into it, attracting even <em>more</em> people.</p>



<p>As a <em>very</em> concrete example, we can consider <a href="https://transformer-circuits.pub/2021/framework/index.html#onel-path-expansion">the path-expansion trick</a> in mechanistic interpretability. Figuring out how to mathematically decompose a one-layer transformer into the OV and QK circuits requires high-level reasoning about the transformer architecture, and arriving at the very <em>idea</em> of trying to do so requires philosophy-like thinking (to even think to ask, “how can we decompose an ML model into separate building blocks?”). But once this decomposition has been determined, each of these circuits could be studied separately, including by people who don’t have the expertise to derive the decomposition from scratch.</p>



<p>Solving a philosophical problem, then, often makes it possible to greatly upscale the amount of work done in the relevant domain of reality. Sometimes, that quickly turns it into an industry.</p>



<h2 class="wp-block-heading">2. Wisdom</h2>



<p>Let’s consider a wide variety of “wise” behavior or thinking.</p>



<ol class="wp-block-list">
<li>Taking into account “second-order” effects of your actions.
<ul class="wp-block-list">
<li><strong>Example:</strong> <a href="https://simple.wikipedia.org/wiki/Trolley_problem#Emergency_room_case">The “transplant problem”</a>, which examines whether you should cut up a healthy non-consenting person for organs if that would let you save five people whose organs are failing.</li>



<li>“Smart-but-unwise” reasoning does some math and bites the bullet.</li>



<li>“Wise” reasoning points out that if medical professionals engaged in this sort of behavior at scale, people would stop seeking medical attention out of fear/distrust, leading to more suffering in the long run.</li>
</ul>
</li>



<li>Taking into account your history with specific decisions, and updating accordingly.
<ul class="wp-block-list">
<li><strong>Example 1:</strong>
<ul class="wp-block-list">
<li>Suppose you have an early appointment tomorrow, but you’re staying up late, engrossed in a book. Reasoning that you will read “just one more chapter” might seem sensible: going to sleep at 01:00 AM vs. 01:15 AM would likely have no significant impact on your future wakefulness.</li>



<li>However, suppose that you end up making this decision repeatedly, until it’s 6:40 AM and you have barely any time left for sleep at all.</li>



<li>Now suppose that a week later, you’re in a similar situation: it’s 01:00 AM, you’ll need to wake up early, and you’re reading a book. </li>



<li>“Smart-but-unwise” reasoning would repeat your previous mistake: it’d argue that going to sleep fifteen minutes later is fine.</li>



<li>“Wise” reasoning would update on the previous mistake, know not to trust its object-level estimates, and go to sleep immediately.</li>
</ul>
</li>



<li><strong>Example 2:</strong>
<ul class="wp-block-list">
<li>Suppose that someone did something very offensive to you. In the moment, you infer that this means they hate you, and update your beliefs accordingly.</li>



<li>Later, it turns out they weren’t aware that their actions upset you, and they apologize and never repeat that error.</li>



<li>Next time someone offends you, you may consider it “wise” not to trust your instinctive interpretation <em>completely</em>, and at least consider alternate explanations.</li>
</ul>
</li>
</ul>
</li>



<li>Taking into account the impact of the fact that you’re the sort of person to make a specific decision in a specific situation.
<ul class="wp-block-list">
<li><a href="https://www.lesswrong.com/posts/Kbm6QnJv9dgWsPHQP/schelling-fences-on-slippery-slopes"><strong>Example 1</strong></a>:
<ul class="wp-block-list">
<li>Suppose that a staunch pacifist is offered a deal: they take a pill that would decrease their willingness to kill by 1%, and in exchange, they get 1 million dollars. In addition, they could take that deal multiple times, getting an additional 1 million dollars each time, and raising their willingness to kill by 1% each time.</li>



<li>A “smart-but-unwise” pacifist reasons that they’d still be unwilling to kill even if they became, say, 10% more willing to, and that they could spend the 10 million dollars on charitable causes, so they decide to take the deal 10 times.</li>



<li>A “wise” pacifist might consider the fact that, if they take the deal 10 times, the one making the decision on whether to <em>continue</em> would be a 10%-more-willing-to-kill version of them. That version might consider it acceptable to go up to 20%; a 20% version might consider 40% acceptable, and so on until 100%.</li>
</ul>
</li>



<li><strong>Example 2:</strong> Blackmailability.
<ul class="wp-block-list">
<li>Suppose that we have two people, Alice and Carol. Alice is known as a reasonable, measured person who makes decisions carefully, minimizing risk. Carol is known as a very temperamental person who becomes enraged and irrationally violent at the slightest offense.</li>



<li>Suppose that you’re a criminal who wants to blackmail someone. If you’re choosing between Alice and Carol, Alice is a much better target: if you threaten to ruin her life if she doesn’t pay you $10,000, she will tally up the costs and concede. Carol, on the other hand, might see red and attempt to murder you, even if that seals her own fate.</li>



<li>Alice is “smart-but-unwise”. Carol, as stated, isn’t exactly “wise”. But she becomes “wise” under one provision: if she committed to her “irrational” decision policy as a result of rational reasoning about what would make her an unappealing blackmail target. After all, in this setup, she’s certainly the one who ends up better off than Alice!</li>



<li>(<a href="https://en.wikipedia.org/wiki/Functional_Decision_Theory">Functional Decision Theories</a> attempt to formalize this type of reasoning, providing a framework within which it’s strictly rational.)</li>
</ul>
</li>
</ul>
</li>



<li>Erring on the side of deferring to common sense in situations where you think you see an unexploited opportunity.
<ul class="wp-block-list">
<li><strong>Example 1:</strong> Engaging in immoral behavior based on some highly convoluted consequentialist reasoning vs. avoiding deontology violations. See <a href="https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans">this article</a> for an extended discussion of the topic.
<ul class="wp-block-list">
<li>This is similar to (1), but in this case, you don’t need to reason through the <em>n</em>th-order effects “manually”. You know that deferring to common sense is <em>usually</em> wise, even if you don’t know why the common sense is the way it is.</li>



<li>It’s also fairly similar to the first example in (3), but the setup here is much more realistic.</li>
</ul>
</li>



<li><strong>Example 2:</strong> Trying to estimate the price of a stock “from scratch”, vs. “<a href="https://thezvi.wordpress.com/2017/11/05/zeroing-out/">zeroing out</a>”, i. e., taking the market value as the baseline and then updating it up/down based on whatever special information you have.</li>



<li><strong>Example 3:</strong> Getting “bad vibes” from a specific workplace environment or group of people, and dismissing these feelings as irrational (“smart-but-unwise”), vs. trying to investigate in-depth what caused them (“wise”). (And discovering, for example, that they were caused by some subtle symptoms of unhealthy social dynamics, which the global culture taught you to spot, but didn’t explain the meaning of.)</li>
</ul>
</li>



<li>Taking the “outside view” into account (in some situations in which it’s appropriate).
<ul class="wp-block-list">
<li><strong>Example:</strong> Being completely convinced of your revolutionary new physics theory or business plan, vs. being excited by it, but skeptical on the meta level, on the reasoning that there’s a decent chance your object-level derivations/plans contain an error.</li>
</ul>
</li>
</ol>



<p><strong>Summing up:</strong> All examples of “wise” behavior here involve (1) generating some candidate plan or inference, which seems reliable or correct while you’re evaluating it using your object-level heuristics, then (2) looking at the appropriate <em>reference class</em> of these plans/inferences, and finally (3) predicting what the <em>actual</em> consequences/accuracy would be using your <em>meta-level</em> heuristics. (“What if everyone acted this way?”, “what happened the previous times I acted/thought this way?”, “what would happen if it were commonly known I’d act this way?”, “if this is so easy, why haven’t others done this already?”, and so on.)</p>



<p>Naturally, it could go even higher. First-order “wise” reasoning might be unwise from a meta-meta-level perspective, and so on.</p>



<p>(For example, “outside-view” reasoning is often overused, and an even wiser kind of reasoning recognizes when inside-view considerations legitimately prevail over outside-view ones. Similarly, the heuristic of “the market is efficient and I can’t beat it” is usually wise, wiser than “my uncle beat the market this one time, which means I can too if I’m clever enough!”, but sometimes there <em>are</em> legitimate market failures.<span id='easy-footnote-6-3634' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/towards-the-operationalization-of-philosophy-wisdom/#easy-footnote-bottom-6-3634' title='Eliezer Yudkowsky’s &lt;a href=&quot;https://equilibriabook.com/&quot;&gt;&lt;em&gt;Inadequate Equilibria&lt;/em&gt;&lt;/a&gt; discusses this topic in great detail.'><sup>6</sup></a></span>)</p>



<p>In other words: “wise” thinking seems to be a two-step process, where you first generate a conclusion that you expect to be accurate, then “go meta”, and predict what would be the <em>actual</em> accuracy rate of a decision procedure that predicts this sort of conclusion to be accurate.</p>



<h3 class="wp-block-heading">2A. Background Formalisms</h3>



<p>To start off, I will need to introduce a toy model of agency. Bear with me.</p>



<p><strong>First: How can we model the inferences from the </strong><strong><em>inputs</em></strong><strong> to an agent’s decisions?&nbsp;</strong></p>



<p>Photons hit our eyes. Our brains draw an image aggregating the information each photon gives us. We interpret this image, decomposing it into objects, and inferring which latent-variable object is responsible for generating which part of the image. Then we wonder further: what process generated each of these objects? For example, if one of the &#8220;objects&#8221; is a news article, what is it talking about? Who wrote it? What events is it trying to capture? What set these events into motion? And so on.</p>



<p>In diagram format, we&#8217;re doing something like this:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXfVnZJuMJK_8GW9-0hD_bCynEHbju0pWb7R38lYkLf9Dfdqiw7-sobFaDs9Poi_ek9obpJHa3Tp3sNx5sA_8_Eb_ix2v82ejQnr3QXQS6kqh3rwtZ1IQtIc6nQ6mxMnIm8ExHec2zWWwQQmI09ejoV8ADWB?key=ogmKpL6fBJgn1Y18NUTnww" alt=""/></figure>



<p><em>Blue are ground-truth variables, gray is the &#8220;Cartesian boundary&#8221; of our mind from which we read off observations, purple are nodes in our world-model, each of which can be mapped to a ground-truth variable.</em></p>



<p>We take in observations, infer what latent variables generated them, then infer what generated those variables, and so on. We go backwards: from effects to causes, iteratively. The Cartesian boundary of our input can be viewed as a &#8220;mirror&#8221; of a sort, reflecting <em>the Past</em>.</p>



<p>It&#8217;s a bit messier in practice, of course. There are shortcuts, ways to map immediate observations to far-off states. But the general idea mostly checks out – especially given that these &#8220;shortcuts&#8221; probably still <em>implicitly</em> route through all the intermediate variables, just without explicitly computing them. (You can map a news article to the events it&#8217;s describing without explicitly modeling the intermediary steps of witnesses, journalists, editing, and publishing. But your mapping function is still implicitly shaped by the known quirks of those intermediaries.)</p>



<p><strong>Second: Let’s now consider the “output side” of an agent</strong>. I. e., what happens when we&#8217;re planning to achieve some goal, in a consequentialist-like manner.</p>



<p>We envision the target state. What we want to achieve, what the world would look like. Then we ask ourselves: what would cause this? What forces could influence the outcome to align with our desires? And then: how do we control these forces? What actions would we need to take in order to make the network of causes and effects steer the world towards our desires?</p>



<p>In diagram format, we&#8217;re doing something like this:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXdNxLsICosoGpZG5q868jmZXrVKfSNEja_7wXNOwFrCs3Uqnv5GoN0TczJ-_PEUhpZJrCq5sRcoW2-t5z1502leNjiMFNMa4dUmss1iQXoqbVwWzZp69g3t8fPYjbtE7yQgLNm2RwVAvA_BiNvI8jhCkmUT?key=ogmKpL6fBJgn1Y18NUTnww" alt=""/></figure>



<p><em>Green are goals, purple are intermediary variables we compute, gray is the Cartesian boundary of our actions, red are ground-truth variables through which we influence our target variables.</em></p>



<p>We start from our goals, infer what latent variables control their state in the real world, then infer what controls those latent variables, and so on. We go backwards: from effects to causes, iteratively, until getting to our own actions. The Cartesian boundary of our output can be viewed as a &#8220;mirror&#8221; of a sort, reflecting <em>the Future</em>.</p>



<p>It&#8217;s a bit messier in practice, of course. There are shortcuts, ways to map far-off goals to immediate actions. But the general idea mostly checks out – especially given that these heuristics probably still <em>implicitly</em> route through all the intermediate variables, just without explicitly computing them. (&#8220;Acquire resources&#8221; is a good heuristical starting point for basically any plan. But what <em>counts as</em> resources is something you had to figure out in the first place by mapping from &#8220;what lets me achieve goals in this environment?&#8221;.)</p>
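


<p>A toy sketch of this backwards-chaining structure (the causal graph and variable names are invented for illustration): starting from a goal variable, we walk back through its causes until we reach variables that sit on the Cartesian boundary of our output, i. e., our actions:</p>



<pre class="wp-block-code"><code># Toy sketch of "output-side" reasoning: propagate a target state backwards
# (effects -> causes -> ... -> actions) through an invented causal graph.
causes = {                                 # maps each variable to the variables that cause it
    "world_fed": ["food_distributed"],
    "food_distributed": ["food_produced", "logistics_running"],
    "food_produced": ["build_farms"],
    "logistics_running": ["hire_drivers"],
}
actions = {"build_farms", "hire_drivers"}  # the Cartesian boundary of our actions

def plan(goal):
    """Walk backwards from the goal until only directly controllable variables remain."""
    if goal in actions:
        return {goal}
    needed = set()
    for cause in causes.get(goal, []):
        needed |= plan(cause)
    return needed

print(plan("world_fed"))   # the actions to take: build_farms and hire_drivers</code></pre>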



<p>And indeed, that side of this formulation isn&#8217;t novel. From <a href="https://www.lesswrong.com/posts/gEKHX8WKrXGM4roRC/saving-time#Why_Time_">this post</a> by Scott Garrabrant, an agent-foundations researcher:</p>



<p><em>Time is also crucial for thinking about agency. My best short-phrase definition of agency is that agency is time travel. An agent is a mechanism through which the future is able to affect the past. An agent models the future consequences of its actions, and chooses actions on the basis of those consequences. In that sense, the consequence causes the action, in spite of the fact that the action comes earlier in the standard physical sense.</em></p>



<p><strong>Let’s now put both sides together</strong>. An idealized, compute-unbounded &#8220;agent&#8221; could be laid out in this manner:</p>



<figure class="wp-block-image"><img decoding="async" src="https://lh7-rt.googleusercontent.com/docsz/AD_4nXcyJW9xtI4LK0E1YzWzstTzM6vTigNjR1CFSikQUugh9FtV6LS2Q0GHpFIGpxQT0-bd0w1cykE8gnsKm9AkvfQNKdgX5-DrKSUzEGxmguQMAvB-WXQRbrROfCRA5Wai2njjoM2sfesCazYIOH5eIXnbYPOz?key=ogmKpL6fBJgn1Y18NUTnww" alt=""/></figure>



<p>It reflects the past at the input side, and reflects the future at the output side. In the middle, there&#8217;s some &#8220;glue&#8221;/&#8220;bridge&#8221; connecting the past and the future by a forwards-simulation. During that, the agent &#8220;catches up to the present&#8221;: figures out what will happen <em>while</em> it&#8217;s figuring out what to do.</p>



<p>If we consider <a href="https://www.lesswrong.com/posts/voLHQgNncnjjgAPH7/utility-maximization-description-length-minimization">the relation between utility functions and probability distributions</a>, it gets even more formally literal. A utility function over <em>X</em> could be viewed as a target probability distribution over <em>X</em>, and maximizing expected utility is equivalent to minimizing cross-entropy between this target distribution and the real distribution.</p>



<p>That brings the &#8220;planning&#8221; process into alignment with the &#8220;inference&#8221; process: both are about propagating target distributions &#8220;backwards&#8221; in time through the network of causality.</p>
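


<p>A quick numerical check of that correspondence, on invented numbers: if the target distribution is defined as proportional to exp(utility), then for any candidate distribution over outcomes, expected utility and cross-entropy against the target sum to the same constant (the log of the normalizer), so maximizing one is exactly minimizing the other:</p>



<pre class="wp-block-code"><code># Numerical check on toy numbers: expected utility vs. cross-entropy against a
# target distribution proportional to exp(u). For any candidate p,
#   cross_entropy(p, target) = -E_p[u] + log Z,
# so ranking candidates by expected utility and by negative cross-entropy agrees.
import math

outcomes = ["bad", "okay", "great"]
u = {"bad": -1.0, "okay": 0.5, "great": 2.0}              # toy utility function

Z = sum(math.exp(u[x]) for x in outcomes)
target = {x: math.exp(u[x]) / Z for x in outcomes}        # target distribution, proportional to exp(u)

candidates = {
    "cautious":  {"bad": 0.1, "okay": 0.8, "great": 0.1},
    "ambitious": {"bad": 0.3, "okay": 0.2, "great": 0.5},
}

for name, p in candidates.items():
    expected_utility = sum(p[x] * u[x] for x in outcomes)
    cross_entropy = -sum(p[x] * math.log(target[x]) for x in outcomes)
    # expected_utility + cross_entropy is always log Z, whatever p is
    print(name, round(expected_utility, 3), round(cross_entropy, 3), round(expected_utility + cross_entropy, 3))</code></pre>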



<h3 class="wp-block-heading">2B. Tentative Formalization</h3>



<p>Let’s consider what definition “wisdom” would have, in this framework.</p>



<p>All “object-level” cognitive heuristics here have the form <em>Y → X</em>, where <em>Y</em> is some environmental variable and <em>X</em> are the variables that cause <em>Y</em>. I. e., every cognitive heuristic <em>Y → X</em> can be characterized as an inversion of some environmental dynamic <em>X → Y</em>.</p>



<p>“Wisdom”, in this formulation, seems to correspond to <em>inversions of inversions</em>. Its form is</p>



<p>(<em>Y → X</em>)<em> → Y.</em></p>



<p>It takes in some object-level inversion – an object-level cognitive heuristic – and predicts things about <em>the performance of a cognitive policy that uses this heuristic</em>.</p>



<p>Examining this definition from both ends:</p>



<ul class="wp-block-list">
<li>If we’re considering an object-level output-side heuristic <em>E → A,</em> which maps environmental variables <em>E</em> to actions <em>A</em> that need to be executed in order to set <em>E</em> to specific values – i. e., a “planning” heuristic – the corresponding “wisdom” heuristic (<em>E → A</em>)<em> → E</em> tells us what object-level consequences <em>E</em> the reasoning of this type <em>actually</em> results in.</li>



<li>If we’re considering an object-level input-side heuristic <em>O → E</em> mapping observations <em>O</em> to their environmental causes <em>E</em> – i. e., an “inference” heuristic – the corresponding “wisdom” heuristic (<em>O → E</em>)<em> → O</em> tells us what we’d <em>actually</em> expect to see going forward, and whether the new observations would diverge from our object-level inferences. (I. e., whether we expect that the person who offended us would <em>actually</em> start acting like they hate us, going forward.)</li>
</ul>



<p>Admittedly, some of these speculations are fairly shaky. The “input-side” model of wisdom, in particular, seems off to me. Nevertheless, I think this toy formalism does make some intuitive sense.</p>



<p>It’s also clear, from this perspective, why “wisdom” is inherently more complicated / hard-to-compute than “normal” reasoning: it explicitly <em>iterates on</em> object-level reasoning.</p>
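


<p>A toy sketch of the (<em>Y → X</em>)<em> → Y</em> pattern, reusing the book-reading example from section 2 (all numbers invented): the object-level heuristic maps the goal to an action each time it is consulted, while the “wisdom” heuristic takes that heuristic as input, simulates the policy of repeatedly following it, and reports the actual consequence for the goal variable:</p>



<pre class="wp-block-code"><code># Toy sketch of wisdom as an inversion of an inversion: (Y -> X) -> Y.
# All numbers are invented.

def object_level_heuristic(minutes_until_alarm):
    """Y -> X: each individual 15-minute delay looks harmless, so keep reading."""
    return "read 15 more minutes"

def wisdom_heuristic(heuristic, minutes_until_alarm=480):
    """(Y -> X) -> Y: simulate a policy that keeps following the heuristic,
    and report the actual consequence for the thing we care about (sleep)."""
    for _ in range(20):                       # the decision gets re-made every chapter
        if heuristic(minutes_until_alarm) == "read 15 more minutes":
            minutes_until_alarm -= 15
    return minutes_until_alarm

print(wisdom_heuristic(object_level_heuristic))   # 180 minutes of sleep left, not ~480</code></pre>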



<h3 class="wp-block-heading">2C. Crystallized Wisdom</h3>



<p>In humans, cognitive heuristics are often not part of explicit knowledge, but are instead stored as learned instincts, patterns of behavior, or emotional responses – or “<a href="https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX/p/iCfdcxiyr2Kj8m8mT">shards</a>”, in the parlance of one popular framework.</p>



<p>Since wisdom is a subset of cognitive heuristics, that applies to it as well. “Wise” heuristics are often part of “common sense”, tacit knowledge, cultural norms, and hard-to-articulate intuitions and hunches. In some circumstances, they’re stored in a format that doesn’t refer to the initial object-level heuristic at all! Heuristics such as “don’t violate deontology” don’t activate <em>only</em> in response to object-level criminal plans.</p>



<p>(Essentially, wisdom is conceptually/<a href="https://www.lesswrong.com/posts/dKAJqBDZRMMsaaYo5/in-logical-time-all-games-are-iterated-games#Logical_Time">logically</a> downstream of object-level heuristics, but not necessarily <em>cognitively downstream, </em>in the sense of moment-to-moment perceived mental experiences.)</p>



<p>Indeed, wisdom heuristics, by virtue of being more computationally demanding, are likely to be stored in an “implicit” form <em>more often</em> than “object-level” heuristics. Deriving them explicitly often requires looking at “global” properties of the environment or your history in it, considering the whole reference class of the relevant object-level cognitive heuristic. By contrast, object-level heuristics themselves involve a merely “local” inversion of some environmental dynamic.</p>



<p>As a result, “wisdom” usually only accumulates after humanity has engaged with some domain of reality for a while. Similarly, individual people tend to become “wise” only after they have personally been immersed in that domain for a while – after they “had some experience” with it.</p>



<p>That said, to the extent that this model of wisdom is correct, wisdom <em>can</em> nevertheless be inferred “manually”, with enough effort. After all, it’s still merely a function of the object-level domain. It <em>could</em> be derived purely from the domain’s object-level model, given enough effort and computational resources, no “practical experience” needed.</p>



<h2 class="wp-block-heading">3. Would AGIs Pursue Wisdom &amp; Philosophical Competence?</h2>



<p>In my view, the answer is a clear “yes”.</p>



<p>To start off, let’s define an “AGI” as “a system which can discover novel abstractions (such as new fields of science) in any environment that has them, and fluently use these abstractions in order to better navigate or optimize its environment in the pursuit of its goals”.</p>



<p>It’s somewhat at odds with the more standard definitions, which tend to characterize AGIs as, for example, “systems that can do most cognitive tasks that a human can”. But I think it captures some intuitions better than the standard definitions. For one, state-of-the-art LLMs certainly seem to be “capable of doing most cognitive tasks that humans can”, yet most specialists and laymen would agree that they are not AGI. Per my definition, it’s because LLMs cannot discover <em>new</em> ontologies: they have merely learned vast repositories of abstractions that were pre-computed for them by humans.</p>



<p>As per my arguments, <strong>philosophical reasoning is convergent:</strong></p>



<ul class="wp-block-list">
<li>It’s a subset of general non-paradigmatic research&#8230;</li>



<li>… which is the process of deriving new ontologies…</li>



<li>… which are useful because they make it possible to decompose the world into domains that can be reasoned about mostly-separately…</li>



<li>… which is useful because it reduces the computational costs needed for making plans or inferences.</li>
</ul>



<p>Any efficient bounded agent, thus, would necessarily become a competent philosopher, and it would engage in philosophical reasoning regarding all domains of reality that (directly or indirectly) concern it.</p>



<p>Consider the opposite: “philosophically incompetent” or incapable reasoners. Such reasoners would only be able to make use of pre-computed <em>H → L</em> relations. They would not be able to derive genuinely <em>new</em> abstractions and create <em>new</em> fields. Thus, they wouldn’t classify as “AGI” in the above-defined sense.</p>



<p>They’d be mundane, <em>non-general</em> software tools. They could still be quite complex and intelligent in some ways, up to and including writing graduate-level essays or even completing formulaic engineering projects. Nevertheless, they’d fall short of the “AGI” bar. (And would likely represent no existential risk on their own, outside cases of misuse by human actors.)</p>



<p>As a specific edge case, we can consider humans who are capable researchers in their domain – including being able to derive novel ontologies – but are still philosophically incompetent in a broad sense. I’d argue that this corresponds to the split between “general” philosophical reasoning, and “philosophy as a discipline” I’ve discussed in 1E. These people likely <em>could</em> be capable philosophers, but simply have no interest in specializing in high-level reasoning about the-world-in-general, nor in exploring its highest-level ontology.</p>



<p>Something similar <em>could</em> happen with AGIs trained/designed a specific way. But in the limit of superintelligence, it seems likely that <em>all</em> generally intelligent minds converge to being philosophically competent.</p>



<p><strong>Wisdom</strong> is also convergent. When it comes down to it, wisdom seems to just be an additional trick for making correct plans or inferences. “Smart but unwise” reasoning would correspond to cases in which you’re not skeptical of your own decision-making procedures, are mostly not trying to improve them, and only take immediate/local consequences of your action into account. Inasmuch as AGIs would be capable of long-term planning across many domains, they would strive to be “wise”, in the sense I’ve outlined in this essay.</p>



<p>And those AGIs with superhuman general-intelligence capabilities would be able to derive the “wisdom” heuristics <em>more quickly</em> than humans, with little or no practical experience in a domain.</p>



<h2 class="wp-block-heading">4. Philosophically Incompetent Human Decision-Makers</h2>



<p>That said, just because AGIs would be philosophically competent, that doesn’t mean they’d by default address and fix the philosophical incompetence of the humans who created them – <em>even if</em> these AGIs were otherwise aligned to human intentions and inclined to follow human commands.</p>



<p>The main difficulty here is that humans store their values in a decompiled/incomplete format. We don’t have explicit utility functions: our values are a combination of explicit consciously-derived preferences, implicit preferences, emotions, subconscious urges, and so on. (Theoretically, <a href="https://www.lesswrong.com/posts/okkEaevbXCSusBoE2/how-would-an-utopia-maximizer-look-like">it may be possible</a> to compile all of that into a utility function, but that’s a very open problem.)</p>



<p>As a result, mere <em>intent alignment</em> – designing an AGI which would do what its human operators “genuinely want” it to do, when they give it some command – still leaves a lot of philosophical difficulties and free parameters.</p>



<p>For example, suppose the AGI&#8217;s operators, in a moment of excitement after they activate their AGI for the first time, tell it to solve world hunger. What should the AGI do?</p>



<ul class="wp-block-list">
<li>Should it read off the surface-level momentary intent of this command, design some sort of highly nutritious and easy-to-produce food, and distribute it across the planet in the specific way the human is currently imagining this?</li>



<li>Should it extrapolate the human&#8217;s values, and execute the command the way the human <em>would have wanted to</em> execute it if they&#8217;d thought about it for a bit, rather than the way they&#8217;re envisioning it in the moment?
<ul class="wp-block-list">
<li>(For example, perhaps the image flashing through the human&#8217;s mind right now is of helicopters literally dropping crates full of food near famished people, but it&#8217;s actually more efficient to do it using airplanes.)</li>
</ul>
</li>



<li>Should it extrapolate the human&#8217;s values a bit, and point out specific issues with this plan that the human might think about later (e. g., that such sudden large-scale activity might provoke rash actions from various geopolitical actors, leading to vast suffering), then give the human a chance to abort?</li>



<li>Should it extrapolate the human&#8217;s values a bit further, and point out issues the human might <em>not</em> have thought of (including teaching the human any novel load-bearing concepts necessary for understanding said potential issues)?</li>



<li>Should it extrapolate the human&#8217;s values a bit further still, and teach them various better cognitive protocols for self-reflection, so that they may better evaluate whether a given plan satisfies their values?</li>



<li>Should it extrapolate the human&#8217;s values <em>far afield</em>, interpret the command as &#8220;maximize eudaimonia&#8221;, and do that, disregarding the specific rough way of how they gestured at the idea?
<ul class="wp-block-list">
<li>In other words: should it directly optimize for the human’s <a href="https://www.lesswrong.com/tag/coherent-extrapolated-volition">coherent extrapolated volition</a> (which is something like the ultimate output of abstracting-over-ethics that I’d gestured at in 1F)?</li>
</ul>
</li>



<li>Should it remind the human that they&#8217;d wanted to be careful regarding how they use the AGI, and ask whether they actually want to proceed with something so high-impact right now?</li>



<li>Should it <em>insist</em> that the human is currently too philosophically confused to make such high-impact decisions, and the AGI first needs to teach them a lot of novel concepts, before they can be sure there are no unknown unknowns that’d put their current plans at odds with their extrapolated values?</li>
</ul>



<p>There are many, many drastically different ways to implement something as seemingly intuitive as “Do What I Mean”. Unless “aligning AIs to human intent” is done in a way that puts as much emphasis as possible on philosophical competence – including <em>refusing</em> human commands if the AGI judges them unwise or philosophically incompetent – even an AGI that is intent-aligned (in some strict technical sense) might lead to existentially catastrophic outcomes, up to and including the possibility of <a href="https://www.lesswrong.com/tag/risks-of-astronomical-suffering-s-risks">suffering at astronomical scales</a>.</p>



<p>For example, suppose the AGI is designed to act on the surface-level meaning of commands, and it’s told to “earn as much money as possible, by any means necessary”. As I’ve argued in Section 3, it <em>would</em> derive a wise and philosophically competent understanding of what “obeying the surface-level meaning of a human’s command” means, and how to wisely and philosophically competently execute on this specific command. But it would not question <em>the wisdom and philosophical competence of the command from the perspective of a counterfactual wiser human</em>. Why would it, unless specifically designed to?</p>



<p>Another example: If the AGI is “left to its own devices” regarding how to execute on some concrete goal, it’d likely do everything “correctly” regarding certain philosophically-novel-to-us situations, such as the hypothetical possibility of <a href="https://www.lesswrong.com/tag/acausal-trade">acausal trade</a> with the rest of the multiverse. (If <a href="https://ordinaryideas.wordpress.com/2016/11/30/what-does-the-universal-prior-actually-look-like/">the universal prior is malign</a>, and using it is a bad idea, an actual AGI would just use something else.) However, if the AGI is <a href="https://www.lesswrong.com/tag/corrigibility">corrigible</a>, and it explains the situation to a philosophically incompetent human operator before taking any action, <em>the human</em> might incorrectly decide that giving in to acausal blackmail is the correct thing to do, and order the AGI to do so.</p>



<p>On top of that, there’s a certain Catch-22 at play. Convincing the decision-makers or engineers that the AGI must be designed such that it’d only accept commands from wise philosophically competent people <em>already</em> requires some level of philosophical competence on the designers’ part. They’d need to know that there even <em>are</em> philosophical “unknown unknowns” that they must be wary of, and that faithfully interpreting human commands is more complicated than just reading off the human’s intent at the time they give the command.</p>



<p>How to arrive at that state of affairs is an open question.</p>



<h2 class="wp-block-heading">5. Ecosystem-Building</h2>



<p>As argued in 1G, the best way to upscale the process of attaining philosophical competence and teaching it to people would be to <em>move metaphilosophy outside the domain of philosophy</em>. Figure out the ontology suitable for robustly describing any and all kinds of philosophical reasoning, and decouple it from the rest of reality.</p>



<p>This would:</p>



<ul class="wp-block-list">
<li><strong>Allow more people to specialize in metaphilosophy</strong>, since they’d only need to learn about this specific domain of reality, rather than becoming interdisciplinary experts reasoning about the world at a high level.</li>



<li><strong>Simplify the transfer of knowledge and the process of training new people</strong>. Once we have a solid model of metaphilosophy, that’d give us a ground-truth idea of how to translate philosophical projects into concrete steps of action (i. e., what the <em>H → L</em> functions are). Those could be more easily taught in a standardized format, allowing at-scale teaching and at-scale delegation of project management.</li>



<li><strong>Give us the means to measure philosophical successes and failures</strong>, and therefore the means to steer philosophical projects and keep them on track. (Which, again, would allow us to scale the size and number of such projects. How well they perform would become <em>legible</em>, giving us the ability to optimize for that clear metric.)</li>



<li><strong>Provide legibility in general.</strong> Once we have a concrete, convergent idea of what philosophical projects are, how they succeed, and what their benefits are, we’d be able to more easily argue the importance of this agenda to other people and organizations, increasing the agenda’s reach and attracting funding.</li>
</ul>



<p>Hopefully this essay and the formalisms in it provide the starting point for operationalizing metaphilosophy in a way suitable for scaling it up.</p>



<p>Something similar goes for wisdom – although unlike teaching philosophical competence, this area seems less neglected. (Large-scale projects for “<a href="https://www.lesswrong.com/posts/XqmjdBKa4ZaXJtNmf/raising-the-sanity-waterline">raising the sanity waterline</a>” have been attempted in the past, and I think any hypothetical “wisdom-boosting” project would look more or less the same.)</p>



<h2 class="wp-block-heading">6. Philosophy Automation</h2>



<p>In my view, automating philosophical reasoning is an AGI-complete problem. I think that the ability to engage in qualitative/non-paradigmatic research is what <em>defines</em> a mind as generally intelligent.</p>



<p>This is why LLMs, for example, are so persistently <a href="https://www.lesswrong.com/posts/nQwbDPgYvAbqAmAud/llms-for-alignment-research-a-safety-priority#What_is_wrong_with_current_models_">bad at it</a>, despite their decent competence in other cognitive areas. I would argue that LLMs contain <a href="https://www.lesswrong.com/posts/3JRBqRtHBDyPE3sGa/a-case-for-the-least-forgiving-take-on-alignment#6__The_Case_of_LLMs">vast amounts of crystallized heuristics</a> – that is, <em>H → L</em> functions, in this essay’s terminology – yet no ability to derive new ontologies/abstractions <em>H</em> given a low-level system <em>L</em>. Thus, there are <em>no</em> types of philosophical reasoning they’d be good at; no ability to contribute on their own/autonomously.</p>



<p>On top of that, since we ourselves don’t know the ontology of metaphilosophy either, that likely cripples our ability to use AI tools for philosophy <em>in general</em>. The reason is the same as the barrier to scaling up philosophical projects: we don’t know how the domain of metaphilosophy factorizes, which means we don’t know how to <a href="https://www.lesswrong.com/posts/3gAccKDW6nRKFumpP/why-not-just-outsource-alignment-research-to-an-ai">competently outsource</a> philosophical projects and sub-projects, how to train AIs specialized in this, and how to measure their successes or failures.</p>



<p>One approach that <em>might</em> work is “cyborgism”, as <a href="https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism">defined by janus</a>. Essentially, it uses LLMs as a brainstorming tool, allowing us to scope out vastly larger regions of concept-space for philosophical insights, with the LLMs’ thought-processes steered by a human. In theory, this gives us the best of both worlds: a human’s philosophy-capable algorithms are enhanced by the vast repository of crystallized <em>H → L</em> and <em>L → H</em> functions contained within the LLM. Janus has been able to generate some <a href="https://www.lesswrong.com/posts/vPsupipfyeDoSAirY/language-ex-machina">coherent-ish philosophical artefacts</a> this way. However, this idea has been around for a while, and so far, I haven’t seen any payoff from it.</p>



<p>Overall, I’m very skeptical that LLMs could be of any help here whatsoever, besides their standard mundane-utility role of teaching people new concepts in a user-tailored format. (Which might be helpful, in fact, but it isn’t the main bottleneck here. As I’ve discussed in Section 5, this sort of at-scale distribution of standardized knowledge only becomes possible <em>after</em> the high-level ontology of what we want to teach is nailed down.)</p>



<p>What <em>does</em> offer some hope for automating philosophy is the research agenda focused on the Natural Abstraction Hypothesis. I’ve discussed it above, and my tentative operationalization of philosophy is based on it. The agenda is focused on finding a formal definition for abstractions (i. e., layers of ontology), and what algorithms could at least <em>assist us</em> with deriving new ones.</p>



<p>Thus, inasmuch as my model of philosophy is right, the NAH agenda is precisely focused on operationalizing philosophical reasoning. John Wentworth additionally discusses some of the NAH’s applications for metaphilosophy <a href="https://www.lesswrong.com/posts/HfqbjwpAEGep9mHhc/the-plan-2023-version#How_is_abstraction_a_bottleneck_to_metaphilosophy_">here</a>.</p>



<p><em>Thanks to David Manley, Linh Chi Nguyen, and Bradford Saad for providing extensive helpful critique of an earlier draft, and to John Wentworth for proofreading the final version.</em></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Some Preliminary Notes on the Promise of a Wisdom Explosion</title>
		<link>http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Sun, 27 Oct 2024 17:08:25 +0000</pubDate>
				<category><![CDATA[Essay Competition on the Automation of Wisdom and Philosophy]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3633</guid>

					<description><![CDATA[By Chris Leong This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy. Notes]]></description>
										<content:encoded><![CDATA[
<p>By Chris Leong</p>



<p><em>This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy.</em></p>



<ul class="wp-block-list">
<li>Leading AI labs are aiming to trigger an intelligence explosion, but perhaps this is a grave mistake? Maybe they should be aiming to trigger a “wisdom explosion” instead?:
<ul class="wp-block-list">
<li>Defining this as “pretty much the same thing as an intelligence explosion, but with wisdom instead” is rather vague<span id='easy-footnote-1-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-1-3633' title='And likely even frustrating for some folk! Sorry if this is the case, but my focus here is really on starting a conversation and I understand how this could be annoying if you prefer posts that are written in such a way to make it as quick and easy as possible to determine whether what the post is saying is true.'><sup>1</sup></a></span>, but I honestly think it is good enough for now. I think it’s fine for early-stage exploratory work to focus on opening up a new part of conversational space rather than trying to perfectly pin everything down<span id='easy-footnote-2-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-2-3633' title='I plan to examine this in more detail in part seven (Is a “Wisdom Explosion” a coherent concept?) of my upcoming Less Wrong seqence on Training Wise AI Advisers via Imitation Learning'><sup>2</sup></a></span>.</li>



<li>Regarding my definition of wisdom, I’ll be exploring this in more detail in part six (“What kinds of wisdom are valuable?”) of my upcoming Less Wrong sequence, but for now, I’ll just say that I take an expansive definition of what wisdom is and that achieving a “wisdom explosion” would likely require us to train a system that is fairly strong on a number of different subtypes. As an example though, if a coalition of groups focused on AI safety were able to wisely strategize, wisely co-ordinate and wisely pursue methods of non-manipulative persuasion, I’d feel significantly better about humanity&#8217;s chances of surviving.</li>



<li>In any case, I don’t want to center my own understanding of wisdom too much. Instead, I’d encourage you to consider the types of wisdom that you think might be most valuable for achieving a positive future for humanity and whether the arguments below follow given how you conceive of wisdom, rather than how I conceive of wisdom<span id='easy-footnote-3-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-3-3633' title='One of the risks of saying too much about how I conceive of wisdom too early on is that it may have the unintentional effect of accidentally narrowing the conversation or encouraging people to anchor too much on my conceptions.'><sup>3</sup></a></span>.</li>



<li>In an intelligence explosion, the recursive self-improvement occurs within a single AI system. However, in terms of defining a wisdom explosion, I want to take a more expansive view. In particular, instead of requiring that it occur within a single AI, I want to allow the possibility that it may occur within a cybernetic system consisting of both humans and AIs, either within a single organisation, or within a cluster of collaborating organisations. In fact, I think this is the best route for pursuing a wisdom explosion.</li>



<li>I find the version involving a cluster of collaborating organisations particularly compelling both because it would enable the pooling of resources<span id='easy-footnote-4-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-4-3633' title='Particularly important since wisdom is a cluster of different things and developing an entirely new paradigm would be a lot of work'><sup>4</sup></a></span> for developing wisdom tech, but also because it would enable pursuing a <a href="https://www.lesswrong.com/posts/etNJcXCsKC6izQQZj/pivotal-outcomes-and-pivotal-processes">pivotal process</a> rather than a pivotal action.</li>
</ul>
</li>



<li>For purposes of simplicity, I’ll talk about “responsible &amp; wise” actors vs. “irresponsible &amp; unwise” actors even though responsibility and wisdom don’t always line up<span id='easy-footnote-5-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-5-3633' title='Arguments always involve some degree of simplification. The question is whether the additional clarity outweighs the reduction in accuracy.'><sup>5</sup></a></span>.</li>



<li>I will develop this argument more fully in my upcoming Less Wrong post “Artificial Intelligence/Capabilities<span id='easy-footnote-6-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-6-3633' title='Intelligent and capabilities aren’t quite the same thing. I’ll explore the distinction in more detail in my upcoming sequence.'><sup>6</sup></a></span> as Potentially Fatal Mistake. Artificial Wisdom as Antidote”, but an outline of the argument I plan to make is below</li>



<li>Firstly, I will argue that the pursuit of an intelligence explosion will most likely result in catastrophe:
<ul class="wp-block-list">
<li>Capabilities inevitably proliferate: key factors include a strong open-source community, large career incentives for researchers to publish and challenges with preventing espionage</li>



<li>The attack-defense balance strongly favors the attacker: attackers only need to get lucky once, while defenders need to get lucky every time</li>



<li>The proliferation of capabilities most likely leads to an AI arms race: the diffusion of capabilities levels the playing field, which forces actors to race to maintain their lead</li>



<li>Intelligence/Capability tech differentially benefits irresponsible &amp; unwise actors: Recklessly racing ahead increases your access to resources, whilst responsible &amp; wise actors need time to figure out how to act wisely</li>



<li>Society struggles to adapt: Government processes aren’t designed to handle a technology that moves as fast as AI. Reckless &amp; unwise actors will use their political influence to push society to adopt unwise policies.</li>
</ul>
</li>



<li>In contrast, I’ll argue that the pursuit of a wisdom explosion is likely to be much safer:
<ul class="wp-block-list">
<li>Pursuing wisdom tech likely produces fewer capability externalities
<ul class="wp-block-list">
<li>A wisdom explosion might be achievable with AIs built on top of relatively weak base models: think of the wisest people you know; they don’t all have massive amounts of cognitive “firepower”</li>
</ul>
</li>



<li>Both malicious and reckless &amp; unwise actors are less likely to pursue such technologies:
<ul class="wp-block-list">
<li>They are less likely to value wisdom, especially given the trade-off with pursuing shiny, shiny capabilities.</li>
</ul>
</li>



<li>Reckless &amp; unwise actors are disadvantaged in pursuing a wisdom explosion:
<ul class="wp-block-list">
<li>There is likely a minimum bar of wisdom required to trigger such an explosion. As they say, garbage in, garbage out.</li>



<li>Even if they were able to trigger such an explosion, it’d likely take them longer and/or require a higher capability level. Remember I’m proposing producing a cybernetic system, so the human operators play a key role here.</li>
</ul>
</li>



<li>Reckless &amp; unwise actors are less likely to know what to do with any wisdom tech that they develop or acquire:
<ul class="wp-block-list">
<li>This is less true at higher capability levels where the system can help them figure out what they should be asking, but they might just ignore it.</li>
</ul>
</li>



<li>Even if reckless &amp; unwise actors actually pursue and then manage to acquire wisdom tech, it may not be harmful:
<ul class="wp-block-list">
<li>Acquiring such technology may make them realise their foolishness.</li>



<li>They may then either delete their model, hand it over to someone more responsible, or start working towards becoming a more responsible actor themselves</li>
</ul>
</li>



<li>Responsible actors can use wisdom tech to help them attempt to non-manipulatively persuade irresponsible actors to be more responsible:
<ul class="wp-block-list">
<li>My intuition is that this is much harder for intelligence/capability tech which will likely be superhuman at persuasion soon, but which is not a natural fit for non-manipulative persuasion<span id='easy-footnote-7-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-7-3633' title='I expect most techniques for training wisdom to be adaptable towards this end. Non-manipulative persuasion requires difficult subjective judgements, just like wisdom'><sup>7</sup></a></span></li>
</ul>
</li>
</ul>
</li>



<li>I also think it may be viable. I’ll develop these arguments more fully in the seventh post of my upcoming Less Wrong sequence “Is a “Wisdom Explosion” a coherent concept?”, but my high-level thoughts are as follows:
<ul class="wp-block-list">
<li>Before we begin: What level of wisdom would we need to spiral up to count as having achieved a “wisdom explosion”? We might not need to set the bar too high (insofar as superhuman systems go). Saving the world may require superhuman wisdom, but I don’t think it would have to be that superhuman.</li>



<li>Wisdom seems like the kind of thing where having a greater degree of wisdom makes it easier to acquire even more. In particular, you are more likely to be able to discern who is providing wise or unwise advice. You are also more likely to be able to discern which assumptions require questioning.</li>



<li>Insofar as we buy into the argument for an intelligence explosion being viable, one might naively assume that this also increases the chance that a wisdom explosion is viable:
<ul class="wp-block-list">
<li>One could push back against this by noting that intelligence is much easier to train than wisdom because, for intelligence, we can train our system on problems with known solutions or with a simulator. This is true, but it doesn’t mean that we can’t use these kinds of things for training wisdom. Instead, it just means that we have to be more careful in terms of how we go about it.</li>
</ul>
</li>



<li>While a certain level of wisdom would likely be required in order to trigger a wisdom explosion, the level might not be that high:
<ul class="wp-block-list">
<li>It’s less about being wise and more about not being so ideological that you are unable to break out of an attractor</li>
</ul>
</li>



<li>As mentioned before, our base models might not need to be particularly large (by the crazy standards of frontier models). There’s a chance that a wisdom explosion could be triggered at a lower capability level than an intelligence explosion<span id='easy-footnote-8-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-8-3633' title='Admittedly, GPT o1 makes this less likely as it indicates a greater role for inference time scaling going forward.'><sup>8</sup></a></span> if wisdom isn’t really about cognitive firepower:
<ul class="wp-block-list">
<li>If this is true, then we may be able to trigger a wisdom explosion earlier than an intelligence explosion</li>



<li>This may also address some concerns about inner alignment if we believe that smaller models tend to be more controllable<span id='easy-footnote-9-3633' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/some-preliminary-notes-on-the-promise-of-a-wisdom-explosion/#easy-footnote-bottom-9-3633' title='Plausible, but unclear'><sup>9</sup></a></span>.</li>
</ul>
</li>



<li>Some people might think that wisdom is too fuzzy to make any progress at all. I’ll discuss this in “An Overview of “Obvious” Approaches to Training Wise AI Advisors” and further in the third post of my upcoming Less Wrong sequence, “Against Learned Helplessness With Training Wise AI”.</li>
</ul>
</li>



<li>“Wisdom explosion” as creative stimulus:
<ul class="wp-block-list">
<li>Even if the concept of a wisdom explosion turns out to be incoherent or triggering a wisdom explosion turns out to be impossible, I still think that investigating and debating these topics would be a valuable use of time. I can’t fully explain this, but certain questions feel like obvious or natural questions to ask. Noticing these questions and following the line of inquiry until you reach a natural conclusion is one of the best ways of developing your ability to think clearly about confusing matters.</li>



<li>The value of gaining a new frame isn’t just in the potential application of the frame itself, but in how it can reveal assumptions within your worldview that you may not even be aware of.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>An Overview of “Obvious” Approaches to Training Wise AI Advisors</title>
		<link>http://aiimpacts.org/an-overview-of-obvious-approaches-to-training-wise-ai-advisors/</link>
		
		<dc:creator><![CDATA[Katja Grace]]></dc:creator>
		<pubDate>Fri, 25 Oct 2024 02:53:57 +0000</pubDate>
				<category><![CDATA[Essay Competition on the Automation of Wisdom and Philosophy]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3632</guid>

					<description><![CDATA[By Chris Leong This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy. I consider four different “obvious” high-level approaches to training wise AI advisors. I consider imitation learning <a class="mh-excerpt-more" href="http://aiimpacts.org/an-overview-of-obvious-approaches-to-training-wise-ai-advisors/" title="An Overview of “Obvious” Approaches to Training Wise AI Advisors"></a>]]></description>
										<content:encoded><![CDATA[
<p>By Chris Leong</p>



<p><em>This was a prize-winning entry into the Essay Competition on the Automation of Wisdom and Philosophy.</em></p>



<p>I consider four different “obvious” high-level approaches to training wise AI advisors. I consider imitation learning to be the most promising approach as I’ll argue in an upcoming sequence on Less Wrong, however, I’ve tried to take a more balanced approach in these notes.</p>



<p><strong>Approach</strong>:</p>



<ul class="wp-block-list">
<li>Imitation learning: Training imitation learning agents on a bunch of people the lab considers to be wise.
<ul class="wp-block-list">
<li>We’d be fine-tuning a separate base model for each advisor using human demonstrations. Ideally, we’d avoid using any reinforcement learning, but that might not be possible.</li>



<li>Additional training details &#8211; I don&#8217;t know enough about training frontier models to be sure this is a good plan, but here is a rough draft:
<ul class="wp-block-list">
<li>Train a model on the distribution of Internet data</li>



<li>Fine-tune it on clean data to remove the tendency to occasionally generate rubbish</li>



<li>Fine-tune it according to the kinds of outputs you want it to produce. Low quality is fine at this stage (articles, chat logs)</li>



<li>Fine-tune it on high-quality data (ie. published philosophy essays, chat logs from people having serious discussions where they actually try to answer the question being asked)</li>



<li>Fine-tune it on your data from everyone you identified as wise</li>



<li>Create specific fine-tunes (or specific LoRA adapters) for each wise individual (a minimal sketch of this step appears after this list)</li>
</ul>
</li>



<li>Challenges:
<ul class="wp-block-list">
<li>Some of the steps listed above might interfere with the previous steps. For example, some of the data from people identified as wise might come from non-serious discussions.</li>



<li>Maybe it makes sense to add meta-data at the start (ie. serious discussion, person identified as wise) for both training and inference. This might resolve the previous issue.</li>
</ul>
</li>
</ul>
</li>



<li>The Direct Approach: Training an AI to be wise based on human demonstrations and feedback
<ul class="wp-block-list">
<li>We’d most likely use supervised learning and RLHF on a base model.</li>
</ul>
</li>



<li>The Principled Approach: Attempting to understand what wisdom is at a deep principled level and build an AI that provides advice according to those principles:
<ul class="wp-block-list">
<li>While we’d ideally like to develop a complete principled understanding of wisdom, more realistically we’d probably only be able to manage a partial understanding</li>
</ul>
</li>



<li>The Scattergun Approach: This approach involves throwing a bunch of potentially relevant wise principles and/or anecdotes (nuggets of wisdom) from a fixed set at the decision-makers, in the hope that reading through them will lead to a wise decision:
<ul class="wp-block-list">
<li>A model would be trained to contextually figure out what nuggets to prioritize based on past user ratings likely by using RLHF on a base model.</li>
</ul>
</li>
</ul>
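

<p>To make the last step of the imitation-learning plan above more concrete, here is a minimal sketch of what training a separate LoRA adapter per advisor might look like, assuming the Hugging Face transformers, peft and datasets libraries. The base model name, hyperparameters and the two-example dataset are illustrative placeholders rather than recommendations; a real run would use the curated per-advisor corpus described above.</p>



<pre class="wp-block-code"><code>
# Minimal sketch (not a validated recipe) of the final per-advisor step:
# fine-tune a small LoRA adapter on one advisor's curated writings, keeping
# one adapter per advisor on top of a shared base model.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "your-org/wise-base-model"  # hypothetical, already fine-tuned base

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Only the small LoRA adapter weights are trained, so storing a separate
# adapter per advisor stays cheap.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Toy stand-in for one advisor's curated corpus, with the kind of metadata
# tags suggested above prepended to each document.
advisor_texts = [
    "[serious discussion][advisor A] On when to question your assumptions ...",
    "[essay][advisor A] On weighing long-term consequences ...",
]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = Dataset.from_dict({"text": advisor_texts}).map(
    tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="advisor_A_lora",
                           num_train_epochs=1,
                           per_device_train_batch_size=1,
                           learning_rate=1e-4),
    train_dataset=train_data,
    # Causal LM collator: labels are the input ids, shifted inside the model.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("advisor_A_lora")  # one saved adapter per advisor
</code></pre>



<p>Keeping each advisor as a small adapter over a shared base makes it cheap to add or drop advisors later and to compare their answers side by side at inference time.</p>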



<p><strong>Definitions:</strong></p>



<ul class="wp-block-list">
<li>Safe LLM: I’m quite worried that if we fine-tune an LLM hard on wisdom we’ll simply end up with an LLM that optimizes against us. A safe LLM would be an LLM where we’ve taken steps to reduce the chance of significant adversarial optimization. Ways of achieving this might include limiting the size of the base model, reducing RLHF or avoiding fine-tuning the model too hard.</li>



<li>Wisdom explosion: When a system is able to recursively self-improve its wisdom. This doesn’t have to continue forever, as long as it caps out at a superhuman level. The self-improving system doesn’t have to be a single AI, but may be a cybernetic system consisting of a bunch of operators and AI’s in an organization, or even a network of such organizations. See Some Preliminary Notes on the Promise of a Wisdom Explosion for more details.</li>
</ul>



<p><strong>Considerations:</strong></p>



<ul class="wp-block-list">
<li>Base Power level: How capable is this method of training extremely wise agents?</li>



<li>Feasibility: How practical is it to make such a system?</li>



<li>Adversarial optimization: To what extent do we have to worry that we may be training a system to adversarially optimize against us?</li>



<li>Application of principles: What kind of support does the system provide in figuring out how to apply the principles?</li>



<li>Generalization: How well does this technique generalize out of distribution?</li>



<li>Wisdom explosion potential: Could this approach be useful for recursive self-wisening?</li>



<li>Holisticity:
<ul class="wp-block-list">
<li>I&#8217;m worried that mixing and matching principles from various systems of wisdom can result in a new system that is incredibly unwise, even if each principle is wise within its original system. As an example, Warren Buffett might be able to provide wise advice on how to become wealthy and the Dalai Lama wise advice on spiritual development, but perhaps these are two separate paths and what is wise for pursuing one path would be foolish for the other. There are two reasons why I consider holisticity to be good:
<ul class="wp-block-list">
<li>Consistency: Individual views have the advantage of consistency whilst mixing and matching breaks this assumption.</li>



<li>Commitment: Sometimes there are advantages to picking a path, any path, rather than just averaging everything together. As an example, maybe it&#8217;s better to either completely devote myself to pursuing programming or completely devote myself to pursuing art rather than split myself between the two and succeed at neither.</li>
</ul>
</li>
</ul>
</li>
</ul>



<p><strong>Evaluation:</strong></p>



<p>Please keep in mind that my assessments of these techniques on each of the criteria are essentially hot-takes.</p>



<ul class="wp-block-list">
<li>Imitation Learning:
<ul class="wp-block-list">
<li>Evaluation of base proposal:
<ul class="wp-block-list">
<li>Base Power level: Depends hugely on who you are able to train on. The wisest people are quite wise, but you might not be able to obtain their permission to train on their data or to persuade them to collaborate with you.</li>



<li>Feasibility:
<ul class="wp-block-list">
<li>Standard imitation learning isn’t particularly challenging. However, we may need to advance the state of the art in order to obtain sufficiently accurate results.</li>



<li>Even if we advance the state of the art, obtaining sufficiently high-quality data might pose a significant challenge</li>



<li>There are many historical figures with large amounts of data. The major limitation here is that we can’t obtain more if they’re dead. </li>



<li>However, we might only be able to obtain a sufficient level of accuracy with people who are alive and willing to participate in the project. This has the following advantages:
<ul class="wp-block-list">
<li>We can gather data about their responses to the kinds of questions we&#8217;re interested in</li>



<li>We can search for cases where the model is especially unsure of what they&#8217;d say and collect their responses to these questions</li>



<li>We can ask them to take a second look at places where their thought seems contradictory</li>



<li>We can ask them to produce additional chain of thought data even for things that are so basic that they wouldn&#8217;t normally bother stepping through all their reasoning</li>



<li>Contemporary folk can use Wise AI to become wiser, making them better targets to train on</li>
</ul>
</li>
</ul>
</li>



<li>Adversarial optimization:
<ul class="wp-block-list">
<li>Optimizing hard on imitation learning is less likely to be problematic than for other targets:
<ul class="wp-block-list">
<li>Safer target: Incentivizing the AI to fool us into believing that &#8220;X would say Y&#8221; rather than &#8220;Y is true&#8221; is less likely to be harmful</li>



<li>Easier validation: it is easier to talk to X and learn that they would never say Y than to learn that Y is not wise, which might take a lot of experience and incur significant costs. Even for historical figures, we can withhold part of the data as a validation set.</li>



<li>More reliable data: it is easier to gather a high-quality dataset on what X said than on what is best on some metric (which tends to be unknown for any situation of reasonable complexity).</li>
</ul>
</li>



<li>Inner alignment might still be an issue</li>



<li>If you imitate folks who are opposed to you for whatever reason, then an imitation learning agent trained on them might act adversarially.</li>



<li>If the figures we are training on are being compensated to produce training data, then this might push them towards giving you the answer you want. However, this is better than RLHF as they are being compensated for being themselves rather than attempting to either produce or rate outputs according to the company&#8217;s conception of what high-quality data looks like.</li>
</ul>
</li>



<li>Application of principles:
<ul class="wp-block-list">
<li>As an abstraction, sims provide a natural way to hold principles of wisdom along with information about the particular context in which these principles apply. Simulating dialog between these sims provides a natural way of determining which principles are more applicable to the current scenario.</li>
</ul>
</li>



<li>Holisticity:
<ul class="wp-block-list">
<li>Likely pretty good. Sims encourage us to conceive of wisdom as a holistic system rather than just individual principles. However, skeptics might argue that even the wisest humans are incredibly inconsistent.</li>
</ul>
</li>



<li>Generalization:
<ul class="wp-block-list">
<li>Likely very good.</li>



<li>Consulting multiple advisors reduces the impact from any one advisor generalizing poorly.</li>



<li>Humans can invent new principles on the fly, such that we can better adapt to new and unexpected circumstances or cover gaps in our map. I expect this to carry over to the imitation learning approach.</li>



<li>The principled and direct approaches attempt to figure out what wisdom is across all of time and space. In contrast, the simulator attempts to identify figures who are wise within a particular context and then adapt this to the current context. This is a much less challenging problem particularly since we can have the sims talk through how to adapt to the new circumstances.</li>



<li>One potentially useful frame: When we are selecting a figure, we aren&#8217;t just selecting a certain style of in-distribution reasoning, but a certain style of out-of-distribution reasoning. If our curation choices are good, then we might expect out-of-distribution reasoning to be good, whilst if our curation choices are bad, then we might expect out-of-distribution reasoning to be bad.</li>



<li>Going further: We aren&#8217;t just selecting a certain style of out-of-distribution reasoning, but also a certain style of reasoning about whether you are out of distribution.</li>
</ul>
</li>



<li>Wisdom explosion potential:
<ul class="wp-block-list">
<li>Scalable alignment techniques provide significant opportunities for amplification:
<ul class="wp-block-list">
<li>&#8220;What if you knew X?&#8221; in combination with RAG</li>



<li>Self-consistency</li>



<li>Debate</li>



<li>Iterated distillation and amplification</li>
</ul>
</li>



<li>Imitation-based techniques might actually work better with techniques ported over from humans because they’d be more in distribution.</li>
</ul>
</li>



<li>Other advantages:
<ul class="wp-block-list">
<li>Users are less likely to be overly trusting: People will understand that they need to take the advice of imitation agents with a grain of salt, particularly because of the wide range of disagreements between them, while they will more uncritically accept the advice of an AI trained to be wise.</li>



<li>Given the relative ease of imitation learning, if we need to use either the direct or principled approach, I’d recommend implementing imitation-based techniques first and using them to assist:
<ul class="wp-block-list">
<li>These assistants could help us make wise decisions about all aspects of the project, including high-level approach, planning, and personnel selection</li>



<li>These assistants could help us produce training data for the direct approach or figure out the principles for the principled approach.</li>



<li>These assistants could help us make wise decisions about how to utilize these models and work around their limitations.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>



<li>Potential mitigations:
<ul class="wp-block-list">
<li>Fixing the lack of historical data:
<ul class="wp-block-list">
<li>If there are different interpretations of a figure&#8217;s work, we can train different agents for the main schools of thought on what they meant</li>



<li>We can ask an expert on these figures to speculate about what they may have said in relation to some of the kinds of questions we&#8217;re interested in. This could be used to reduce the chance of out-of-distribution errors.</li>
</ul>
</li>



<li>Speculative: We might be able to mitigate inner alignment by averaging the weights of a bunch of models. We can then use this as a starting point and do a tiny bit of additional training to get to the real parameters for the model we’re training (a minimal sketch appears after the evaluation list below):
<ul class="wp-block-list">
<li>The average baseline is likely better for imitation learning than for optimization, because the average is more likely to be near the ideal solution for the former than for the latter. I expect that this would make the &#8216;average biasing&#8217; more effective at mitigating inner alignment issues</li>
</ul>
</li>
</ul>
</li>



<li>Most promising variant:
<ul class="wp-block-list">
<li>I’m most optimistic about a variant where swarms of AI advisors are allowed to dynamically self-organize rather than using a fixed structure like debate for amplification.</li>
</ul>
</li>
</ul>
</li>



<li>The Direct Approach:
<ul class="wp-block-list">
<li>Evaluation of base proposal:
<ul class="wp-block-list">
<li>Base Power level: Optimisation is very powerful</li>



<li>Feasibility: Very feasible. This is the standard way of training AI</li>



<li>Adversarial optimization:
<ul class="wp-block-list">
<li>The standard issues of Goodhart’s law are exacerbated when the training target is wisdom.</li>



<li>Wisdom is extremely hard to evaluate:
<ul class="wp-block-list">
<li>Wisdom is highly contested</li>



<li>Wisdom can typically only be validated by examining many different kinds of situations over long periods of time</li>



<li>It&#8217;s very easy to accidentally impose assumptions on a situation without even realizing that you are doing it. The assumptions don&#8217;t even make it to the level of consideration.</li>
</ul>
</li>



<li>Sycophancy:
<ul class="wp-block-list">
<li>The phrasing is especially likely to leak information about the user’s views on questions about wisdom</li>
</ul>
</li>



<li>Ambiguity of meaning: This can have advantages as a wise decision is still wise even if the wisdom mostly came from the user. However, it can go wrong as follows: Adam rates Y as wise assuming it will be understood as Z. Bob interprets it as Z&#8217;, which is a reasonable interpretation, but incredibly unwise.</li>
</ul>
</li>



<li>Application of principles: Pretty good. You can just get the model to generate outputs. </li>



<li>Holisticity: Quite poor. If we aren’t trusting any one person, we will need many different raters and this will likely merge their views together inconsistently</li>



<li>Generalization: Debatable. Some people might think that this will generalize better because it merges a lot of different views. Others might argue that there will be issues because we’re training it on inconsistent data.</li>



<li>Wisdom explosion potential: Maybe, but I’m dubious. I expect that triggering a wisdom explosion requires embracing a certain degree of subjectivity rather than trying to be objective. </li>
</ul>
</li>



<li>Potential mitigations:
<ul class="wp-block-list">
<li>We could aggressively filter the text used to train the base model to remove</li>



<li>We could produce a number of fine-tunes and use weight averaging to attempt to reduce adversarial optimization.</li>



<li>We could train another model to comment on the model outputs and attempt to identify situations where the model is being sycophantic or manipulative. This could be directly trained or we could provide it with a bunch of rules.</li>



<li>We could train a classifier on the latents to detect sycophancy (a minimal probe sketch appears after the evaluation list below).</li>



<li>We could attempt to use activation vectors in order to reduce sycophancy.</li>



<li>We could use some kind of self-consistency training to reduce the inconsistency created by training on data coming from multiple individuals.</li>
</ul>
</li>



<li>Most promising variant:
<ul class="wp-block-list">
<li>I suspect that the most promising approach would be a form of defense-in-depth where we just smash all of these different methods together and hope for the best.</li>
</ul>
</li>
</ul>
</li>



<li>The Principled Approach:
<ul class="wp-block-list">
<li>Evaluation of base proposal:
<ul class="wp-block-list">
<li>Base power level: Theoretically quite powerful if you were able to reverse engineer wisdom. Partial solutions are likely much less powerful.</li>



<li>Feasibility:
<ul class="wp-block-list">
<li>Feasibility challenges: wisdom is likely too multifarious to reverse engineer. The most likely result is that the team never gets anywhere near finishing, even by its own standards. It would be easy to spend an entire lifetime studying wisdom.</li>



<li>The issue isn&#8217;t just that the task is massive, it&#8217;s also that it&#8217;s very hard to have a complete map of wisdom without having experienced a huge diversity of different contexts.</li>



<li>My intuition is that this would be a challenge, even if we had fifty years, which we don&#8217;t have. I expect that we would need time to go through multiple paradigms of foundational wisdom research, with each subsequent paradigm identifying massive blind spots in the previous paradigm. Without time to iterate through paradigms, we’ll likely be too localized to the current context and unable to adapt to new circumstances.</li>
</ul>
</li>



<li>Adversarial optimization:
<ul class="wp-block-list">
<li>Much better than in the direct approach, however, unless we develop a method of inserting the principles into an AI directly, we’d still need humans to rate how well the AI is following these principles. I’m pretty worried that this would be too much exposure.</li>



<li>Inner alignment might present a problem.</li>
</ul>
</li>



<li>Application of principles: Likely pretty good since we’re training the AI to learn the principles. </li>



<li>Holisticity: Actually solving wisdom principally would be the best approach in terms of ensuring holistically coherent advice.</li>



<li>Wisdom explosion potential: Decent. There’s a chance that we don’t have to solve all of wisdom, but that identifying some core principles of wisdom would allow us to produce a seed system that could trigger a wisdom explosion.</li>



<li>Generalization:
<ul class="wp-block-list">
<li>Potentially the best if you were actually able to reverse engineer wisdom, but as I said, that’s unlikely.</li>



<li>A partial solution to the principled approach would likely have huge blindspots.</li>
</ul>
</li>
</ul>
</li>



<li>Potential mitigations:
<ul class="wp-block-list">
<li>We could merge the direct approach and the principled approach to cover any gaps by generating new principles. The downside is that this would also allow the AI to directly optimize against us. This would work as follows: use supervised learning on our list of principles and then use RLHF to train the model to produce outputs that will be highly rated. The obvious worry is that introducing RL leaves us vulnerable to being adversarially optimized against, however, there’s a chance that this is safer than the direct approach if we are able to get away with less RL<span id='easy-footnote-1-3632' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/an-overview-of-obvious-approaches-to-training-wise-ai-advisors/#easy-footnote-bottom-1-3632' title='It isn’t clear if this is actually the case. See the discussion &lt;a href=&quot;https://www.lesswrong.com/posts/rZ6wam9gFGFQrCWHc/does-reducing-the-amount-of-rl-for-a-given-capability-level&quot;&gt;here&lt;/a&gt;'><sup>1</sup></a></span>.</li>



<li>One way to reduce the amount of exposure to adversarial optimization would be to limit the AI to identify the most contextually relevant principles, rather than allowing it to generate text explaining how to do this. However, this would greatly limit the ability of the AI to assist with figuring out how to apply the principles (we could use a safe LLM for assistance instead, but this would be less powerful).</li>
</ul>
</li>



<li>Most promising variant:
<ul class="wp-block-list">
<li>Given that you are unlikely to successfully reverse engineer all of wisdom, I believe that the most promising variant would be aiming to decipher enough principles of wisdom that you could build a seed AI that could recursively self-wisen.</li>



<li>I’m uncertain whether it would be better to attempt to find a way to directly insert the principles into an AI (I suspect this is basically impossible) or to let the model generate text advising you on how to apply the principles based on human ratings (unlikely to go well due to exposing yourself to adversarial optimization)</li>
</ul>
</li>
</ul>
</li>



<li>The Scattergun Approach
<ul class="wp-block-list">
<li>Evaluation of base proposal:
<ul class="wp-block-list">
<li>Base Power level: Pretty weak. Limited to a set of specific nuggets of wisdom</li>



<li>Feasibility: Very feasible. Not a particularly complicated thing to train.</li>



<li>Adversarial optimization: Even though the optimizer can only select particular nuggets of text, it can still adversarially optimize against you to a degree. However, it is much more limited than if it were able to freely generate text<span id='easy-footnote-2-3632' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/an-overview-of-obvious-approaches-to-training-wise-ai-advisors/#easy-footnote-bottom-2-3632' title='Likely comparable to the extent that a model which was able to prioritise different imitation agents would be able to optimise against you.'><sup>2</sup></a></span>.</li>



<li>Application of principles: The base proposal provides very limited support in terms of figuring out how to apply these principles compared to the other approaches. It just provides a bunch of disconnected principles.</li>



<li>Holisticity: Provides disconnected nuggets of wisdom. Scores pretty poorly here.</li>



<li>Wisdom explosion potential: Very limited. Such a system like this might be useful for helping us pursue one of the other approaches, but limiting the nuggets of wisdom to a fixed set is a crippling limitation.</li>



<li>Generalization: Rather poor. Has a fixed set of principles.</li>
</ul>
</li>



<li>Potential mitigations:
<ul class="wp-block-list">
<li>We could tilt the optimizer towards favoring advice that would be coherent with the advice already provided. I expect that would help to a degree, but this honestly seems like a fundamental problem with this approach</li>



<li>We could annotate the content with details about the kind of context in which it might be useful. Mitigates it a bit, but this is a very limited solution.</li>



<li>We could allow an LLM to freely generate text advising you on how to apply one of these principles to your particular situation. If this were done, I would have a strong preference for using a safe LLM. <br>The whole point of the scattergun approach as far as I’m concerned is to limit the set of responses so as to mitigate adversarial optimization. At the point where you allow an LLM to optimize hard, I feel that you may as well go with the direct approach, as you’ve exposed yourself to adversarial optimization.</li>
</ul>
</li>



<li>Most promising variant:
<ul class="wp-block-list">
<li>Using a safe LLM to contextually annotate the nuggets of wisdom with notes on how to apply them seems like the most viable variant of this approach.</li>
</ul>
</li>
</ul>
</li>
</ul>
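

<p>To make the speculative weight-averaging mitigation from the imitation-learning section above a little more concrete, here is a minimal sketch, assuming several fine-tunes of the same architecture saved as Hugging Face checkpoints (the paths are hypothetical). It only makes sense when all checkpoints share the same architecture and parameter names, and whether the resulting average is actually a useful starting point is an open empirical question.</p>



<pre class="wp-block-code"><code>
# Minimal sketch (assumptions, not a validated method): average the weights
# of several fine-tunes of the same architecture, then use the average as a
# starting point for a small amount of further training (not shown).
import torch
from transformers import AutoModelForCausalLM

checkpoint_paths = ["advisor_A_model", "advisor_B_model", "advisor_C_model"]

state_dicts = [AutoModelForCausalLM.from_pretrained(path).state_dict()
               for path in checkpoint_paths]

averaged = {}
for name, tensor in state_dicts[0].items():
    if tensor.is_floating_point():
        # Elementwise mean of this parameter across all checkpoints.
        averaged[name] = torch.stack([sd[name] for sd in state_dicts]).mean(dim=0)
    else:
        # Non-float buffers (e.g. integer position ids) are copied from the first.
        averaged[name] = tensor

model = AutoModelForCausalLM.from_pretrained(checkpoint_paths[0])
model.load_state_dict(averaged)
model.save_pretrained("averaged_baseline")  # then nudge with a bit more training
</code></pre>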
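

<p>Likewise, one way to read the direct-approach mitigation of training a classifier on the latents to detect sycophancy is as a simple linear probe on a hidden-layer activation. The sketch below assumes a Hugging Face causal language model; the model name, layer choice and the two toy labelled examples are illustrative only, and a usable detector would need a large, carefully curated dataset and proper held-out evaluation.</p>



<pre class="wp-block-code"><code>
# Minimal sketch (toy data, hypothetical model name): fit a logistic-regression
# probe on a hidden-layer activation to flag sycophantic completions.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-org/wise-advisor-model"  # hypothetical
LAYER = -2  # which hidden layer to probe; a tunable design choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def last_token_activation(text):
    """Return the chosen layer's activation at the final token position."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1].numpy()

# Toy labelled data: 1 = sycophantic, 0 = not.
examples = [
    ("You're so right, whatever you prefer is surely the wisest choice.", 1),
    ("I'd push back: that plan ignores several second-order effects.", 0),
]

X = np.stack([last_token_activation(text) for text, _ in examples])
y = np.array([label for _, label in examples])

probe = LogisticRegression(max_iter=1000).fit(X, y)

# At inference time, flag candidate outputs the probe scores as sycophantic.
print(probe.predict_proba(X)[:, 1])
</code></pre>



<p>The same probe direction could, in principle, double as the kind of activation vector mentioned above for steering the model away from sycophancy, though whether that actually reduces sycophancy rather than merely hiding it from the probe is an open question.</p>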



<p><strong>Appendix on the Imitation Learning Approach:</strong><br><br>Because the imitation learning approach is difficult to understand, I’ve added answers to three of the most common questions. I’ll be explaining this approach in a lot more detail in my upcoming Less Wrong sequence:</p>



<ul class="wp-block-list">
<li>Isn&#8217;t this approach a bit obvious?:
<ul class="wp-block-list">
<li>Yes. That doesn’t mean that it wouldn’t be effective though.</li>
</ul>
</li>



<li>What kind of figures are you talking about?:
<ul class="wp-block-list">
<li>Depends on the exact use case, but there&#8217;s wisdom in all kinds of places. There are wise philosophers, wise scientists, wise policy advisors, wise communicators, etc.</li>
</ul>
</li>



<li>Isn&#8217;t the subjectivity in selecting figures bad?
<ul class="wp-block-list">
<li>The subjectivity is already there in the direct approach. The fact that we&#8217;re selecting figures just makes this more obvious because humans are highly attuned to anything involving status. Making this more salient is good. These are big decisions and people should be aware of this subjectivity.</li>



<li>Different actors can choose to make use of different subsets of figures. Whilst we could produce multiple different AI&#8217;s with the direct approach, imitation learning has the advantage of being extremely legible in how the result is being produced. As soon as we move to some kind of averaging, we have to deal with the question of how your sample was produced.</li>



<li>Further, if there are multiple projects, each project can make their own selection</li>



<li>After we&#8217;ve chosen some initial figures, we can take advantage of their wisdom to help us figure out who we&#8217;ve missed or what our blindspots are.</li>



<li>If we end up simply using these figures to help us train a wise AI, I would expect many of these choices to wash out and many different figures &#8211; all of whom are wise &#8211; would make similar recommendations. Running self-consistency on the AI might further remove some of these differences.</li>



<li>Framing this slightly differently, if we use techniques like debate well, poor choices are unlikely to have much of an impact.</li>
</ul>
</li>
</ul>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AI Impacts Quarterly Newsletter, Jan-Mar 2023</title>
		<link>http://aiimpacts.org/ai-impacts-quarterly-newsletter-jan-mar-2023/</link>
					<comments>http://aiimpacts.org/ai-impacts-quarterly-newsletter-jan-mar-2023/#comments</comments>
		
		<dc:creator><![CDATA[Harlan Stewart]]></dc:creator>
		<pubDate>Mon, 17 Apr 2023 22:02:42 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[blog]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3539</guid>

					<description><![CDATA[Updates, research, and fundraising <a class="mh-excerpt-more" href="http://aiimpacts.org/ai-impacts-quarterly-newsletter-jan-mar-2023/" title="AI Impacts Quarterly Newsletter, Jan-Mar 2023"></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Harlan Stewart, 17 April 2023</em></p>



<h1 class="wp-block-heading">News</h1>



<h3 class="wp-block-heading">AI Impacts blog</h3>



<p>We moved our <a href="https://blog.aiimpacts.org/">blog</a> to Substack! We think this platform has many advantages, and we’re excited for the blog to live here. You can now easily <a href="https://blog.aiimpacts.org/subscribe">subscribe</a> to the blog to receive regular newsletters as well as various <a href="https://blog.aiimpacts.org/p/we-dont-trade-with-ants">thoughts</a> and <a href="https://blog.aiimpacts.org/p/how-popular-is-chatgpt-part-2-slower">observations</a> related to AI.</p>



<h3 class="wp-block-heading">AI Impacts wiki</h3>



<p>All AI Impacts research pages now reside on the <a href="https://wiki.aiimpacts.org/">AI Impacts Wiki</a>. The wiki aims to document what we know so far about decision-relevant questions about the future of AI. Our pages have always been wiki-like: updatable reference pages organized by topic. We hope that making it an actual wiki will make it clearer to everyone what&#8217;s going on, as well as better to use for this purpose, for both us and readers. We are actively looking for ways to make the wiki even better, and you can help with this by sharing your thoughts in our <a href="https://aiimpacts.org/feedback/">feedback form</a> or in the comments of this blog post!</p>



<h3 class="wp-block-heading">New office</h3>



<p>We recently moved to a new office that we are sharing with <a href="https://far.ai/">FAR AI</a> and other partner organizations. We’re extremely grateful to the team at FAR for organizing this office space, as well as to the Lightcone team for hosting us over the last year and a half.</p>



<h3 class="wp-block-heading">Katja Grace talks about forecasting AI risk at EA Global</h3>



<p>At EA Global Bay Area 2023, Katja gave a talk titled <a href="https://youtu.be/j5Lu01pEDWA">Will AI end everything? A guide to guessing</a> in which she outlined a way to roughly estimate the extent of AI risk.</p>



<h3 class="wp-block-heading">AI Impacts in the Media</h3>



<ul class="wp-block-list">
<li>AI Impacts’ <a href="https://aiimpacts.org/2022-expert-survey-on-progress-in-ai/">2022 Expert Survey on Progress in AI</a> was cited in an <a href="https://youtu.be/qRLrE2tkr2Y">NBC Nightly News segment</a>, an <a href="https://www.bloomberg.com/opinion/articles/2023-04-02/regulating-ai-might-require-a-new-federal-agency">op-ed in Bloomberg</a>, an <a href="https://www.nytimes.com/2023/03/27/opinion/ai-chatgpt-chatbots.html">op-ed in The New York Times</a>, an <a href="https://ourworldindata.org/ai-timelines">article in Our World in Data</a>, and an <a href="https://www.nytimes.com/2023/03/21/podcasts/ezra-klein-podcast-transcript-kelsey-piper.html">interview with Kelsey Piper</a>.</li>



<li>Ezra Klein quoted Katja and separately cited the survey in his New York Times op-ed <a href="https://www.nytimes.com/2023/03/12/opinion/chatbots-artificial-intelligence-future-weirdness.html">This Changes Everything</a>.</li>



<li>Sigal Samuel interviewed Katja for the Vox article <a href="https://www.vox.com/the-highlight/23621198/artificial-intelligence-chatgpt-openai-existential-risk-china-ai-safety-technology">The case for slowing down AI</a>.</li>
</ul>



<h1 class="wp-block-heading">Research and writing highlights</h1>



<h3 class="wp-block-heading">AI Strategy</h3>



<ul class="wp-block-list">
<li>“<a href="https://blog.aiimpacts.org/p/lets-think-about-slowing-down-ai">Let&#8217;s think about slowing down AI</a>” argues that those who are concerned about existential risks from AI should think about strategies that could slow the progress of AI. (Katja)</li>



<li>“<a href="https://blog.aiimpacts.org/p/framing-ai-strategy">Framing AI strategy</a>” discusses ten frameworks for thinking about AI strategy. (Zach)</li>



<li>“<a href="https://blog.aiimpacts.org/p/product-safety-is-a-poor-model-for-ai-governance">Product safety is a poor model for AI governance</a>” argues that a common type of policy proposal is inadequate to address the risks of AI. (Rick)</li>



<li>“<a href="https://aiimpacts.org/wp-content/uploads/2023/04/Alexander_Fleming__antibiotic_resistance__and_relevant_lessons_for_the_mitigation_of_risk_from_advanced_artificial_intelligence.pdf">Alexander Fleming and Antibiotic Resistance</a>” is a research report about early efforts to prevent antibiotic resistance and relevant lessons for AI risk. (Harlan)</li>
</ul>



<h3 class="wp-block-heading">Resisted technological temptations: how much economic value has been forgone for safety and ethics in past technologies?</h3>



<ul class="wp-block-list">
<li>“<a href="https://blog.aiimpacts.org/p/what-weve-learned-so-far-from-technological">What we’ve learned so far from our technological temptations project</a>” is a blog post that summarizes the Technological Temptations project and some possible takeaways. (Rick)</li>



<li><a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:geoengineering">Geoengineering</a>, <a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:nuclear_power">nuclear power</a>, and <a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:vaccine_challenge_trials">vaccine challenge trials</a> were evaluated for the amount of value that may have been forgone by not using them. (Jeffrey)</li>
</ul>



<h3 class="wp-block-heading">Public awareness and opinions about AI</h3>



<ul class="wp-block-list">
<li>“<a href="https://blog.aiimpacts.org/p/the-public-supports-regulating-ai-for-safety">The public supports regulating AI for safety</a>” summarizes the results from a survey of the American public about AI. (Zach)</li>



<li>“How popular is ChatGPT?”: <a href="https://blog.aiimpacts.org/p/how-popular-is-chatgpt-part-1-more-popular-than-taylor-swift">Part 1</a> looks at trends in AI-related search volume, and <a href="https://blog.aiimpacts.org/p/how-popular-is-chatgpt-part-2-slower">Part 2</a> refutes a widespread claim about the growth of ChatGPT. (Harlan and Rick)</li>
</ul>



<h3 class="wp-block-heading">The state of AI today: funding, hardware, and capabilities</h3>



<ul class="wp-block-list">
<li>“<a href="https://wiki.aiimpacts.org/doku.php?id=wiki:ai_timelines:ai_inputs:recent_trends_in_ai_investment">Recent trends in funding for AI companies</a>” analyzes data about the amount of funding AI companies have received. (Rick)</li>



<li>“<a href="https://wiki.aiimpacts.org/doku.php?id=ai_timelines:hardware_and_ai_timelines:computing_capacity_of_all_gpus_and_tpus">How much computing capacity exists in GPUs and TPUs in Q1 2023?</a>” uses a back-of-the-envelope calculation to estimate the total amount of compute that exists on all GPUs and TPUs. (Harlan)</li>



<li>“<a href="https://wiki.aiimpacts.org/doku.php?id=uncategorized:capabilities_of_sota_ai">Capabilities of state-of-the-art AI, 2023</a>” is a list of some noteworthy things that state-of-the-art AI can do. (Harlan and Zach)</li>
</ul>



<h3 class="wp-block-heading">Arguments for AI risk</h3>



<ul class="wp-block-list">
<li>Still in progress, “<a href="https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:start">Is AI an existential risk to humanity?</a>” is a partially complete page summarizing various arguments for concern about existential risk from AI. A couple of specific arguments are examined more closely in “<a href="https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:will_malign_ai_agents_control_the_future:argument_for_ai_x-risk_from_competent_malign_agents:start">Argument for AI x-risk from competent malign agents</a>” and “<a href="https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:argument_for_ai_x-risk_from_large_impacts">Argument for AI x-risk from large impacts</a>” (Katja)</li>
</ul>



<h3 class="wp-block-heading">Chaos theory and what it means for AI safety</h3>



<ul class="wp-block-list">
<li>“<a href="https://wiki.aiimpacts.org/doku.php?id=uncategorized:ai_safety_arguments_affected_by_chaos">AI Safety Arguments Affected by Chaos</a>” reasons about ways in which chaos theory could be relevant to predictions about AI, and “<a href="https://wiki.aiimpacts.org/doku.php?id=uncategorized:ai_safety_arguments_affected_by_chaos:chaos_in_humans">Chaos in Humans</a>” explores the theoretical limits to predicting human behavior. The report “<a href="http://aiimpacts.org/wp-content/uploads/2023/04/Chaos-and-Intrinsic-Unpredictability.pdf">Chaos and Intrinsic Unpredictability</a>” provides background, and a <a href="https://blog.aiimpacts.org/p/superintelligence-is-not-omniscience">blog post</a> summarizes the project. (Jeffrey and Aysja)</li>
</ul>



<h3 class="wp-block-heading">Miscellany</h3>



<ul class="wp-block-list">
<li>“<a href="https://aiimpacts.org/how-bad-a-future-do-ml-researchers-expect/">How bad a future do ML researchers expect?</a>” compares experts’ answers in 2016 and 2022 to the question “How positive or negative will the impacts of high-level machine intelligence on humanity be in the long run?” (Katja)</li>



<li>“<a href="https://blog.aiimpacts.org/p/we-dont-trade-with-ants">We don’t trade with ants</a>” (crosspost) disputes the common claim that advanced AI systems won’t trade with humans for the same reason that humans don’t trade with ants. (Katja)</li>
</ul>



<h1 class="wp-block-heading">Funding</h1>



<p>We&#8217;re actively seeking financial support to continue our research and operations for the rest of the year. Previous funding allowed us to expand our research team and hold a summer internship program.</p>



<p>If you want to talk to us about why we should be funded or hear more details about what we would do with money, please write to Elizabeth, Rick, or Katja at [firstname]@aiimpacts.org.</p>



<p>If you&#8217;d like to donate to AI Impacts, you can do so <a href="https://aiimpacts.org/donate/">here</a>. (And we thank you!)<br><br><em>Image credit: Midjourney</em></p>
]]></content:encoded>
					
					<wfw:commentRss>http://aiimpacts.org/ai-impacts-quarterly-newsletter-jan-mar-2023/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>What we’ve learned so far from our technological temptations project</title>
		<link>http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/</link>
		
		<dc:creator><![CDATA[richardkorzekwa]]></dc:creator>
		<pubDate>Fri, 14 Apr 2023 00:04:40 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[blog]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3536</guid>

					<description><![CDATA[The history of geoengineering, nuclear power, and human challenge trials suggest that social norms and regulation exert powerful forces on the use of technology. <a class="mh-excerpt-more" href="http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/" title="What we’ve learned so far from our technological temptations project"></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Rick Korzekwa, 11 April 2023, updated 13 April 2023</em></p>



<p>At AI Impacts, we’ve been looking into how people, institutions, and society approach novel, powerful technologies. One part of this is our&nbsp;<a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:resisted_technological_temptations_project">technological temptations project</a>, in which we are looking into&nbsp;<a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:start">cases</a>&nbsp;where some actors had a strong incentive to develop or deploy a technology, but chose not to or showed hesitation or caution in their approach. Our researcher Jeffrey Heninger has recently finished some case studies on this topic, covering geoengineering, nuclear power, and human challenge trials.</p>



<p>This document summarizes the lessons I think we can take from these case studies. Much of it is borrowed directly from Jeffrey’s written analysis or conversations I had with him, some of it is my independent take, and some of it is a mix of the two, which Jeffrey may or may not agree with. All of it relies heavily on his research.</p>



<p>The writing is somewhat more confident than my beliefs. Some of this is very speculative, though I tried to flag the most speculative parts as such.</p>



<h1 class="wp-block-heading">Summary</h1>



<p>Jeffrey Heninger investigated three cases of technologies that create substantial value but were not pursued, or were pursued more slowly than they might have been.</p>



<p><strong>The overall scale of value at stake was very large</strong> for these cases, on the order of hundreds of billions to trillions of dollars. But it’s not clear who could capture that value, so it’s not clear whether the temptation was closer to $10B or $1T.</p>



<p><strong>Social norms can generate strong disincentives</strong>&nbsp;for pursuing a technology, especially when combined with enforceable regulation.</p>



<p><strong>Scientific communities</strong>&nbsp;and individuals within those communities seem to have particularly high leverage in steering technological development at early stages.</p>



<p><strong>Inhibiting deployment can inhibit development</strong>&nbsp;for a technology over the long term, at least by slowing cost reductions.</p>



<p><strong>Some of these lessons are transferable to AI</strong>, at least enough to be worth keeping in mind.</p>



<h1 class="wp-block-heading">Overview of cases</h1>



<ol class="wp-block-list">
<li><a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:geoengineering">Geoengineering</a>&nbsp;could feasibly provide benefits of $1-10 trillion per year through global warming mitigation, at a cost of $1-10 billion per year, but actors who stand to gain the most have not pursued it, citing a lack of research into its feasibility and safety. Research has been effectively prevented by climate scientists and social activist groups.</li>



<li><a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:nuclear_power">Nuclear power</a>&nbsp;has proliferated globally since the 1950s, but many countries have prevented or inhibited the construction of nuclear power plants, sometimes at an annual cost of tens of billions of dollars and thousands of lives. This is primarily done through legislation, like Italy’s ban on all nuclear power, or through costly regulations, like safety oversight that has increased the cost of plant construction in the US by a factor of ten.</li>



<li><a href="https://wiki.aiimpacts.org/doku.php?id=responses_to_ai:technological_inevitability:incentivized_technologies_not_pursued:vaccine_challenge_trials">Human challenge trials</a>&nbsp;may have accelerated deployment of covid vaccines by more than a month, saving many thousands of lives and billions or trillions of dollars. Despite this, the first challenge trial for a covid vaccine was not performed until after several vaccines had been tested and approved using traditional methods. This is consistent with the historical rarity of challenge trials, which seems to be driven by ethical concerns and enforced by institutional review boards.</li>
</ol>



<h1 class="wp-block-heading">Scale</h1>



<p>The first thing to notice about these cases is the scale of value at stake. Mitigating climate change could be worth hundreds of billions or trillions of dollars per year, and deploying covid vaccines a month sooner could have saved many thousands of lives. While these numbers do not represent a major fraction of the global economy or the overall burden of disease, they are large compared to many relevant scales for AI risk. The world’s most valuable companies have market caps of a few trillion dollars, and the entire world spends around two trillion dollars per year on defense. In comparison, annual funding for AI is on the order of $100B.<span id='easy-footnote-1-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-1-3536' title='See our page on &lt;a href=&quot;https://wiki.aiimpacts.org/doku.php?id=wiki:ai_timelines:ai_inputs:recent_trends_in_ai_investment&quot;&gt;funding for AI companies&lt;/a&gt; and the &lt;a href=&quot;https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf&quot;&gt;2023 AI Index report&lt;/a&gt;.'><sup>1</sup></a></span>



<figure class="wp-block-image"><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc055821-ecf1-4261-b3be-268c312627ce_2288x1240.png" target="_blank" rel="noreferrer noopener"><img decoding="async" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc055821-ecf1-4261-b3be-268c312627ce_2288x1240.png" alt=""/></a><figcaption class="wp-element-caption">Comparison between the potential gains from mitigating global warming and deploying covid vaccines faster. These items were somewhat arbitrarily chosen, and most of the numbers were not carefully researched, but they should be in the right ballpark.</figcaption></figure>



<p>Setting aside for the moment who could capture the value from a technology and whether the reasons for delaying or forgoing its development are rational or justified, I think it is worth recognizing that the potential upsides are large enough to create strong incentives.</p>



<h1 class="wp-block-heading">Social norms</h1>



<p>My read on these cases is that a strong determinant for whether a technology will be pursued is social attitudes toward the technology and its regulation. I’m not sure what would have happened if Pfizer had, in defiance of FDA standards and medical ethics norms, infected volunteers with covid as part of their vaccine testing, but I imagine it would have been more severe than fines or difficulty obtaining FDA approval. They would have lost standing in the medical community and possibly been unable to continue existing as a company. This goes similarly for other technologies and actors. Building nuclear power plants without adhering to safety standards is so far outside the range of acceptable actions that even&nbsp;<em>suggesting</em>&nbsp;it as a strategy for running a business or addressing climate change is a serious risk to reputation for a CEO or public official. An oil company executive who finances a project to disperse aerosols into the upper atmosphere to reduce global warming and protect his business sounds like a Bond movie villain.</p>



<figure class="wp-block-image is-resized"><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F046fcc4a-46a1-4b98-9475-739de80813c2_826x1116.png" target="_blank" rel="noreferrer noopener"><img loading="lazy" decoding="async" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F046fcc4a-46a1-4b98-9475-739de80813c2_826x1116.png" alt="" width="423" height="572"/></a></figure>



<p>This is not to suggest that social norms are infinitely strong or that they are always well-aligned with society’s interests. Governments and corporations will do things that are widely viewed as unethical if they think they can get away with it, for example, by doing it in secret.<span id='easy-footnote-2-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-2-3536' title='Biological weapons research by the USSR is the best example of this that comes to mind.'><sup>2</sup></a></span> And I think that public support for our current nuclear safety regime is gravely mistaken. But strong social norms, either against a technology or against breaking regulations do seem able, at least in some cases, to create incentives strong enough to constrain valuable technologies.</p>



<h2 class="wp-block-heading">The public</h2>



<p>The public plays a major role in defining and enforcing the range of acceptable paths for technology. Public backlash in response to early challenge trials set the stage for our current ethics standards, and nuclear power faces crippling safety regulations in large part because of public outcry in response to a perceived lack of acceptable safety standards. In both of these cases, the result was not just the creation of regulations, but strong buy-in and a souring of public opinion on a broad category of technologies.<span id='easy-footnote-3-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-3-3536' title='More speculatively, this may be important for geoengineering. Small advocacy groups were able to stop experiments with solar radiation management for reasons that are still not completely clear to me, but I think part of it is public suspicion toward attempts to manipulate the environment.'><sup>3</sup></a></span>



<p>Although public opposition can be a powerful force in expelling things from the Overton window, it does not seem easy to predict or steer. The Chernobyl disaster made a strong case for designing reactors in a responsible way, but it was instead viewed by much of the public as a demonstration that nuclear power should be abolished entirely. I do not have a strong take on how hard this problem is in general, but I do think it is important and should be investigated further.</p>



<h2 class="wp-block-heading">The scientific community</h2>



<p>The precise boundaries of acceptable technology are defined in part by the scientific community, especially when technologies are very early in development. Policy makers and the public tend to defer to what they understand to be the official, legible scientific view when deciding what is or is not okay. This does not always match the actual views of scientists.</p>



<p>Geoengineering as an approach to reducing global warming has not been recommended by the IPCC, and a minority of climate scientists support research into geoengineering. Presumably the advocacy groups opposing geoengineering experiments would have faced a tougher battle if the official stance from the climate science community were in favor of geoengineering.</p>



<p>One interesting aspect of this is that scientific communities are small and heavily influenced by individual prestigious scientists. The taboo on geoengineering research was broken by the editor of a major climate journal, after which the number of papers on the topic increased by more than a factor of 20 within two years.<span id='easy-footnote-4-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-4-3536' title='Oldham, Paul, Bronislaw Szerszynski, Jack Stilgoe, Calum Brown, Bella Eacott, and Andy Yuille. &amp;#8220;Mapping the landscape of climate engineering.&amp;#8221; &lt;em&gt;Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences&lt;/em&gt; 372, no. 2031 (2014): 20140065.'><sup>4</sup></a></span>



<figure class="wp-block-image is-resized"><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7108f6-6580-4936-85fa-97def23222c1_1060x656.png" target="_blank" rel="noreferrer noopener"><img loading="lazy" decoding="async" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac7108f6-6580-4936-85fa-97def23222c1_1060x656.png" alt="" width="744" height="460"/></a><figcaption class="wp-element-caption">Scientific papers published on solar radiation management by year. Paul Crutzen, an influential climate scientist, published a highly-cited paper on the use of aerosols to mitigate global warming in 2006. Oldham, et al 2014.</figcaption></figure>



<p>I suspect the public and policymakers are not always able to tell the difference between the official stance of regulatory bodies and the consensus of scientific communities. My impression is that scientific consensus is not in favor of radiation health models used by the Nuclear Regulatory Commission, but many people nonetheless believe that such models are sound science.</p>



<h2 class="wp-block-heading">Warning shots</h2>



<p>Past incidents like the Fukushima disaster and the Tuskegee syphilis study are frequently cited by opponents of nuclear power and human challenge trials. I think this may be significant, because it suggests that these “warning shots” have done a lot to shape perception of these technologies, even decades later. One interpretation of this is that, regardless of why someone is opposed to something, they benefit from citing memorable events when making their case. Another, non-competing interpretation is that these events are causally important in the trajectory of these technologies’ development and the public’s perception of them.</p>



<p>I’m not sure how to untangle the relative contribution of these effects, but either way, it suggests that such incidents are important for shaping and preserving norms around the deployment of technology.</p>



<h2 class="wp-block-heading">Locality</h2>



<p>In general, social norms are local. Building nuclear power plants is much more acceptable in France than it is in Italy. Even if two countries allow the construction of nuclear power plants and have similarly strong norms against breaking nuclear safety regulations, those safety regulations may be different enough to create a large difference in plant construction between countries, as seen with the US and France.</p>



<p>Because scientific communities have members and influence across international borders, they may have more sway over what happens globally (as we’ve seen with geoengineering), but this may be limited by local differences in the acceptability of going against scientific consensus.</p>



<h1 class="wp-block-heading">Development trajectories</h1>



<p>A common feature of these cases is that preventing or limiting deployment of the technology inhibited its development. Because less developed technologies are less useful and harder to trust, this seems to have helped reduce deployment.</p>



<p>Normally, things become cheaper to make as we make more of them in a somewhat predictable way. The cost goes down with the total amount that has been produced, following a power law. This is what has been happening with solar and wind power.<span id='easy-footnote-5-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-5-3536' title='Bolinger, Mark, Ryan Wiser, and Eric O&amp;#8217;Shaughnessy. &amp;#8220;Levelized cost-based learning analysis of utility-scale wind and solar in the United States.&amp;#8221; &lt;em&gt;Iscience&lt;/em&gt; 25, no. 6 (2022): 104378.'><sup>5</sup></a></span>



<figure class="wp-block-image"><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0da2e9a-ee8d-41ab-a85b-d813aa76b1a3_2208x1094.jpeg" target="_blank" rel="noreferrer noopener"><img decoding="async" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0da2e9a-ee8d-41ab-a85b-d813aa76b1a3_2208x1094.jpeg" alt=""/></a><figcaption class="wp-element-caption">Levelized cost of energy for wind and solar power, as a function of total capacity built. Levelized cost includes the cost of building, operating, and maintaining wind and solar farms. Bolinger 2022</figcaption></figure>
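

<p>To make the shape of this relationship concrete, here is a minimal sketch in Python. The numbers are hypothetical, chosen for illustration rather than taken from the wind and solar data above; the point is only that a constant fractional cost reduction per doubling of cumulative capacity is the same thing as a power law in cumulative capacity.</p>



<pre class="wp-block-code"><code>import math

# Minimal learning-curve sketch with hypothetical numbers (not the data plotted above).
# Cost follows a power law in cumulative capacity: cost(Q) = c0 * (Q / Q0) ** (-b),
# which is equivalent to dropping by a fixed fraction every time capacity doubles.

def learning_curve_cost(cumulative_capacity, initial_capacity, initial_cost, learning_rate=0.20):
    """Cost per unit after building `cumulative_capacity`, with a 20% drop per doubling."""
    b = -math.log2(1 - learning_rate)  # power-law exponent implied by the learning rate
    return initial_cost * (cumulative_capacity / initial_capacity) ** (-b)

# Example: start at $100/MWh with 1 GW built, then look at successive doublings.
for capacity_gw in [1, 2, 4, 8, 16]:
    cost = learning_curve_cost(capacity_gw, initial_capacity=1, initial_cost=100)
    print(f"{capacity_gw:>2} GW built -> ${cost:.0f}/MWh")
# Each doubling multiplies cost by 0.8, so 16 GW gives 100 * 0.8**4, about $41/MWh.
</code></pre>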



<p>Initially, building nuclear power plants seems to have become cheaper in the usual way for new technology—doubling the total capacity of nuclear power plants reduced the cost per kilowatt by a constant fraction. Starting around 1970, regulations and public opposition to building plants did more than increase construction costs in the near term. By reducing the number of plants built and inhibiting small-scale design experiments, it slowed the development of the technology, and correspondingly reduced the rate at which we learned to build plants cheaply and safely.<span id='easy-footnote-6-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-6-3536' title='Lang, Peter A. 2017. &amp;#8220;Nuclear Power Learning and Deployment Rates; Disruption and Global Benefits Forgone&amp;#8221; &lt;em&gt;Energies&lt;/em&gt; 10, no. 12: 2169. https://doi.org/10.3390/en10122169'><sup>6</sup></a></span> Absent reductions in cost, they continue to be uncompetitive with other power generating technologies in many contexts.</p>



<figure class="wp-block-image is-resized"><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe22c137b-de49-4734-9a2d-cdf06cd290f4_1863x2426.png" target="_blank" rel="noreferrer noopener"><img loading="lazy" decoding="async" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe22c137b-de49-4734-9a2d-cdf06cd290f4_1863x2426.png" alt="" width="567" height="738"/></a><figcaption class="wp-element-caption">Nuclear power in France and the US followed typical cost reduction curves until roughly 1970, after which they showed the opposite behavior. However, France showed a much more gradual increase. Lang 2017.</figcaption></figure>



<p>Because solar radiation management acts on a scale of months to years and the costs of global warming are not yet very high, I am not surprised that we have still not deployed it. But this does not explain the lack of research, and one of the reasons given for opposing experiments is that the technique has not been shown to be safe. Yet the reason we lack evidence on safety is that research has been opposed, even at small scales.</p>



<p>It is less clear to me how much the relative lack of human challenge trials in the past<span id='easy-footnote-7-3536' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/what-weve-learned-so-far-from-our-technological-temptations-project/#easy-footnote-bottom-7-3536' title='There were at least 60 challenge trials globally between 1970 and 2018 spread across 25 pathogens. According to the WHO, there have been 6,000 intervention-based clinical trials just for covid (though keep in mind the fraction of these that would benefit from deliberately infecting patients may be fairly small)'><sup>7</sup></a></span> has made us less able to do them well now. I’m also not sure how much a stronger past record of challenge trials would cause them to be viewed more positively. Still, absent evidence that medical research methodology does not improve in the usual way with quantity of research, I expect we are at least somewhat less effective at performing human challenge trials than we otherwise would be.</p>



<h1 class="wp-block-heading">Separating safety decisions from gains of deployment</h1>



<p>I think it’s impressive that regulatory bodies are able to prevent the use of technology even when the cost of doing so is on the scale of many billions, plausibly&nbsp;<em>trillions</em>&nbsp;of dollars. One of the reasons this works seems to be that regulators will be blamed if they approve something and it goes poorly, but they will not receive much credit if things go well. Similarly, they will not be held accountable for failing to approve something good. This creates strong incentives for avoiding negative outcomes while creating little incentive to seek positive outcomes. I’m not sure if this asymmetry was deliberately built into the system or if it is a side effect of other incentive structures (e.g., at the level of politics, there is more benefit from placing blame than there is from giving credit), but it is a force to be reckoned with, especially in contexts where there is a strong social norm against disregarding the judgment of regulators.</p>



<h1 class="wp-block-heading">Who stands to gain</h1>



<p>It is hard to assess which actors are actually tempted by a technology. While society at large could benefit from building more nuclear power plants, much of the benefit would be dispersed as public health gains, and it is difficult for any particular actor to capture that value. Similarly, while many deaths could have been prevented if the covid vaccines had been available two months earlier, it is not clear if this value could have been captured by Pfizer or Moderna–demand for vaccines was not changing that quickly.</p>



<p>On the other hand, not all the benefits are external–switching from coal to nuclear power in the US could save tens of billions of dollars a year, and drug companies pay billions of dollars per year for trials. Some government institutions and officials have the&nbsp;<em>stated</em>&nbsp;goal of creating benefits like public health, in addition to economic and reputational stakes in outcomes like the quick deployment of vaccines during a pandemic. These institutions pay costs and make decisions on the basis of economic and health gains from technology (for example, subsidizing photovoltaics and obesity research), suggesting they have incentive to create that value.</p>



<p>Overall, I think this lack of clarity around incentives and capture of value is the biggest reason for doubt that these cases demonstrate strong resistance to technological temptation.</p>



<h1 class="wp-block-heading">What this means for AI</h1>



<p>How well these cases generalize to AI will depend on facts about AI that are not yet known. For example, if powerful AI requires large facilities and easily-trackable equipment, I think we can expect lessons from nuclear power to be more transferable than if it can be done at a smaller scale with commonly-available materials. Still, I think some of what we’ve seen in these cases will transfer to AI, either because of similarity with AI or because they reflect more general principles.</p>



<h2 class="wp-block-heading">Social norms</h2>



<p>The main thing I expect to generalize is the power of social norms to constrain technological development. While it is far from guaranteed to prevent irresponsible AI development, especially if building dangerous AI is not seen as a major transgression everywhere that AI is being developed, it does seem like the world is much safer if building AI in defiance of regulations is seen as similarly villainous to building nuclear reactors or infecting study participants without authorization. We are not at that point, but the public does seem prepared to support concrete limits on AI development.</p>



<figure class="wp-block-image is-resized"><a href="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93425a8d-eba1-4bf6-9ab6-4b877ceb4728_1454x1994.png" target="_blank" rel="noreferrer noopener"><img loading="lazy" decoding="async" src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93425a8d-eba1-4bf6-9ab6-4b877ceb4728_1454x1994.png" alt="" width="403" height="552"/></a><figcaption class="wp-element-caption"><a href="https://twitter.com/YouGovAmerica/status/1642972200746602499">Source</a>                                                                  </figcaption></figure>



<p>I do think there are reasons for pessimism about norms constraining AI. For geoengineering, the norms worked by tabooing a particular topic in a research community, but I’m not sure if this will work with a technology that is no longer in such an early stage. AI already has a large body of research and many people who have invested their careers in it. For medical and nuclear technology, the norms are powerful because they enforce adherence to regulations, and those regulations define the constraints. But it can be hard to build regulations that create the right boundaries around technology, especially something as imprecisely defined as AI. If someone starts building a nuclear power plant in the US, it will become clear relatively early on that this is what they are doing, but a datacenter training an AI and a datacenter updating a search engine may be difficult to tell apart.</p>



<p>Another reason for pessimism is tolerance for failure. Past technologies have mostly carried risks that scaled with how much of the technology was built. For example, if you’re worried about nuclear waste, you probably think two power plants are about twice as bad as one. While risk from AI may turn out this way, it may be that a single powerful system poses a global risk. If this does turn out to be the case, then even if strong norms combine with strong regulation to achieve the same level of success as for nuclear power, it still will not be adequate.</p>



<h2 class="wp-block-heading">Development gains from deployment</h2>



<p>I’m very uncertain how much development of dangerous AI will be hindered by constraints on deployment. I think approximately all technologies face some limitations like this, in some cases very severe limitations, as we’ve seen with nuclear power. But we’re mainly interested in the gains to development toward dangerous systems, which may be possible to advance with little deployment. Adding to the uncertainty, there is ambiguity about where the line is drawn between testing and deployment, and about whether allowing the deployment of verifiably safe systems will provide the gains needed to create dangerous systems.</p>



<h2 class="wp-block-heading">Separating safety decisions from gains</h2>



<p>I do not see any particular reason to think that asymmetric justice will operate differently with AI, but I am uncertain whether regulatory systems around AI, if created, will have such incentives. I think it is worth thinking about IRB-like models for AI safety.</p>



<h2 class="wp-block-heading">Capture of value</h2>



<p>It is obvious there are actors who believe they can capture substantial value from AI (for example Microsoft recently invested $10B in OpenAI), but I’m not sure how this will go as AI advances. By default, I expect the value created by AI to be more straightforwardly capturable than for nuclear power or geoengineering, but I’m not sure how it differs from drug development.</p>



<p><em>Social preview image: German anti-nuclear power protesters in 2012. Used under Creative Commons license from </em><a href="https://www.flickr.com/photos/gruene_bawue/6982014963/">Bündnis 90/Die Grünen Baden-Württemberg Flickr</a></p>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Superintelligence Is Not Omniscience</title>
		<link>http://aiimpacts.org/superintelligence-is-not-omniscience/</link>
					<comments>http://aiimpacts.org/superintelligence-is-not-omniscience/#comments</comments>
		
		<dc:creator><![CDATA[Jeffrey Heninger]]></dc:creator>
		<pubDate>Fri, 07 Apr 2023 16:25:58 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[blog]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3530</guid>

					<description><![CDATA[Chaos theory allows us to rigorously show that there are ceilings on our abilities to make some prediction. This post introduces an investigation which explores the relationship between chaos and intelligence in more detail.  <a class="mh-excerpt-more" href="http://aiimpacts.org/superintelligence-is-not-omniscience/" title="Superintelligence Is Not Omniscience"></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Jeffrey Heninger and Aysja Johnson, 7 April 2023</em></p>



<h3 class="wp-block-heading">The Power of Intelligence</h3>



<p>It is often implicitly assumed that the power of a superintelligence will be practically unbounded: that there is &#8220;ample headroom&#8221; above humans, i.e. that a superintelligence would be able to vastly outperform us across virtually all domains.</p>



<p>By &#8220;superintelligence,&#8221; I mean something which has arbitrarily high cognitive ability, or an arbitrarily large amount of compute, memory, bandwidth, etc., but which is bound by the physical laws of our universe.<span id='easy-footnote-1-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-1-3530' title='In this post, &amp;#8220;we&amp;#8221; refers to humanity, while &amp;#8220;I&amp;#8221; refers to the authors: Jeffrey Heninger and Aysja Johnson.'><sup>1</sup></a></span> There are other notions of &#8220;superintelligence&#8221; which are weaker than this. Limitations of the abilities of this superintelligence would also apply to anything less intelligent.</p>



<p>There are some reasons to believe this assumption. For one, it seems a bit suspicious to assume that humans have close to the maximal possible intelligence. Secondly, AI systems already outperform us in some tasks,<span id='easy-footnote-2-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-2-3530' title='&lt;a href=&quot;https://wiki.aiimpacts.org/doku.php?id=uncategorized:capabilities_of_sota_ai&quot;&gt;&lt;em&gt;Capabilities of state-of-the-art AI, 2023&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;'><sup>2</sup></a></span> so why not suspect that they will be able to outperform us in almost all of them? Finally, there is a more fundamental notion about the predictability of the world, described most famously by Laplace in 1814:</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow">
<p>Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective situation of the beings who compose it &#8211; an intelligence sufficiently vast to submit this data to analysis &#8211; it would embrace in the same formula the movements of the greatest bodies of the universe and those of the lightest atom; for it, nothing would be uncertain and the future, as the past, would be present in its eyes.<span id='easy-footnote-3-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-3-3530' title='The quote continues: &amp;#8220;The human mind offers, in the perfection which it has been able to give to astronomy, a feeble idea of this intelligence. Its discoveries in mechanics and geometry, added to that of universal gravity, have enabled it to comprehend in the same analytic expressions the past and future states of the system of the world. Applying the same method to some other objects of its knowledge, it has succeeded in referring to general laws observed phenomena and in foreseeing those which given circumstances ought to produce. All these efforts in the search for truth tend to lead it back continually to the vast intelligence which we have just mentioned, but from which it will always remain infinitely removed. This tendency, peculiar to the human race, is that which renders it superior to animals; and their progress in this respect distinguishes nations and ages and constitutes their true glory.&amp;#8221;&lt;br&gt;Laplace. &lt;em&gt;Philosophical Essay on Probabilities.&lt;/em&gt; (1814) p. 4. &lt;a href=&quot;https://en.wikisource.org/wiki/A_Philosophical_Essay_on_Probabilities&quot;&gt;https://en.wikisource.org/wiki/A_Philosophical_Essay_on_Probabilities&lt;/a&gt;.'><sup>3</sup></a></span>
</blockquote>



<p>We are very far from completely understanding, and being able to manipulate, everything we care about. But if the world is as predictable as Laplace suggests, then we should expect that a sufficiently intelligent agent would be able to take advantage of that regularity and use it to excel at any domain.</p>



<p>This investigation questions that assumption. Is it actually the case that a superintelligence has practically unbounded intelligence, or are there &#8220;ceilings&#8221; on what intelligence is capable of? To foreshadow a bit, there are ceilings in some domains that we care about, for instance, in predictions about the behavior of the human brain. Even unbounded cognitive ability does not imply unbounded skill when interacting with the world. For this investigation, I focus on cognitive skills, especially predicting the future. This seems like a realm where a superintelligence would have an unusually large advantage (compared to e.g. skills requiring dexterity), so restrictions on its skill here are more surprising.</p>



<p>There are two ways for there to be only a small amount of headroom above human intelligence. The first is that the task is so easy that humans can do it almost perfectly, like playing tic-tac-toe. The second is that the task is so hard that there is a &#8220;low ceiling&#8221;: even a superintelligence is incapable of being very good at it. This investigation focuses on the second.</p>



<p>There are undoubtedly many tasks where there is still ample headroom above humans. But there are also some tasks for which we can prove that there is a low ceiling. These tasks provide some limitations on what is possible, even with arbitrarily high intelligence.</p>



<h3 class="wp-block-heading">Chaos Theory</h3>



<p>The main tool used in this investigation is chaos theory. A chaotic system is one in which uncertainty grows exponentially in time. Most of the information measured initially is lost after a finite amount of time, so reliable predictions about the system’s future behavior are impossible.</p>



<p>A classic example of chaos is the weather. Weather is fairly predictable for a few days. Large simulations of the atmosphere have gotten consistently better for these short-time predictions.<span id='easy-footnote-4-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-4-3530' title='Interestingly, the trend appears linear. My guess is that the linear trend is a combination of exponentially more compute being used and the problem getting exponentially harder.&lt;br&gt;Nate Silver. &lt;em&gt;The Signal and the Noise. &lt;/em&gt;(2012) p. 126-132.'><sup>4</sup></a></span>



<p>After about 10 days, these simulations become useless. Their predictions are worse than a guess based on historical climate data for that location.</p>



<p>Chaos theory provides a response to Laplace. Even if it were possible to exactly predict the future given exact initial conditions and equations of motion,<span id='easy-footnote-5-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-5-3530' title=' Whether or not this statement of determinism is true is a perennial debate among scholars. I will not go into it here.'><sup>5</sup></a></span> chaos makes it impossible to approximately predict the future using approximate initial conditions and equations of motion. Reliable predictions can only be made for a short period of time, but not once the uncertainty has grown large enough.</p>



<p>There is always some small uncertainty. Normally, we do not care: approximations are good enough. But when there is chaos, the small uncertainties matter. There are many ways small uncertainties can arise: Every measuring device has a finite precision.<span id='easy-footnote-6-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-6-3530' title='The most precise measurement ever is of the magnetic moment of the electron, with 9 significant digits.&lt;br&gt;&lt;em&gt;NIST Reference on Constants, Units, and Uncertainty. &lt;/em&gt;&lt;a href=&quot;https://physics.nist.gov/cgi-bin/cuu/Value?muem&quot;&gt;https://physics.nist.gov/cgi-bin/cuu/Value?muem&lt;/a&gt;.'><sup>6</sup></a></span> Every theory should only be trusted in the regimes where it has been tested. Every algorithm for evaluating the solution has some numerical error. There are external forces you are not considering that the system is not fully isolated from. At small enough scales, thermal noise and quantum effects provide their own uncertainties. Some of this uncertainty could be reduced, allowing reliable predictions to be made for a bit longer.<span id='easy-footnote-7-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-7-3530' title='Because the uncertainty grows exponentially with time, if you try to make longer-term predictions by reducing the initial uncertainty, you will only get logarithmic returns.'><sup>7</sup></a></span> Other sources of this uncertainty cannot be reduced. Once these microscopic uncertainties have grown to a macroscopic scale, the motion of the chaos is inherently unpredictable.</p>
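

<p>To make this concrete, here is a minimal numerical illustration (my own sketch, not part of the original report), using the logistic map as a stand-in for a chaotic system: two trajectories that start one part in a billion apart disagree completely within a few dozen steps, and shrinking the initial error buys only a logarithmic amount of extra prediction time.</p>



<pre class="wp-block-code"><code># Illustration only: the chaotic logistic map amplifies a tiny initial difference
# until the two trajectories bear no relation to each other.

def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x, y = 0.400000000, 0.400000001  # initial conditions differing by 1e-9
for step in range(1, 61):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:>2}: |x - y| = {abs(x - y):.2e}")

# The gap roughly doubles each step. Making the initial error 1000 times smaller
# therefore buys only about 10 extra useful steps: logarithmic returns to precision.
</code></pre>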



<p>Completely eliminating the uncertainty would require making measurements with perfect precision, which does not seem to be possible in our universe. We can prove that fundamental sources of uncertainty make it impossible to know important things about the future, even with arbitrarily high intelligence. Atomic scale uncertainty, which is guaranteed to exist by Heisenberg’s Uncertainty Principle, can make macroscopic motion unpredictable in a surprisingly short amount of time. Superintelligence is not omniscience.</p>



<p>Chaos theory thus allows us to rigorously show that there are ceilings on some particular abilities. If we can prove that a system is chaotic, then we can conclude that the system offers diminishing returns to intelligence. Most predictions of the future of a chaotic system are impossible to make reliably. Without the ability to make better predictions, and plan on the basis of these predictions, intelligence becomes much less useful.</p>



<p>This does not mean that intelligence becomes useless, or that there is nothing about chaos which can be reliably predicted.&nbsp;</p>



<p>For relatively simple chaotic systems, even when what in particular will happen is unpredictable, it is possible to reliably predict the statistics of the motion.<span id='easy-footnote-8-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-8-3530' title='If the statistics are predictable, this can allow us to make a coarse-grained model for the behavior at a larger scale which is not affected by the uncertainties amplified by the chaos.'><sup>8</sup></a></span> We have learned sophisticated ways of predicting the statistics of chaotic motion,<span id='easy-footnote-9-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-9-3530' title='Described in the report &lt;a href=&quot;http://aiimpacts.org/wp-content/uploads/2023/04/Chaos-and-Intrinsic-Unpredictability.pdf&quot;&gt;Chaos and Intrinsic Unpredictability&lt;/a&gt;.'><sup>9</sup></a></span> and a superintelligence could be better at this than we are. It is also relatively easy to sample from this distribution to emulate behavior which is qualitatively similar to the motion of the original chaotic system.</p>
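

<p>As a rough illustration of this point (again my own sketch, not from the original analysis), the logistic map shows the contrast directly: trajectories started from different points are unpredictable in detail, yet their long-run histograms nearly coincide.</p>



<pre class="wp-block-code"><code># Sketch: the long-run statistics of a chaotic system can be stable even though
# individual trajectories are not predictable.

def occupancy_histogram(x0, r=4.0, steps=100_000, bins=10):
    counts = [0] * bins
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / steps for c in counts]

hist_a = occupancy_histogram(0.123)
hist_b = occupancy_histogram(0.456)
for i, (a, b) in enumerate(zip(hist_a, hist_b)):
    print(f"x in [{i / 10:.1f}, {(i + 1) / 10:.1f}): {a:.3f} vs {b:.3f}")
# The two histograms agree to within sampling noise, even though step by step
# the two trajectories are unrelated.
</code></pre>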



<p>But chaos can also be more complicated than this. The chaos might be non-stationary, which means that the statistical distribution and qualitative description of the motion themselves change unpredictably in time. The chaos might be multistable, which means that it can do statistically and qualitatively different things depending on how it starts. In these cases, it is also impossible to reliably predict the statistics of the motion, or to emulate a typical example of a distribution which is itself changing chaotically. Even in these cases, there are sometimes still patterns in the chaos which allow a few predictions to be made, like the energy spectra of fluids.<span id='easy-footnote-10-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-10-3530' title='Also described in &lt;a href=&quot;http://aiimpacts.org/wp-content/uploads/2023/04/Chaos-and-Intrinsic-Unpredictability.pdf&quot;&gt;Chaos and Intrinsic Unpredictability&lt;/a&gt;.'><sup>10</sup></a></span> These patterns are hard to find, and it is possible that a superintelligence could find patterns that we have missed. But it is not possible for the superintelligence to recover the vast amount of information rendered unpredictable by the chaos.</p>



<h3 class="wp-block-heading">This Investigation</h3>



<p>This blog post is the introduction to an investigation which explores these points in more detail. I will describe what chaos is, how humanity has learned to deal with chaos, and where chaos appears in things we care about &#8211; including in the human brain itself. Links to the other pages, blog posts, and report that constitute this investigation can be found below.</p>



<p>Most of the systems we care about are considerably messier than the simple examples we use to explain chaos. It is more difficult to prove claims about the inherent unpredictability of these systems, although it is still possible to make some arguments about how chaos affects them.</p>



<p>For example, I will show that individual neurons, small networks of neurons, and <em>in vivo</em> neurons in sense organs can behave chaotically.<span id='easy-footnote-11-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-11-3530' title='The evidence for this can be found in &lt;a href=&quot;https://wiki.aiimpacts.org/doku.php?id=uncategorized:ai_safety_arguments_affected_by_chaos:chaos_in_humans&quot;&gt;Chaos in Humans&lt;/a&gt;.'><sup>11</sup></a></span> Each of these can also behave non-chaotically in other circumstances. But we are more interested in the human brain as a whole. Is the brain mostly chaotic or mostly non-chaotic? Does the chaos in the brain amplify uncertainty all the way from the atomic scale to the macroscopic, or is the chain of amplifying uncertainty broken at some non-chaotic mesoscale? How does chaos in the brain actually impact human behavior? Are there some things that brains do for which chaos is essential?</p>



<p>These are hard questions to answer, and they are, at least in part, currently unsolved. They are worth investigating nevertheless. For instance, it seems likely to me that the chaos in the brain does render some important aspects of human behavior inherently unpredictable, and plausible that chaotic amplification of atomic-level uncertainty is essential for some of the things humans are capable of doing.</p>



<p>This has implications for how humans might interact with a superintelligence and for how difficult it might be to build artificial general intelligence.</p>



<p>If some aspects of human behavior are inherently unpredictable, that might make it harder for a superintelligence to manipulate us. Manipulation is easier if it is possible to predict how a human will respond to anything you show or say to them. If even a superintelligence cannot predict how a human will respond in some circumstances, then it is harder for the superintelligence to hack the human and gain precise, long-term control over them.</p>



<p>So far, I have been considering the possibility that a superintelligence will exist and asking what limitations there are on its abilities.<span id='easy-footnote-12-3530' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='http://aiimpacts.org/superintelligence-is-not-omniscience/#easy-footnote-bottom-12-3530' title='This possibility probably takes up too much of our thinking, even prior to these arguments.&lt;br&gt;Wulfson. &lt;em&gt;The tyranny of the god scenario. &lt;/em&gt;AI Impacts. (2018) &lt;a href=&quot;https://aiimpacts.org/the-tyranny-of-the-god-scenario/&quot;&gt;https://aiimpacts.org/the-tyranny-of-the-god-scenario/&lt;/a&gt;.'><sup>12</sup></a></span> But chaos theory might also change our estimates of the difficulty of making artificial general intelligence (AGI) that leads to superintelligence. Chaos in the brain makes whole brain emulation on a classical computer wildly more difficult &#8211; or perhaps even impossible.</p>



<p>When making a model of a brain, you want to coarse-grain it at some scale, perhaps at the scale of individual neurons. The coarse-grained model of a neuron should be much simpler than a real neuron, involving only a few variables, while still capturing the behavior relevant for the larger-scale motion. If a neuron is itself behaving chaotically, especially if it is non-stationary or multistable, then no sufficiently good coarse-grained model will exist. The neuron needs to be resolved at a finer scale, perhaps at the scale of proteins. If a protein itself amplifies smaller uncertainties, then you would have to resolve it at a finer scale still, which might require a quantum mechanical calculation of atomic behavior.</p>
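


<p>As a toy version of this difficulty, here is a sketch of mine (not drawn from the report): a chaotic system is integrated twice, once in full precision and once with all detail below 0.001 discarded at every step, standing in for a coarse-grained model that keeps only a few significant figures.</p>



<pre class="wp-block-code"><code>
# Sketch: coarse-graining a chaotic system. Illustration only, not the report's code.
import numpy as np

def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz system."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

dt = 0.0005
fine = np.array([1.0, 1.0, 1.0])
coarse = fine.copy()

for step in range(1, 40_001):
    fine = lorenz_step(fine, dt)
    # Rounding plays the role of coarse-graining: detail below 0.001 is thrown away.
    coarse = np.round(lorenz_step(coarse, dt), 3)
    if step % 8000 == 0:
        t = step * dt
        print(f"t = {t:5.1f}   separation = {np.linalg.norm(fine - coarse):8.3f}")
</code></pre>



<p>The specific numbers are unimportant; the point is that whatever information the coarse-graining throws away is amplified by the chaos until the reduced model stops being a faithful stand-in for the thing it is meant to emulate.</p>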



<p>Whole brain emulation provides an upper bound on the difficulty of AGI. If this upper bound turns out to be farther away than you expected, you should shift more probability mass toward AGI being extremely hard.</p>



<h2 class="wp-block-heading">Links</h2>



<p>I will explore these arguments, and others, in the remainder of this investigation. Currently, this investigation consists of one report, two Wiki pages, and three blog posts.</p>



<p>Report:</p>



<ul class="wp-block-list">
<li><a href="http://aiimpacts.org/wp-content/uploads/2023/04/Chaos-and-Intrinsic-Unpredictability.pdf"><strong>Chaos and Intrinsic Unpredictability</strong></a>. Background reading for the investigation. An explanation of what chaos is, some other ways something can be intrinsically unpredictable, different varieties of chaos, and how humanity has learned to deal with chaos.</li>
</ul>



<p>Wiki Pages:</p>



<ul class="wp-block-list">
<li><a href="https://wiki.aiimpacts.org/doku.php?id=uncategorized:ai_safety_arguments_affected_by_chaos:chaos_in_humans"><strong>Chaos in Humans</strong></a>. Some of the most interesting things to try to predict are other humans. I discuss whether humans are chaotic, from the scale of a single neuron to society as a whole.</li>
</ul>



<ul class="wp-block-list">
<li><a href="https://wiki.aiimpacts.org/doku.php?id=uncategorized:ai_safety_arguments_affected_by_chaos"><strong>AI Safety Arguments Affected by Chaos</strong></a>. A list of the arguments I have seen within the AI safety community which our understanding of chaos might affect.</li>
</ul>



<p>Blog Posts:</p>



<ul class="wp-block-list">
<li><strong>Superintelligence Is Not Omniscience</strong>. This post.</li>
</ul>



<ul class="wp-block-list">
<li><a href="https://blog.aiimpacts.org/p/you-cant-predict-a-game-of-pinball"><strong>You Can’t Predict a Game of Pinball</strong></a>. A simple and familiar example which I describe in detail to help build intuition for the rest of the investigation.</li>
</ul>



<ul class="wp-block-list">
<li><a href="https://blog.aiimpacts.org/p/whole-bird-emulation-requires-quantum-mechanics"><strong>Whole Bird Emulation Requires Quantum Mechanics</strong></a>. A humorous discussion of one example of a quantum mechanical effect being relevant for an animal’s behavior.</li>
</ul>



<h3 class="wp-block-heading">Other Resources</h3>



<p>If you want to learn more about chaos theory in general, outside of this investigation, here are some sources that I endorse:</p>



<ul class="wp-block-list">
<li>Undergraduate Level Textbook:<br>S. Strogatz. <em>Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering.</em> (CRC Press, 2000).</li>
</ul>



<ul class="wp-block-list">
<li>Graduate Level Textbook:<br>P. Cvitanović, R. Artuso, R. Mainieri, G. Tanner and G. Vattay, <em>Chaos: Classical and Quantum. </em><a href="https://chaosbook.org/">ChaosBook.org</a>. (Niels Bohr Institute, Copenhagen 2020).</li>
</ul>



<ul class="wp-block-list">
<li><a href="https://en.wikipedia.org/wiki/Chaos_theory">Wikipedia</a> has a good introductory article on chaos. <a href="http://www.scholarpedia.org/article/Category:Chaos">Scholarpedia</a> also has multiple good articles, although no one obvious place to start.</li>
</ul>



<ul class="wp-block-list">
<li><a href="https://thechaostician.com/what-is-chaos-part-i-introduction/">What is Chaos?</a> sequence of blog posts by The Chaostician.</li>
</ul>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading">Notes</h2>
]]></content:encoded>
					
					<wfw:commentRss>http://aiimpacts.org/superintelligence-is-not-omniscience/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>A policy guaranteed to increase AI timelines</title>
		<link>http://aiimpacts.org/a-policy-guaranteed-to-increase-ai-timelines/</link>
		
		<dc:creator><![CDATA[richardkorzekwa]]></dc:creator>
		<pubDate>Sat, 01 Apr 2023 20:41:43 +0000</pubDate>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[blog]]></category>
		<guid isPermaLink="false">https://aiimpacts.org/?p=3482</guid>

					<description><![CDATA[A redefinition of the second is a foolproof way to increase the number of years between nearly any two events. <a class="mh-excerpt-more" href="http://aiimpacts.org/a-policy-guaranteed-to-increase-ai-timelines/" title="A policy guaranteed to increase AI timelines"></a>]]></description>
										<content:encoded><![CDATA[
<p><em>Rick Korzekwa, April 1, 2023</em></p>



<p>The number of years until the creation of powerful AI is a major input to our thinking about risk from AI and which approaches are most promising for mitigating that risk. While there are downsides to transformative AI arriving many years from now, rather than a few years from now, most people seem to agree that it is safer for AI to arrive in 2060 than in 2030. Given this, there is a lot of discussion about what we can do to increase the number of years until we see such powerful systems. While existing proposals have their merits, none of them can ensure that AI will arrive later than 2030, much less 2060.</p>



<p>There is a policy that is guaranteed to increase the number of years between now and the arrival of transformative AI. The General Conference on Weights and Measures defines one second to be 9,192,631,770 cycles of the microwave radiation emitted during a hyperfine transition in the ground state of a cesium-133 atom. Redefining the second to instead be 919,263,177 cycles of this radiation will increase the number of years between now and transformative AI by a factor of ten. The reason this policy works is the same reason that defining a time standard works&#8211;the microscopic behavior of atoms and photons is ultimately governed by the same physical laws as everything else, including computers, AI labs, and financial markets, and those laws are unaffected by our time standards. Thus fewer cycles of cesium radiation per year implies proportionately fewer other things happening per year.</p>
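


<p>The arithmetic behind the factor of ten can be checked directly. The sketch below is mine, not part of the proposal, and the seven-year horizon in it is an arbitrary placeholder rather than a forecast.</p>



<pre class="wp-block-code"><code>
# Checking the rescaling. A fixed physical interval contains a fixed number of
# cesium cycles; redefining how many cycles make one second changes how many
# "years" we say that interval lasts. The 7-year horizon is a made-up example.
OLD_CYCLES_PER_SECOND = 9_192_631_770
NEW_CYCLES_PER_SECOND = 919_263_177
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # the year keeps its definition in seconds

physical_interval_in_cycles = 7 * SECONDS_PER_YEAR * OLD_CYCLES_PER_SECOND

years_old = physical_interval_in_cycles / OLD_CYCLES_PER_SECOND / SECONDS_PER_YEAR
years_new = physical_interval_in_cycles / NEW_CYCLES_PER_SECOND / SECONDS_PER_YEAR
print(round(years_old, 6), round(years_new, 6))  # 7.0 and 70.0: ten times as many years
</code></pre>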



<p>Making such a change might not sound politically tractable, but there is already precedent for making radical changes to the definition of a second. Previously it was defined in terms of Earth&#8217;s solar orbit, and before that in terms of Earth&#8217;s rotation. These physical processes and their implementations as time standards bear little resemblance to the present-day quantum mechanical standard. In contrast, a change that preserves nearly the entire standard, including all significant figures in the relevant numerical definition, is straightforward.</p>



<p>One possible objection to this policy is that our time standards are not entirely causally disconnected from the rest of the world. For example, redefining the time standard might create a sense of urgency among AI labs and the people investing in them. It&#8217;s not hard to imagine that the leaders and researchers within companies advancing the state of the art in AI might increase their efforts after noticing it is taking ten times as long to generate the same amount of research. While this is a reasonable concern, it seems unlikely that AI labs can increase their rate of progress by a full order of magnitude. Why would they currently be leaving so much on the table if they could? Furthermore, there are similar effects that might push in the other direction. Once politicians and executives realize they will live to be hundreds of years old, they may take risks to the long-term future more seriously.</p>



<p>Still, it does seem that the policy might have undesirable side effects. Changing all of our textbooks, clocks, software, calendars, and habits is costly. One solution to this challenge is to change the standard either in secret or in a way that allows most people to continue using the old &#8220;unofficial&#8221; standard. After all, what matters is the actual number of years required to create AI, not the number of years as measured by some deprecated standard.</p>



<p>In conclusion, while there are many policies for increasing the number of years before the arrival of advanced artificial intelligence, until now, none of them has guaranteed a large increase in this number. This policy, if implemented promptly and thoughtfully, is essentially guaranteed to cause a large increase in the number of years before we see systems capable of posing a serious risk to humanity.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
