<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Daniel Lemire&#039;s blog</title>
	<atom:link href="https://lemire.me/blog/feed/" rel="self" type="application/rss+xml" />
	<link>https://lemire.me/blog</link>
	<description>Daniel Lemire is a software performance expert. He ranks among the top 2% of scientists globally (Stanford/Elsevier 2025) and is one of GitHub&#039;s top 1000 most followed developers. </description>
	<lastBuildDate>Sun, 12 Apr 2026 21:35:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://lemire.me/blog/wp-content/uploads/2015/10/profile2011_152-150x150.jpg</url>
	<title>Daniel Lemire&#039;s blog</title>
	<link>https://lemire.me/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>A brief history of C/C++ programming languages</title>
		<link>https://lemire.me/blog/2026/04/09/a-brief-history-of-c-c-programming-languages/</link>
					<comments>https://lemire.me/blog/2026/04/09/a-brief-history-of-c-c-programming-languages/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 09 Apr 2026 14:58:53 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22574</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/04/Capture-decran-le-2026-04-09-a-10.54.35-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" />Initially, we had languages like Fortran (1957), Pascal (1970), and C (1972). Fortran was designed for number crunching and scientific computing. Pascal was restrictive with respect to low-level access (it was deliberately &#8220;safe&#8221;, as meant for teaching structured programming). So C won out as a language that allowed low-level/unsafe programming (pointer arithmetic, direct memory access) &#8230; <a href="https://lemire.me/blog/2026/04/09/a-brief-history-of-c-c-programming-languages/" class="more-link">Continue reading <span class="screen-reader-text">A brief history of C/C++ programming languages</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/04/Capture-decran-le-2026-04-09-a-10.54.35-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" /><div data-rbd-draggable-context-id="2" data-rbd-draggable-id="867nr">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="867nr-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="867nr-0-0"><span data-offset-key="867nr-0-0">Initially, we had languages like Fortran (1957), Pascal (1970), and C (1972). Fortran was designed for number crunching and scientific computing. Pascal was restrictive with respect to low-level access (it was deliberately &#8220;safe&#8221;, as it was meant for teaching structured programming). So C won out as a language that allowed low-level/unsafe programming (pointer arithmetic, direct memory access) while remaining general-purpose enough for systems work like Unix. To be fair, Pascal had descendants that are still around, but C clearly dominated.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="789jh">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="789jh-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="789jh-0-0"><span data-offset-key="789jh-0-0">Object-oriented programming became viewed as the future in the 1980s and 1990s. It turned into some kind of sect.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="coano">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="coano-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="coano-0-0"><span data-offset-key="coano-0-0">But C was not object-oriented.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="b8gkl">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="b8gkl-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="b8gkl-0-0"><span data-offset-key="b8gkl-0-0">So we got C++, which began as &#8220;C with Classes&#8221;. C++ had templates, enabling generic programming and compile-time metaprogramming. This part of the language makes C++ quite powerful, but somewhat difficult to master (with crazy error messages).</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="35kf6">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="35kf6-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="35kf6-0-0"><span data-offset-key="35kf6-0-0">Both C and C++ became wildly successful, but writing portable applications remained difficult — you often had to target Windows or a specific Unix variant. This was a problem for a company like Sun Microsystems that sold Unix boxes and wanted to compete against the juggernaut that Microsoft was becoming.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="4te26">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="4te26-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="4te26-0-0"><span data-offset-key="4te26-0-0">So Java came along in 1995. It was positioned as a safe, portable alternative to C++: it eliminated raw pointer arithmetic, added mandatory garbage collection and array bounds checking everywhere, and ran on a virtual machine (JVM) with just-in-time compilation for performance.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="3u5f8">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="3u5f8-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="3u5f8-0-0"><span data-offset-key="3u5f8-0-0">The &#8220;write once, run anywhere&#8221; promise addressed C/C++ portability pain points directly. To this day, Java remains a strong solution for writing portable enterprise and server-side code.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="cmrnd">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="cmrnd-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="cmrnd-0-0"><span data-offset-key="cmrnd-0-0">We also got JavaScript in 1995. Despite the name, it has almost nothing in common with Java semantically. It is best viewed as separate from the C/C++ branch. Python is similarly quite different.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="2kbg">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="2kbg-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="2kbg-0-0"><span data-offset-key="2kbg-0-0">Microsoft would eventually come up with C# in 2000. It belongs to the same C-family syntax tradition as C++ and Java, but with support for ahead-of-time compilation in modern .NET. It also allows guarded pointer access within explicitly marked unsafe scopes. At this point, C# can be seen as &#8220;C++ with garbage collection&#8221; in spirit. It even competes against C++ in the game industry thanks to Unity.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="3lqi1">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="3lqi1-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="3lqi1-0-0"><span data-offset-key="3lqi1-0-0">Google came up with Go. It is much like a simpler, modern C: garbage-collected, with built-in bounds checking on slices/arrays, and pointers allowed but without arbitrary arithmetic in safe code (the unsafe package exists for low-level needs).</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="79ppr">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="79ppr-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="79ppr-0-0"><span data-offset-key="79ppr-0-0">Later, Apple came up with Swift. It has C++-like performance and syntax goals but adds modern safety features (bounds checking and integer overflow traps by default) and uses Automatic Reference Counting (ARC) for memory management. Swift replaced Objective-C but I still view it as a C++ successor.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="5b6nn">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="5b6nn-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="5b6nn-0-0"><span data-offset-key="5b6nn-0-0">At about the same time, we got Rust. Like Swift, it drops the tracing garbage collection of Java, C# and Go. It relies instead on compile-time ownership and borrowing rules, with the tradeoff that reference-counted pointers can leak memory when they form cycles. We also got Zig, which makes memory usage fully explicit.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="dhqmp">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="dhqmp-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="dhqmp-0-0"><span data-offset-key="dhqmp-0-0">I think that it is fairer to describe Rust and Zig as descendants of C rather than C++. Both are much more powerful than C, of course&#8230; and the evolution of programming languages is complex. Still. They are C-like programming languages.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="2" data-rbd-draggable-id="34ia5">
<div class="longform-unstyled" data-block="true" data-editor="fhuhe" data-offset-key="34ia5-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="34ia5-0-0"><span data-offset-key="34ia5-0-0">To this day, in much of the industry, the dominant programming languages for performance-critical, systems, enterprise, and infrastructure work remain C, C++, Java, and C#. By the Lindy effect (the longer something has survived, the longer it is likely to continue surviving), these languages, especially C, now over 50 years old, are still going to be around for a long time.</span></div>
</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/04/09/a-brief-history-of-c-c-programming-languages/feed/</wfw:commentRss>
			<slash:comments>12</slash:comments>
		
		
			</item>
		<item>
		<title>Can your AI rewrite your code in assembly?</title>
		<link>https://lemire.me/blog/2026/04/05/can-your-ai-rewrite-your-code-in-assembly/</link>
					<comments>https://lemire.me/blog/2026/04/05/can-your-ai-rewrite-your-code-in-assembly/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 05 Apr 2026 21:16:14 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22568</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/04/Capture-decran-le-2026-04-05-a-17.13.58-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" />Suppose you have several strings and you want to count the number of instances of the character ! in your strings. In C++, you might solve the problem as follows if you are an old-school programmer. size_t c = 0; for (const auto &#38;str : strings) { c += std::count(str.begin(), str.end(), '!'); } You can &#8230; <a href="https://lemire.me/blog/2026/04/05/can-your-ai-rewrite-your-code-in-assembly/" class="more-link">Continue reading <span class="screen-reader-text">Can your AI rewrite your code in assembly?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/04/Capture-decran-le-2026-04-05-a-17.13.58-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Suppose you have several strings and you want to count the number of instances of the character <code>!</code> in your strings. In C++, you might solve the problem as follows if you are an old-school programmer.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">c</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">auto</span> <span style="color: #ce5c00; font-weight: bold;">&amp;</span><span style="color: #000;">str</span> <span style="color: #ce5c00; font-weight: bold;">:</span> <span style="color: #000;">strings</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">c</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">count</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">begin</span><span style="color: #000; font-weight: bold;">(),</span> <span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">end</span><span style="color: #000; font-weight: bold;">(),</span> <span style="color: #4e9a06;">'!'</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>You can also get fancier with ranges.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">c</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">auto</span> <span style="color: #ce5c00; font-weight: bold;">&amp;</span><span style="color: #000;">str</span> <span style="color: #ce5c00; font-weight: bold;">:</span> <span style="color: #000;">strings</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">c</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">ranges</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">count</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #4e9a06;">'!'</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>And so forth.</p>
<p>But what if you want to go faster? Maybe you&#8217;d want to rewrite this function in assembly. I decided to do so, and to have fun using both Grok and Claude as my AIs, setting up a friendly competition.</p>
<p>I started with my function and then I asked AIs to optimize it in assembly. Importantly, they knew which machine I was on, so they started to write ARM assembly.</p>
<p>By repeated prompting, I got the following functions.</p>
<ul>
<li><code>count_classic</code>: Uses C++ standard library <code>std::count</code> for reference.</li>
<li><code>count_assembly</code>: A basic ARM64 assembly loop (byte-by-byte comparison). Written by Grok.</li>
<li><code>count_assembly_claude</code>: Claude&#8217;s SIMD-optimized version using NEON instructions (16-byte chunks).</li>
<li><code>count_assembly_grok</code>: Grok&#8217;s optimized version (32-byte chunks).</li>
<li><code>count_assembly_claude_2</code>: Claude&#8217;s further optimized version (64-byte chunks with multiple accumulators).</li>
<li><code>count_assembly_grok_2</code>: Grok&#8217;s latest version (64-byte chunks with improved accumulator handling).</li>
<li><code>count_assembly_claude_3</code>: Claude&#8217;s most advanced version with additional optimizations.</li>
</ul>
<p>You get the idea.</p>
<p>So, how is the performance? I use random strings of up to 1 kilobyte. In all cases, I test that the functions provide the correct count. I did not closely examine the code, so mistakes could still be hiding in it.</p>
<p>I record the average number of instructions per string.</p>
<table>
<thead>
<tr>
<th>name</th>
<th>instructions/string</th>
</tr>
</thead>
<tbody>
<tr>
<td>classic C++</td>
<td>1200</td>
</tr>
<tr>
<td>claude assembly</td>
<td>250</td>
</tr>
<tr>
<td>grok assembly</td>
<td>204</td>
</tr>
<tr>
<td>claude assembly 2</td>
<td>183</td>
</tr>
<tr>
<td>grok assembly 2</td>
<td>176</td>
</tr>
<tr>
<td>claude assembly 3</td>
<td>154</td>
</tr>
</tbody>
</table>
<p>By repeated optimization, I reduced the number of instructions by a factor of nearly eight (from 1200 to 154 per string). The running time decreases similarly.</p>
<p>Can we get the AIs to rewrite the best option in C? Yes, although you need SIMD intrinsics. So there is no benefit to leaving the code in assembly in this instance.</p>
<p>An open question is whether the AIs could find optimizations that are not possible if we use a higher-level language like C or C++. It is an intriguing question that I will seek to answer later. For the time being, the AIs can beat my C++ compiler!</p>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2026/04/02/benchmark/benchmarks/benchmark.cpp">My source code is available</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/04/05/can-your-ai-rewrite-your-code-in-assembly/feed/</wfw:commentRss>
			<slash:comments>19</slash:comments>
		
		
			</item>
		<item>
		<title>A Fast Immutable Map in Go</title>
		<link>https://lemire.me/blog/2026/03/29/a-fast-immutable-map-in-go/</link>
					<comments>https://lemire.me/blog/2026/03/29/a-fast-immutable-map-in-go/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 18:18:01 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22564</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-29-a-14.16.33-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Consider the following problem. You have a large set of strings, maybe millions. You need to map these strings to 8-byte integers (uint64). These integers are given to you. If you are working in Go, the standard solution is to create a map. The construction is trivial, something like the following loop. m := make(map[string]uint64, &#8230; <a href="https://lemire.me/blog/2026/03/29/a-fast-immutable-map-in-go/" class="more-link">Continue reading <span class="screen-reader-text">A Fast Immutable Map in Go</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-29-a-14.16.33-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Consider the following problem. You have a large set of strings, maybe millions. You need to map these strings to 8-byte integers (<code>uint64</code>). These integers are given to you.</p>
<p>If you are working in Go, the standard solution is to create a map. The construction is trivial, something like the following loop.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">m</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #204a87;">make</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">map</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #204a87; font-weight: bold;">string</span><span style="color: #000; font-weight: bold;">]</span><span style="color: #204a87; font-weight: bold;">uint64</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">N</span><span style="color: #000; font-weight: bold;">)</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #204a87; font-weight: bold;">range</span> <span style="color: #000;">keys</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">m</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">k</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #000; font-weight: bold;">=</span> <span style="color: #000;">values</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">]</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>One downside is that the map may use over 50 bytes per entry.</p>
<p>In important scenarios, we might have the following conditions. The map is large (a million entries or more), you do not need to modify it dynamically (it is immutable), and all queried keys are in the set. In such conditions, you can reduce the memory usage down to almost the size of the values, so about 8 bytes per entry. One fast technique relies on <a href="https://arxiv.org/abs/2201.01174">binary fuse filters</a>.</p>
<p>I implemented it as a Go library called <a href="https://github.com/lemire/constmap">constmap</a> that provides an immutable map from strings to <code>uint64</code> values using binary fuse filters. This data structure is ideal when you have a fixed set of keys at construction time and need fast, memory-efficient lookups afterward. You can even construct the map once and save it to disk, so you do not pay the cost of constructing it each time you need it.</p>
<p>The usage is just as simple.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">package</span> <span style="color: #000;">main</span>

<span style="color: #204a87; font-weight: bold;">import</span> <span style="color: #000; font-weight: bold;">(</span>
    <span style="color: #4e9a06;">"fmt"</span>
    <span style="color: #4e9a06;">"log"</span>

    <span style="color: #4e9a06;">"github.com/lemire/constmap"</span>
<span style="color: #000; font-weight: bold;">)</span>

<span style="color: #204a87; font-weight: bold;">func</span> <span style="color: #000;">main</span><span style="color: #000; font-weight: bold;">()</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">keys</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #000; font-weight: bold;">[]</span><span style="color: #204a87; font-weight: bold;">string</span><span style="color: #000; font-weight: bold;">{</span><span style="color: #4e9a06;">"apple"</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #4e9a06;">"banana"</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #4e9a06;">"cherry"</span><span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #000;">values</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #000; font-weight: bold;">[]</span><span style="color: #204a87; font-weight: bold;">uint64</span><span style="color: #000; font-weight: bold;">{</span><span style="color: #0000cf; font-weight: bold;">100</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">200</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">300</span><span style="color: #000; font-weight: bold;">}</span>

    <span style="color: #000;">cm</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">err</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #000;">constmap</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">New</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">keys</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">values</span><span style="color: #000; font-weight: bold;">)</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000;">err</span> <span style="color: #ce5c00; font-weight: bold;">!=</span> <span style="color: #204a87; font-weight: bold;">nil</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #000;">log</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">Fatal</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">err</span><span style="color: #000; font-weight: bold;">)</span>
    <span style="color: #000; font-weight: bold;">}</span>

    <span style="color: #000;">fmt</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">Println</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">cm</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">Map</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"banana"</span><span style="color: #000; font-weight: bold;">))</span> <span style="color: #8f5902; font-style: italic;">// 200</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>The construction time is higher (as expected for any compact data structure), but lookups are optimized for speed. I ran benchmarks on my Apple M4 Max processor to compare constmap lookups against Go&#8217;s built-in <code>map[string]uint64</code>. The test uses 1 million keys.</p>
<table>
<thead>
<tr>
<th>Data Structure</th>
<th>Lookup Time</th>
<th>Memory Usage</th>
</tr>
</thead>
<tbody>
<tr>
<td>ConstMap</td>
<td>7.4 ns/op</td>
<td>9 bytes/key</td>
</tr>
<tr>
<td>Go Map</td>
<td>20 ns/op</td>
<td>56 bytes/key</td>
</tr>
</tbody>
</table>
<p>ConstMap is nearly 3 times faster than Go&#8217;s standard map for lookups! And we reduced the memory usage by a factor of 6.</p>
<p>The ConstMap may not always be faster, but it should always use significantly less memory. If it can reside in CPU cache while the map cannot, then it will be significantly faster.</p>
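<p>If you want a baseline for the built-in map on your own hardware, a rough sketch follows. It only times <code>map[string]uint64</code> lookups of keys that are present. The helper names, the <code>key-%d</code> key format and the stride used to scatter the lookups are assumptions for illustration, not the benchmark behind the table above, so the numbers will not match exactly.</p>

```go
package main

import (
	"fmt"
	"testing"
)

// sink receives the looked-up values so the compiler cannot drop the lookups.
var sink uint64

// buildKeys returns n distinct keys; the "key-%d" format is an arbitrary choice.
func buildKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}
	return keys
}

// benchmarkMapLookup times lookups of present keys in a built-in map.
func benchmarkMapLookup(n int) testing.BenchmarkResult {
	keys := buildKeys(n)
	m := make(map[string]uint64, n)
	for i, k := range keys {
		m[k] = uint64(i)
	}
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// Stride through the keys so successive lookups are scattered.
			sink += m[keys[(i*7919)%n]]
		}
	})
}

func main() {
	res := benchmarkMapLookup(1_000_000)
	fmt.Printf("%d lookups, %d ns/op\n", res.N, res.NsPerOp())
}
```

<p>The measured time includes the cost of computing the index, so it is best read as an upper bound on the lookup cost.</p>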
<p><strong>Source Code</strong> The implementation is available on GitHub: <a href="https://github.com/lemire/constmap">github.com/lemire/constmap</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/03/29/a-fast-immutable-map-in-go/feed/</wfw:commentRss>
			<slash:comments>15</slash:comments>
		
		
			</item>
		<item>
		<title>JSON and C++26 compile-time reflection: a talk</title>
		<link>https://lemire.me/blog/2026/03/26/json-and-c26-compile-time-reflection-a-talk/</link>
					<comments>https://lemire.me/blog/2026/03/26/json-and-c26-compile-time-reflection-a-talk/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 00:29:38 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22553</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-25-a-20.36.07-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />The next C++ standard (C++26) is getting exciting new features. One of these features is compile-time reflection. It is ideally suited to serialize and deserialize data at high speed. To test it out, we extended our fast JSON library (simdjson) and we gave a talk at CppCon 2025. The video is out on YouTube. Our &#8230; <a href="https://lemire.me/blog/2026/03/26/json-and-c26-compile-time-reflection-a-talk/" class="more-link">Continue reading <span class="screen-reader-text">JSON and C++26 compile-time reflection: a talk</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-25-a-20.36.07-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>The next C++ standard (C++26) is getting exciting new features. <a href="https://lemire.me/blog/2025/06/22/c26-will-include-compile-time-reflection-why-should-you-care/">One of these features is compile-time reflection</a>. It is ideally suited to serialize and deserialize data at high speed. To test it out, we extended our fast JSON library (<a href="https://github.com/simdjson/simdjson">simdjson</a>) and we gave a talk at CppCon 2025. The video is out on YouTube.</p>
<p><iframe loading="lazy" title="YouTube video player" src="https://www.youtube.com/embed/Mcgk3CxHYMs?si=NylINo46stKT8-SD&amp;start=26" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
<p><a href="https://simdjson.github.io/simdjson_talks/cppcon2025/cppcon_2025_slides.html">Our slides are also available</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/03/26/json-and-c26-compile-time-reflection-a-talk/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How many branches can your CPU predict?</title>
		<link>https://lemire.me/blog/2026/03/18/how-many-branches-can-your-cpu-predict/</link>
					<comments>https://lemire.me/blog/2026/03/18/how-many-branches-can-your-cpu-predict/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Wed, 18 Mar 2026 21:52:53 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22543</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-18-a-17.52.22-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Modern processors have the ability to execute many instructions per cycle, on a single core. To be able to execute many instructions per cycle in practice, processors predict branches. I have made the point over the years that modern CPUs have an incredible ability to predict branches. It makes benchmarking difficult because if you test &#8230; <a href="https://lemire.me/blog/2026/03/18/how-many-branches-can-your-cpu-predict/" class="more-link">Continue reading <span class="screen-reader-text">How many branches can your CPU predict?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-18-a-17.52.22-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Modern processors have the ability to execute many instructions per cycle, on a single core. To be able to execute many instructions per cycle in practice, processors predict branches. I have made the point over the years that modern CPUs have an incredible ability to <a href="https://lemire.me/blog/2019/10/16/benchmarking-is-hard-processors-learn-to-predict-branches/">predict branches</a>.</p>
<p>It makes benchmarking difficult because if you test on small datasets, you can get surprisingly good results that do not carry over to real data.</p>
<p>My go-to benchmark is a function like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>while (howmany != 0) {
    val = generate_random_value()
    if(val is odd) write to buffer
    decrement howmany
}
</code></pre>
</div>
<p>The processor tries to predict the branch (<code>if</code> clause). Because we use random values, the processor should mispredict one time out of two.</p>
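<p>The loop above is pseudocode. A minimal C sketch might look as follows. The generator (<code>next_random</code>, a splitmix64-style function) and the name <code>write_odd_values</code> are assumptions made for illustration, not taken from the benchmark itself. An optimizing compiler may replace the branch with branchless code, so the generated assembly is worth checking.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical generator (splitmix64); any fast pseudo-random
// generator would do for this sketch.
static uint64_t next_random(uint64_t *state) {
  uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));
  z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
  z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
  return z ^ (z >> 31);
}

// Writes the odd values among 'howmany' random values to 'buffer'
// and returns how many values were written.
// 'buffer' must have room for 'howmany' values.
size_t write_odd_values(uint64_t seed, size_t howmany, uint64_t *buffer) {
  assert(buffer != NULL || howmany == 0);
  uint64_t state = seed;
  size_t written = 0;
  while (howmany != 0) {
    uint64_t val = next_random(&state);
    if (val & 1) { // the hard-to-predict branch
      buffer[written++] = val;
    }
    howmany--;
  }
  return written;
}
```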
<p>However, if we repeat the benchmark multiple times, always using the same random values, the processor learns the branches. How many branches can a processor learn? I test using three recent processors.</p>
<ul>
<li>The AMD Zen 5 processor can perfectly predict 30,000 branches.</li>
<li>The Apple M4 processor can perfectly predict 10,000 branches.</li>
<li>The Intel Emerald Rapids processor can perfectly predict 5,000 branches.</li>
</ul>
<p>Once more I am disappointed by Intel. AMD is doing wonderfully well on this benchmark.</p>
<p><a href="http://lemire.me/blog/wp-content/uploads/2026/03/branch_mispredictions-2.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-22549" src="http://lemire.me/blog/wp-content/uploads/2026/03/branch_mispredictions-2.png" alt="" width="1000" height="600" srcset="https://lemire.me/blog/wp-content/uploads/2026/03/branch_mispredictions-2.png 1000w, https://lemire.me/blog/wp-content/uploads/2026/03/branch_mispredictions-2-300x180.png 300w, https://lemire.me/blog/wp-content/uploads/2026/03/branch_mispredictions-2-768x461.png 768w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></a></p>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2026/03/18/benchmark">My source code is available</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/03/18/how-many-branches-can-your-cpu-predict/feed/</wfw:commentRss>
			<slash:comments>16</slash:comments>
		
		
			</item>
		<item>
		<title>Prefix sums at tens of gigabytes per second with ARM NEON</title>
		<link>https://lemire.me/blog/2026/03/08/prefix-sums-at-tens-of-gigabytes-per-second-with-arm-neon/</link>
					<comments>https://lemire.me/blog/2026/03/08/prefix-sums-at-tens-of-gigabytes-per-second-with-arm-neon/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 08 Mar 2026 20:09:48 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22530</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-08-a-16.07.29-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Suppose that you have a record of your sales per day. You might want to get a running record where, for each day, you are told how many sales you have made since the start of the year. day sales per day running sales 1 10$ 10 $ 2 15$ 25 $ 3 5$ 30 &#8230; <a href="https://lemire.me/blog/2026/03/08/prefix-sums-at-tens-of-gigabytes-per-second-with-arm-neon/" class="more-link">Continue reading <span class="screen-reader-text">Prefix sums at tens of gigabytes per second with ARM NEON</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/Capture-decran-le-2026-03-08-a-16.07.29-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Suppose that you have a record of your sales per day. You might want to get a running record where, for each day, you are told how many sales you have made since the start of the year.</p>
<table>
<thead>
<tr>
<th>day</th>
<th>sales per day</th>
<th>running sales</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>$10</td>
<td>$10</td>
</tr>
<tr>
<td>2</td>
<td>$15</td>
<td>$25</td>
</tr>
<tr>
<td>3</td>
<td>$5</td>
<td>$30</td>
</tr>
</tbody>
</table>
<p>Such an operation is called a prefix sum or a scan.</p>
<p>Implementing it in C is not difficult. It is a simple loop.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>  <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">];</span>
  <span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
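<p>Wrapped as a complete function, the loop might look like the sketch below. The name <code>scalar_prefixsum</code> and the choice of 32-bit unsigned integers are assumptions made for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// In-place prefix sum: after the call, data[i] holds
// data[0] + data[1] + ... + data[i] (modulo 2^32).
void scalar_prefixsum(uint32_t *data, size_t length) {
  assert(data != NULL || length == 0);
  for (size_t i = 1; i < length; i++) {
    data[i] += data[i - 1];
  }
}
```

<p>Applied to the daily sales from the table (10, 15, 5), it produces the running sales (10, 25, 30).</p>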
<p>How fast can this function be? We can derive a speed limit rather simply: to compute the current value, you must have computed the previous one, and so forth.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">-&gt;</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">-&gt;</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">-&gt;</span> <span style="color: #000; font-weight: bold;">...</span>
</code></pre>
</div>
<p>At best, you require one CPU cycle per entry in your table. Thus, on a 4 GHz processor, you might process 4 billion integer values per second. It is an upper bound, but you might be able to get close to it in practice on many modern systems. Of course, there are other instructions involved such as loads, stores and branching, but our processors can execute many instructions per cycle and they can predict branches effectively. So you should be able to process billions of integers per second on most processors today.</p>
<p>Not bad! But can we do better?</p>
<p>We can use SIMD instructions. SIMD instructions are special instructions that process several values at once. All 64-bit ARM processors support NEON instructions. NEON instructions can process four 32-bit integers at once, if they are packed in one SIMD register.</p>
<p>But how do you do the prefix sum on a 4-value register? You can do it with two shifts and two additions. In theory, it scales as log(N) where N is the number of elements in a vector register.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>input   = [A B   C     D]
shift1  = [0 A   B     C]
sum1    = [A A+B B+C   C+D]
shift2  = [0 0   A     B+A]
result  = [A A+B A+B+C A+B+C+D]
</code></pre>
</div>
<p>You can then extract the last value (<code>A+B+C+D</code>) and broadcast it to all positions so that you can add it to the next block of four values.</p>
<p>Is this faster than the scalar approach? We have 4 instructions in sequence, plus at least one instruction if you want to use the total sum in the next block of four values. That is at least five dependent instructions for four values, whereas the scalar approach needs about one cycle per value.</p>
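<p>The two-shift, two-addition recipe can be checked with a portable emulation. This is only a sketch: the <code>vec4</code> type and the helper names are hypothetical stand-ins for a SIMD register and its instructions, and real NEON code would use intrinsics instead of loops.</p>

```c
#include <assert.h>
#include <stdint.h>

// Portable stand-in for a 4-lane SIMD register.
typedef struct {
  uint32_t lane[4];
} vec4;

// Shifts the lanes toward higher indexes by n positions
// (n between 0 and 4), filling the vacated lanes with zeros.
static vec4 shift_in_zeros(vec4 v, int n) {
  assert(n >= 0 && n <= 4);
  vec4 r = {{0, 0, 0, 0}};
  for (int i = n; i < 4; i++) {
    r.lane[i] = v.lane[i - n];
  }
  return r;
}

// Lane-by-lane addition.
static vec4 add4(vec4 a, vec4 b) {
  vec4 r;
  for (int i = 0; i < 4; i++) {
    r.lane[i] = a.lane[i] + b.lane[i];
  }
  return r;
}

// Prefix sum within one register: two shifts and two additions.
vec4 prefix4(vec4 input) {
  vec4 sum1 = add4(input, shift_in_zeros(input, 1)); // [A A+B B+C C+D]
  return add4(sum1, shift_in_zeros(sum1, 2)); // [A A+B A+B+C A+B+C+D]
}
```

<p>With the input [1 2 3 4], the intermediate sum is [1 3 5 7] and the result is [1 3 6 10].</p>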
<p>Thus the SIMD approach might be worse. It is disappointing.</p>
<p>A solution might be to scale up over many more integer values.</p>
<p>Consider ARM NEON, which has interleaved load and store instructions. You can load 16 values at once and get the first value of each group of four together, the second value of each group together, and so forth.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">original</span> <span style="color: #000;">data</span> <span style="color: #000; font-weight: bold;">:</span> <span style="color: #000;">ABCD</span> <span style="color: #000;">EFGH</span> <span style="color: #000;">IJKL</span> <span style="color: #000;">MNOP</span>
<span style="color: #000;">loaded</span> <span style="color: #000;">data</span>   <span style="color: #000; font-weight: bold;">:</span> <span style="color: #000;">AEIM</span> <span style="color: #000;">BFJN</span> <span style="color: #000;">CGKO</span> <span style="color: #000;">DHLP</span>
</code></pre>
</div>
<p>Then I can do a prefix sum over the four blocks in parallel. It takes three instructions. At the end of the three instructions, we have one register which contains the local sums:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>A+B+C+D E+F+G+H I+J+K+L M+N+O+P
</code></pre>
</div>
<p>And then we can apply our prefix sum recipe on this register (4 instructions). You might end up with something like 8 sequential instructions per block of 16 values.</p>
<p>At 16 values for every 8 sequential instructions, that is about two values per cycle: theoretically twice as fast as the scalar approach.</p>
<p>In C with intrinsics, you might code it as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">void</span> <span style="color: #000;">neon_prefixsum_fast</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">uint32_t</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
  <span style="color: #000;">uint32x4_t</span> <span style="color: #000;">zero</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">};</span>
  <span style="color: #000;">uint32x4_t</span> <span style="color: #000;">prev</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">};</span>

  <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">length</span> <span style="color: #ce5c00; font-weight: bold;">/</span> <span style="color: #0000cf; font-weight: bold;">16</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">uint32x4x4_t</span> <span style="color: #000;">vals</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vld4q_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">data</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">16</span> <span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">);</span>

    <span style="color: #8f5902; font-style: italic;">// Prefix sum inside each transposed ("vertical") lane</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">]);</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">]);</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">]);</span>

    <span style="color: #8f5902; font-style: italic;">// Now vals.val[3] contains the four local prefix sums:</span>
    <span style="color: #8f5902; font-style: italic;">//   vals.val[3] = [s0=A+B+C+D, s1=E+F+G+H, </span>
    <span style="color: #8f5902; font-style: italic;">//                  s2=I+J+K+L, s3=M+N+O+P]</span>

    <span style="color: #8f5902; font-style: italic;">// Compute prefix sum across the four local sums </span>
    <span style="color: #000;">uint32x4_t</span> <span style="color: #000;">off</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vextq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">zero</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint32x4_t</span> <span style="color: #000;">ps</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">off</span><span style="color: #000; font-weight: bold;">);</span>       
    <span style="color: #000;">off</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vextq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">zero</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">ps</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">);</span>                      
    <span style="color: #000;">ps</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ps</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">off</span><span style="color: #000; font-weight: bold;">);</span>

    <span style="color: #8f5902; font-style: italic;">// Now ps contains cumulative sums across the four groups</span>
    <span style="color: #8f5902; font-style: italic;">// Add the incoming carry from the previous 16-element block</span>
    <span style="color: #000;">ps</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ps</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">prev</span><span style="color: #000; font-weight: bold;">);</span>

    <span style="color: #8f5902; font-style: italic;">// The add vector to apply to the original lanes is the </span>
    <span style="color: #8f5902; font-style: italic;">// prefix up to previous group; it must be built while prev</span>
    <span style="color: #8f5902; font-style: italic;">// still holds the incoming carry</span>
    <span style="color: #000;">uint32x4_t</span> <span style="color: #000;">add</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vextq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">prev</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">ps</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">);</span>

    <span style="color: #8f5902; font-style: italic;">// Prepare carry for next block: broadcast the last lane of ps</span>
    <span style="color: #000;">prev</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vdupq_laneq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ps</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">);</span>

    <span style="color: #8f5902; font-style: italic;">// Apply carry/offset to each of the four transposed lanes</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vaddq_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">);</span>

    <span style="color: #8f5902; font-style: italic;">// Store back the four lanes (interleaved)</span>
    <span style="color: #000;">vst4q_u32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">data</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">16</span> <span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">vals</span><span style="color: #000; font-weight: bold;">);</span>
  <span style="color: #000; font-weight: bold;">}</span>

  <span style="color: #000;">scalar_prefixsum_leftover</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">16</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
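<p>To check the logic on a machine without NEON, the same blocked scheme can be emulated with plain loops. The sketch below is an assumption-laden illustration, not the code from the benchmark: <code>blocked_prefixsum</code> and <code>prefixsum_leftover</code> are hypothetical names, and the tail handler is only a guess at what a helper for the leftover values needs to do.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical tail handler: finishes the elements left after the
// last complete block of 'block' values, continuing the running sum.
static void prefixsum_leftover(uint32_t *data, size_t length, size_t block) {
  assert(block != 0);
  size_t start = length / block * block;
  if (start == 0) {
    start = 1; // data[0] is already its own prefix sum
  }
  for (size_t i = start; i < length; i++) {
    data[i] += data[i - 1];
  }
}

// Portable emulation of the blocked algorithm: 16 values per
// iteration, treated as four groups of four consecutive values.
void blocked_prefixsum(uint32_t *data, size_t length) {
  assert(data != NULL || length == 0);
  uint32_t carry = 0; // sum of all values in earlier blocks
  for (size_t b = 0; b < length / 16; b++) {
    uint32_t *p = data + 16 * b;
    // Step 1: prefix sum inside each group of four. The groups are
    // independent, which is what the transposed layout exploits.
    for (size_t g = 0; g < 4; g++) {
      for (size_t k = 1; k < 4; k++) {
        p[4 * g + k] += p[4 * g + k - 1];
      }
    }
    // Step 2: offset of a group = carry + totals of earlier groups.
    uint32_t offset[4];
    offset[0] = carry;
    for (size_t g = 1; g < 4; g++) {
      offset[g] = offset[g - 1] + p[4 * (g - 1) + 3];
    }
    // The next carry is the incoming carry plus the four group totals.
    carry = offset[3] + p[15];
    // Step 3: add each group's offset to its four values.
    for (size_t g = 0; g < 4; g++) {
      for (size_t k = 0; k < 4; k++) {
        p[4 * g + k] += offset[g];
      }
    }
  }
  prefixsum_leftover(data, length, 16);
}
```

<p>Comparing its output with a simple scalar loop on inputs whose length is not a multiple of 16 exercises both the carry between blocks and the tail handling.</p>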
<p>Let us try it out on an Apple M4 processor (4.5 GHz).</p>
<table>
<thead>
<tr>
<th>method</th>
<th>billions of values/s</th>
</tr>
</thead>
<tbody>
<tr>
<td>scalar</td>
<td>3.9</td>
</tr>
<tr>
<td>naive SIMD</td>
<td>3.6</td>
</tr>
<tr>
<td>fast SIMD</td>
<td>8.9</td>
</tr>
</tbody>
</table>
<p>So the fast SIMD approach is about 2.3 times faster than the scalar approach, while the naive SIMD approach is slightly slower than scalar. Not bad.</p>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2026/03/06/benchmark">My source code is available on GitHub</a>.</p>
<p><strong>Appendix. Intrinsics</strong></p>
<table>
<thead>
<tr>
<th>Intrinsic</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>vld4q_u32</code></td>
<td>Loads 16 consecutive 32-bit unsigned integers from memory and deinterleaves them into 4 separate <code>uint32x4_t</code> vectors (vector 0 = elements 0, 4, 8, 12; vector 1 = elements 1, 5, 9, 13; and so forth).</td>
</tr>
<tr>
<td><code>vaddq_u32</code></td>
<td>Adds corresponding 32-bit unsigned integer lanes from two vectors (<code>a[i] + b[i]</code> for each of 4 lanes).</td>
</tr>
<tr>
<td><code>vextq_u32</code></td>
<td>Concatenates <code>a</code> and <code>b</code> into 8 lanes, then extracts 4 consecutive lanes starting at lane <code>n</code>. When <code>a</code> is the zero vector, this shifts <code>b</code> toward higher lanes by 4 − <code>n</code> positions, filling with zeros.</td>
</tr>
<tr>
<td><code>vdupq_laneq_u32</code></td>
<td>Broadcasts (duplicates) the value from the specified lane (0–3) of the input vector to all 4 lanes of the result.</td>
</tr>
<tr>
<td><code>vdupq_n_u32</code> (not used above)</td>
<td>Sets all 4 lanes of the result to the same scalar value; <code>vdupq_n_u32(0)</code> is another way to build the zero vector.</td>
</tr>
</tbody>
</table>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/03/08/prefix-sums-at-tens-of-gigabytes-per-second-with-arm-neon/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Text formats are everywhere. Why?</title>
		<link>https://lemire.me/blog/2026/03/05/text-formats-are-everywhere-why/</link>
					<comments>https://lemire.me/blog/2026/03/05/text-formats-are-everywhere-why/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 05 Mar 2026 14:40:58 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22526</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/HCp4uWQX0AAhxh0-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />The Internet relies on text formats. Thus, we spend a lot of time producing and consuming data encoded in text. Your web pages are HTML. The code running in them is JavaScript, sent as text (JavaScript source), not as already-parsed code. Your emails, including their attachments, are sent as text (your binary files are sent &#8230; <a href="https://lemire.me/blog/2026/03/05/text-formats-are-everywhere-why/" class="more-link">Continue reading <span class="screen-reader-text">Text formats are everywhere. Why?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/03/HCp4uWQX0AAhxh0-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>The Internet relies on text formats. Thus, we spend a lot of time producing and consuming data encoded in text.</p>
<p>Your web pages are HTML. The code running in them is JavaScript, sent as text (JavaScript source), not as already-parsed code. Your emails, including their attachments, are sent as text (your binary files are sent as text).</p>
<p>It does not stop there. The Python code that runs your server is stored as text. It queries data by sending text queries. It often gets back the answer as text that must then be decoded.</p>
<p>JSON is the universal data interchange format online today. We share maps as JSON (GeoJSON).</p>
<p>Not everything is text, of course. There is no common video or image format that is shared as text. Transmissions over the Internet are routinely compressed to binary formats. There are popular binary formats that compete with JSON.<br />
But why is text dominant?</p>
<p>It is not because, back in the 1970s, programmers did not know about binary formats.</p>
<p>In fact, we did not start with text formats. Initially, we worked with raw binary data. Those of us old enough will remember programming in assembly using raw byte values.</p>
<p>Why did text win?</p>
<p>1. Text is efficient.</p>
<p>In the XML era, when everything had to be XML, there were countless proposals for binary formats. People were sometimes surprised to find that the binary approach was not much faster in practice. Remember that many text formats date back to an era when computers were much slower. Had text been a performance bottleneck, it would not have spread. Of course, there are cases where text makes things slower. You then have a choice: optimize your code further or transition to another format. Often, both are viable.</p>
<p>It is easy to make wrong assumptions about binary formats, such as that you can consume them without any parsing or validation. If you pick up data from the Internet, you must assume that it could have been sent by an adversary or someone who does not follow your conventions.</p>
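<p>To make the point concrete, here is a minimal sketch of what consuming even a trivial binary record safely might involve. The layout (a 4-byte big-endian length followed by a UTF-8 payload), the function name <code>parse_record</code> and the <code>max_payload</code> limit are all made up for illustration; they do not describe any real format.</p>

```python
import struct

# Hypothetical header for illustration: 4-byte big-endian payload length.
HEADER = struct.Struct("!I")

def parse_record(buf, max_payload=1024):
    """Parse one made-up record, validating everything the sender controls."""
    if HEADER.size > len(buf):
        raise ValueError("truncated header")
    (length,) = HEADER.unpack_from(buf, 0)
    if length > max_payload:
        raise ValueError("declared length is too large")
    if len(buf) != HEADER.size + length:
        raise ValueError("declared length does not match the payload")
    # decode() raises UnicodeDecodeError (a ValueError) on invalid UTF-8
    return buf[HEADER.size:].decode("utf-8")

if __name__ == "__main__":
    print(parse_record(HEADER.pack(5) + b"hello"))
```

<p>Even in this toy case, the length field and the payload encoding must both be checked before the data can be trusted.</p>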
<p>2. Text is easy to work with.</p>
<p>If you receive text from a remote source, you can often transform it, index it, search it, quote it, version it… with little effort and without in-depth knowledge of the format. Text is often self-documenting.</p>
<p>In an open world, when you will never speak with the person producing the data, text often makes everything easier and smoother.</p>
<p>If there is an issue to report and the data is in text, you can usually copy-paste the relevant section into a message. Things are much harder with a binary format.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/03/05/text-formats-are-everywhere-why/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>You can use newline characters in URLs</title>
		<link>https://lemire.me/blog/2026/02/28/you-can-use-newline-characters-in-urls/</link>
					<comments>https://lemire.me/blog/2026/02/28/you-can-use-newline-characters-in-urls/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 28 Feb 2026 19:21:39 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22517</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-28-a-14.21.14-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />We locate web content using special addresses called URLs. We are all familiar with addresses like https://google.com. Sometimes, URLs can get long and they can become difficult to read. Thus, we might be tempted to format them like so in HTML using newline and tab characters, like so: &#60;a href="https://lemire.me/blog/2026/02/21/ how-fast-do-browsers-correct-utf-16-strings/"&#62;my blog post&#60;/a&#62; It will &#8230; <a href="https://lemire.me/blog/2026/02/28/you-can-use-newline-characters-in-urls/" class="more-link">Continue reading <span class="screen-reader-text">You can use newline characters in URLs</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-28-a-14.21.14-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>We locate web content using special addresses called URLs. We are all familiar with addresses like https://google.com. Sometimes, URLs can get long and they can become difficult to read. Thus, we might be tempted to format them<br />
in HTML using newline and tab characters, like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">a</span> <span style="color: #c4a000;">href</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #4e9a06;">"https://lemire.me/blog/2026/02/21/</span>
<span style="color: #4e9a06;">        how-fast-do-browsers-correct-utf-16-strings/"</span><span style="color: #000; font-weight: bold;">&gt;</span>my blog post<span style="color: #000; font-weight: bold;">&lt;/</span><span style="color: #204a87; font-weight: bold;">a</span><span style="color: #000; font-weight: bold;">&gt;</span>
</code></pre>
</div>
<p>It will work.</p>
<p>Let us refer to the WHATWG URL specification that browsers follow. It makes two statements in sequence.</p>
<ol>
<li>If input contains any ASCII tab or newline, invalid-URL-unit validation error.</li>
<li>Remove all ASCII tab or newline from input.</li>
</ol>
<p>Notice how it reports an error if there is a tab or newline character, but continues anyway? The specification says that <em>A validation error does not mean that the parser terminates</em> and it encourages systems to report errors <em>somewhere</em>. Effectively, the error is ignored although it might be logged. Thus our HTML is fine in practice.</p>
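<p>The two statements can be sketched in a few lines of Python. This is only an illustration of that one step, not a URL parser: the function name <code>remove_tab_and_newline</code> is my own, and I assume that "ASCII tab or newline" means the tab, line feed and carriage return characters.</p>

```python
# Assumed set of characters: TAB, LF and CR.
ASCII_TAB_OR_NEWLINE = {ord("\t"): None, ord("\n"): None, ord("\r"): None}

def remove_tab_and_newline(url):
    """Return the cleaned string and whether a validation error would be reported."""
    cleaned = url.translate(ASCII_TAB_OR_NEWLINE)
    had_validation_error = cleaned != url
    return cleaned, had_validation_error

if __name__ == "__main__":
    print(remove_tab_and_newline("https://go\nogle.c\nom"))
```

<p>The error flag is returned rather than raised, mirroring the idea that a validation error does not stop the parser.</p>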
<p>The following is also fine:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">a</span> <span style="color: #c4a000;">href</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #4e9a06;">"https://go</span>
<span style="color: #4e9a06;">ogle.c</span>
<span style="color: #4e9a06;">om"</span> <span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #4e9a06;">"button"</span><span style="color: #000; font-weight: bold;">&gt;</span>Visit Google<span style="color: #000; font-weight: bold;">&lt;/</span><span style="color: #204a87; font-weight: bold;">a</span><span style="color: #000; font-weight: bold;">&gt;</span>
</code></pre>
</div>
<p>You can also use tabs. But you cannot arbitrarily insert any other whitespace.</p>
<p>Yet there are cases when you can use any ASCII whitespace character: data URLs. Data URLs (also called data URIs) embed small files—like images, text, or other content—directly inside a URL string, instead of linking to an external resource. Data URLs are a <em>special kind</em> of URL and they follow different rules.</p>
<p>A typical data URL might look like <code>data:image/png;base64,iVBORw0KGgoAAAANSUhEUg...</code> where the string <code>iVBORw0KGgoAAAANSUhEUg...</code> is the binary data of the image that has been encoded with base64. Base64 is a text format that can represent any binary content: we use 64 ASCII characters so that each character encodes 6 bits. Your binary email attachments are base64 encoded.</p>
<p>On the web, when decoding a base64 string, you ignore all ASCII whitespace (including the space character itself). Thus you can embed a PNG image in HTML as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">img</span> <span style="color: #c4a000;">src</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #4e9a06;">"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA
                                 QAAAAECAIAAAAmkwkpAAAAEUl
                                 EQVR4nG</span><span style="color: #4e9a06;">P8z4AATEhsPBwAM9EB
                                 BzDn4UwAAAAASUVORK5CYII="</span> <span style="color: #000; font-weight: bold;">/&gt;</span>
</code></pre>
</div>
<p>This HTML code is valid and will insert a tiny image in your page.</p>
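<p>As a rough sketch of the whitespace rule, the following Python function strips ASCII whitespace before decoding. The function name is my own, I assume that ASCII whitespace means tab, line feed, form feed, carriage return and space, and the sketch does not attempt the other leniencies that a browser may apply (such as tolerating missing padding).</p>

```python
import base64

# Assumed set of ASCII whitespace: TAB, LF, FF, CR and SPACE.
ASCII_WHITESPACE = {ord(c): None for c in "\t\n\x0c\r "}

def decode_base64_ignoring_whitespace(text):
    """Remove ASCII whitespace, then decode strictly as base64."""
    compact = text.translate(ASCII_WHITESPACE)
    # validate=True rejects any remaining character outside the base64 alphabet
    return base64.b64decode(compact, validate=True)

if __name__ == "__main__":
    print(decode_base64_ignoring_whitespace("aGVs\n        bG8="))
```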
<p>But there is more. A data URL can also be used to insert an SVG image. SVG (Scalable Vector Graphics) is an XML-based vector image format that describes 2D graphics using mathematical paths, shapes, and text instead of pixels.<br />
The following should draw a very simple sunset:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">img</span> <span style="color: #c4a000;">src</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #4e9a06;">'data:image/svg+xml,</span>
<span style="color: #4e9a06;">&lt;svg width="200" height="200" 
     xmlns="http://www.w3.org/2000/svg"&gt;</span>
<span style="color: #4e9a06;">  &lt;rect width="100%" height="100%" fill="blue" /&gt; 
  &lt;!-- the sky --&gt;</span>
<span style="color: #4e9a06;">  &lt;circle cx="100" cy="110" r="50" fill="yellow" /&gt;  
  &lt;!-- the sun --&gt;</span>
<span style="color: #4e9a06;">  &lt;rect x="0" y="120" width="200" height="80" fill="brown" /&gt;  
  &lt;!-- the ground --&gt;</span>
<span style="color: #4e9a06;">&lt;/svg&gt;'</span> <span style="color: #000; font-weight: bold;">/&gt;</span>
</code></pre>
</div>
<p>Observe how I was able to format the SVG code so that it is readable.</p>
<p><strong>Further reading</strong>: Nizipli, Y., &amp; Lemire, D. (2024). <a href="https://doi.org/10.1002/spe.3296">Parsing millions of URLs per second</a>. <i>Software: Practice and Experience</i>, <i>54</i>(5), 744-758.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/28/you-can-use-newline-characters-in-urls/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>How fast do browsers correct UTF-16 strings?</title>
		<link>https://lemire.me/blog/2026/02/21/how-fast-do-browsers-correct-utf-16-strings/</link>
					<comments>https://lemire.me/blog/2026/02/21/how-fast-do-browsers-correct-utf-16-strings/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 21 Feb 2026 20:07:17 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22443</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-21-a-15.06.57-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />JavaScript represents strings using Unicode, like most programming languages today. Each character in a JavaScript string is stored using one or two 16-bit words. The following JavaScript code might surprise some programmers because a single character becomes two 16-bit words. &#62; t="🧰" '🧰' &#62; t.length 2 &#62; t[0] '\ud83e' &#62; t[1] '\uddf0' The convention is &#8230; <a href="https://lemire.me/blog/2026/02/21/how-fast-do-browsers-correct-utf-16-strings/" class="more-link">Continue reading <span class="screen-reader-text">How fast do browsers correct UTF-16 strings?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-21-a-15.06.57-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>JavaScript represents strings using Unicode, like most programming languages today. Each character in a JavaScript string is stored using one or two 16-bit words. The following JavaScript code might surprise some programmers because a single character becomes two 16-bit words.</p>
<pre>&gt; t="&#x1f9f0;"
'&#x1f9f0;'
&gt; t.length
2
&gt; t[0]
'\ud83e'
&gt; t[1]
'\uddf0'
</pre>
<p>The convention is that \uddf0 is the 16-bit value 0xDDF0, also written U+DDF0.</p>
<p>The UTF-16 standard is relatively simple. There are three types of values: high surrogates (the range U+D800 to U+DBFF), low surrogates (U+DC00 to U+DFFF), and all other code units (U+0000–U+D7FF together with U+E000–U+FFFF). A high surrogate must always be followed by a low surrogate, and a low surrogate must always be preceded by a high surrogate.</p>
<p>What happens if you break the rules and have a high surrogate followed by a high surrogate? Then you have an invalid string. We can correct the strings by patching them: we replace the bad values by the replacement character (\ufffd). The replacement character sometimes appears as a question mark.</p>
<p>To correct a broken string in JavaScript, you can call the toWellFormed method.</p>
<pre>&gt; t = '\uddf0\uddf0'
'\uddf0\uddf0'
&gt; t.toWellFormed()
'��'
</pre>
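<p>The correction rule itself is simple enough to sketch in Python over a list of 16-bit values. This is only an illustration of the logic described above, with names of my own choosing; it says nothing about how browsers implement <code>toWellFormed</code> or why some are faster.</p>

```python
HIGH = range(0xD800, 0xDC00)  # high surrogates U+D800 to U+DBFF
LOW = range(0xDC00, 0xE000)   # low surrogates U+DC00 to U+DFFF
REPLACEMENT = 0xFFFD

def to_well_formed(units):
    """Return a copy of the 16-bit code units with unpaired surrogates replaced."""
    out = list(units)
    expect_low = False  # True when the previous unit was a high surrogate
    for i, u in enumerate(units):
        if expect_low:
            expect_low = False
            if u in LOW:
                continue  # valid high/low pair, keep both units
            out[i - 1] = REPLACEMENT  # the previous high surrogate was unpaired
        if u in HIGH:
            expect_low = True
        elif u in LOW:
            out[i] = REPLACEMENT  # low surrogate without a preceding high
    if expect_low:
        out[-1] = REPLACEMENT  # the input ended on a high surrogate
    return out

if __name__ == "__main__":
    print([hex(u) for u in to_well_formed([0xDDF0, 0xDDF0])])
```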
<p>How fast is it?</p>
<p>I wrote a <a href="https://lemire.github.io/browserwellformed/">small benchmark</a> that you can test online to measure its speed. I use broken strings of various sizes up to a few kilobytes. I run the benchmarks on my Apple M4 processor using different browsers.</p>
<table>
<tbody>
<tr>
<th>Browser</th>
<th>Speed</th>
</tr>
<tr>
<td>Safari 18.6</td>
<td>1 GiB/s</td>
</tr>
<tr>
<td>Firefox 147</td>
<td>3 GiB/s</td>
</tr>
<tr>
<td>Chrome 145</td>
<td>15 GiB/s</td>
</tr>
</tbody>
</table>
<p>Quite a range of performance! The speed of other Chromium-based browsers (Brave and Edge) is much the same as Chrome.</p>
<p>I also tested with JavaScript runtimes.</p>
<table>
<tbody>
<tr>
<th>Engine</th>
<th>Speed</th>
</tr>
<tr>
<td>Node.js v25.5.0</td>
<td>16 GiB/s</td>
</tr>
<tr>
<td>Bun 1.3.9</td>
<td>8.4 GiB/s</td>
</tr>
</tbody>
</table>
<p>Usually Bun is faster than Node, but in this instance, Node is twice as fast as Bun.</p>
<p>Thus, you can correct strings in JavaScript at over ten gigabytes per second if you use a Chromium-based browser.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/21/how-fast-do-browsers-correct-utf-16-strings/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How bad can Python stop-the-world pauses get?</title>
		<link>https://lemire.me/blog/2026/02/15/how-bad-can-python-stop-the-world-pauses-get/</link>
					<comments>https://lemire.me/blog/2026/02/15/how-bad-can-python-stop-the-world-pauses-get/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 15 Feb 2026 20:02:29 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22495</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-15-a-15.01.17-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />When programming, we need to allocate memory, and then deallocate it. If you program in C, you get used to malloc/free functions. Sadly, this leaves you vulnerable to memory leaks: unrecovered memory. Most popular programming languages today use automated memory management: Java, JavaScript, Python, C#, Go, Swift and so forth. There are essentially two types &#8230; <a href="https://lemire.me/blog/2026/02/15/how-bad-can-python-stop-the-world-pauses-get/" class="more-link">Continue reading <span class="screen-reader-text">How bad can Python stop-the-world pauses get?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-15-a-15.01.17-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>When programming, we need to allocate memory, and then deallocate it. If you program in C, you get used to malloc/free functions. Sadly, this leaves you vulnerable to memory leaks: unrecovered memory. Most popular programming languages today use automated memory management: Java, JavaScript, Python, C#, Go, Swift and so forth.</p>
<p>There are essentially two types of automated memory management. The simplest method is reference counting. You track how many references there are to each object. When an object has no more references, we can free the memory associated with it. Swift and Python use reference counting. The downside of reference counting is circular references. Your main program may reference object A, then you add object B which references object A, and you make it so that object A also references object B. Thus object B has one reference while object A has two references. If your main program drops its reference to object A, then both objects A and B still have a reference count of one. Yet they should be freed.</p>
<p>To solve this problem, you could just visit all of your objects to detect which are unreachable, including A and B. However, it takes time to do so. Hence the other popular approach to automated memory management: generational garbage collection. You use the fact that most memory gets released soon after allocation. Thus you track young objects and visit them from time to time. Then, more rarely, you do a full scan. The downside of generational garbage collection is that typical implementations stop the world to scan the memory. In many instances, your entire program is stopped. There are many variations on the implementation, with decades of research.</p>
<p>The common Python implementation has both types: reference counting and generational garbage collection. The generational garbage collection component can trigger pauses. A lot of servers are written in Python. It means that your service might just become unavailable for a time. We often call them &#8216;stop the world&#8217; pauses. How long can this pause get?</p>
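<p>The circular-reference problem can be observed directly. The following sketch assumes the common CPython implementation; the class name <code>Obj</code> and the function name are hypothetical. It builds the A/B cycle described above, drops both references, and uses a weak reference to check whether object A is still alive before and after an explicit collection.</p>

```python
import gc
import weakref

class Obj:
    """Hypothetical placeholder object used only for this demonstration."""
    pass

def cycle_survives_until_collect():
    """Return (alive_before, alive_after) around an explicit gc.collect()."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running during the demo
    try:
        a = Obj()
        b = Obj()
        a.other = b  # A references B
        b.other = a  # B references A: a reference cycle
        probe = weakref.ref(a)  # observes A without keeping it alive
        del a, b  # the counts drop to one each, not to zero
        alive_before = probe() is not None
        gc.collect()  # the cycle detector finds the unreachable pair
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()

if __name__ == "__main__":
    print(cycle_survives_until_collect())
```

<p>With CPython, I would expect the objects to survive the <code>del</code> statement and to disappear only after the collection.</p>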
<p>To test this out, I wrote a Python function to create a classical linked list:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">class</span> <span style="color: #000;">Node</span><span style="color: #000; font-weight: bold;">:</span>
    <span style="color: #204a87; font-weight: bold;">def</span> <span style="color: #000;">__init__</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #3465a4;">self</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">value</span><span style="color: #000; font-weight: bold;">):</span>
        <span style="color: #3465a4;">self</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">value</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">value</span>
        <span style="color: #3465a4;">self</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">next</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #204a87; font-weight: bold;">None</span>
    <span style="color: #204a87; font-weight: bold;">def</span> <span style="color: #000;">add_next</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #3465a4;">self</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">node</span><span style="color: #000; font-weight: bold;">):</span>
        <span style="color: #3465a4;">self</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">next</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">node</span>

<span style="color: #204a87; font-weight: bold;">def</span> <span style="color: #000;">create_linked_list</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">limit</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #8f5902; font-style: italic;">""" create a linked list of length 'limit' """</span>
    <span style="color: #000;">head</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">Node</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">)</span>
    <span style="color: #000;">current</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">head</span>
    <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">i</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #204a87;">range</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">limit</span><span style="color: #000; font-weight: bold;">):</span>
        <span style="color: #000;">new_node</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">Node</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">)</span>
        <span style="color: #000;">current</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">add_next</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">new_node</span><span style="color: #000; font-weight: bold;">)</span>
        <span style="color: #000;">current</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">new_node</span>
    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">head</span>
</code></pre>
</div>
<p>I then create one large linked list and, in a tight loop, create small linked lists that are immediately discarded.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">x</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">create_linked_list</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">50_000_000</span><span style="color: #000; font-weight: bold;">)</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">i</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #204a87;">range</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">1000000</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #000;">create_linked_list</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">1000</span><span style="color: #000; font-weight: bold;">)</span>
</code></pre>
</div>
<p>A key characteristic of my code is the linked list with 50 million nodes. It does not get released until the end of the program, but the garbage collector may still examine it.</p>
<p>And I record the maximum delay between two iterations in the loop (using <code>time.time()</code>).</p>
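<p>The measurement loop might look like the following sketch. It is a reconstruction under assumptions, not necessarily the exact code I ran (see the linked source code for that): the function name <code>max_delay_ms</code> is made up, and the sizes in the usage example are kept small so that it runs quickly.</p>

```python
import time

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def create_linked_list(limit):
    """Create a linked list of length 'limit'."""
    head = Node(0)
    current = head
    for i in range(1, limit):
        current.next = Node(i)
        current = current.next
    return head

def max_delay_ms(iterations, small_size):
    """Return the longest time between two loop iterations, in milliseconds."""
    worst = 0.0
    last = time.time()
    for _ in range(iterations):
        create_linked_list(small_size)  # short-lived garbage
        now = time.time()
        worst = max(worst, now - last)
        last = now
    return worst * 1000.0

if __name__ == "__main__":
    big = create_linked_list(1_000_000)  # long-lived list, kept alive on purpose
    print(max_delay_ms(10_000, 1000), "ms")
```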
<p>How bad can it get? The answer depends on the Python version. And it is not consistent from run to run. So I ran it once and picked whatever result I got. I express the delay in milliseconds.</p>
<table>
<thead>
<tr>
<th>Python version</th>
<th>system</th>
<th>max delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>3.14</td>
<td>macOS (Apple M4)</td>
<td>320 ms</td>
</tr>
<tr>
<td>3.12</td>
<td>Linux (Intel Ice Lake)</td>
<td>2,200 ms</td>
</tr>
</tbody>
</table>
<p>Almost all of this delay (say 320 ms) is due to the garbage collection. Creating a linked list with 1000 elements takes less than a millisecond.</p>
<p>How long is 320 ms? It is a third of a second, so it is long enough for human beings to notice it. For reference, a video game drawing the screen 60 times per second has less than 17 ms to draw the screen. The 2,200 ms delay could look like a server crash from the point of view of a user, and could well trigger a time-out (failed request).</p>
<p>I ported the Python program to Go. It is the same algorithm, but a direct comparison is likely unfair. Still, it gives us a reference.</p>
<table>
<thead>
<tr>
<th>Go version</th>
<th>system</th>
<th>max delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.25</td>
<td>macOS (Apple M4)</td>
<td>50 ms</td>
</tr>
<tr>
<td>1.25</td>
<td>Linux (Intel Ice Lake)</td>
<td>33 ms</td>
</tr>
</tbody>
</table>
<p>Thus Go has pauses that are several times shorter than Python, and there is no catastrophic 2-second pause.</p>
<p>Should these pauses be a concern? Most Python programs do not create so many objects in memory at the same time. Thus you are not likely to see these long pauses if you have a simple web app or a script. Python gives you a few options, such as <code>gc.set_threshold</code> and <code>gc.freeze</code> which could help you tune the behaviour.</p>
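<p>Here is a hedged sketch of how these two options might be combined once the long-lived data has been built. The function name and the threshold value are arbitrary choices for illustration, and I have not measured what effect this has on the pauses reported above.</p>

```python
import gc

def configure_gc_for_long_lived_data(threshold0=100_000):
    """Freeze existing objects and collect the youngest generation less often.

    Returns the previous thresholds so that they can be restored.
    """
    previous = gc.get_threshold()
    gc.collect()  # clean up first so that garbage is not frozen
    gc.freeze()   # move all currently tracked objects to a permanent generation
    # Keep the other two thresholds as they were; only change the first one.
    gc.set_threshold(threshold0, previous[1], previous[2])
    return previous

if __name__ == "__main__":
    old = configure_gc_for_long_lived_data()
    print("previous thresholds:", old)
    print("frozen objects:", gc.get_freeze_count())
```

<p>You can call <code>gc.unfreeze()</code> and <code>gc.set_threshold</code> with the returned values to undo the change.</p>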
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2026/02/13">My source code is available</a>.</p>
<p>&nbsp;</p>
<p><strong>Video</strong></p>
<p><iframe loading="lazy" title="YouTube video player" src="https://www.youtube.com/embed/tGzeImavaFc?si=CmFcGCEeE13OlSnX" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/15/how-bad-can-python-stop-the-world-pauses-get/feed/</wfw:commentRss>
			<slash:comments>8</slash:comments>
		
		
			</item>
		<item>
		<title>AI: Igniting the Spark to End  Stagnation</title>
		<link>https://lemire.me/blog/2026/02/15/ai-igniting-the-spark-to-end-stagnation/</link>
					<comments>https://lemire.me/blog/2026/02/15/ai-igniting-the-spark-to-end-stagnation/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 15 Feb 2026 15:19:31 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22489</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/02_AI-Adoption-Rate-by-Country-2048x1340-3069525806-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Much of the West has been economically stagnant. Countries like Canada have failed to improve their productivity and standard of living as of late. In Canada, there has been no progress in Canadian living standards as measured by per-person GDP over the past five years. It is hard to overstate how anomalous this is: the &#8230; <a href="https://lemire.me/blog/2026/02/15/ai-igniting-the-spark-to-end-stagnation/" class="more-link">Continue reading <span class="screen-reader-text">AI: Igniting the Spark to End  Stagnation</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/02_AI-Adoption-Rate-by-Country-2048x1340-3069525806-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Much of the West has been economically stagnant. Countries like Canada have failed to improve their productivity and standard of living as of late. In Canada, there has been no progress in living standards as measured by per-person GDP over the past five years. It is hard to overstate how anomalous this is: the USSR collapsed in part because it could only sustain a growth rate of about 1%, far below what the West was capable of. Canada is more stagnant than the USSR.</p>
<p>Late in 2022, some of us got access to a technical breakthrough: AI. In three years, it has become part of our lives. Nearly all students use AI to do research or write essays.</p>
<p>Dallas Fed economists projected the most credible effect that AI might have on our economies: AI should help reverse the post-2008 slowdown and deliver higher living standards in line with historical technological progress.</p>
<p>It will imply a profound, rapid but gradual transformation of our economy. There will still be teachers, accountants, and even translators in the future… but their work will change as it has changed in the past. Accountants do far less arithmetic today; that part of their work has been replaced by software. Even more of their work is about to be replaced by software, thus improving their productivity further. We will still have teachers, but all our kids, including the poorest ones, will have dedicated always-on tutors: this will not be just available in Canada or the USA, but everywhere. It is up to us to decide who is allowed to build this technology.</p>
<p>AI empowers the individual. An entrepreneur with a small team can get faster access to quality advice, copywriting, and so forth. Artists with an imagination can create more with fewer constraints.</p>
<p>I don’t have to prove these facts: they are fast becoming obvious to the whole world.</p>
<p>New jobs are created. Students of mine work as AI specialists. One of them helps build software providing AI assistance to pharmacists. One of my sons is an AI engineer. These are great jobs.</p>
<p class="article-editor-paragraph article-editor-content__has-focus">The conventional explanation for Canada’s stagnation is essentially that we have already harvested all the innovation we are ever going to get. The low-hanging fruit has been picked. Further progress has become inherently difficult because we are already so advanced; there is simply not much room left to improve. In this view, there is no need to rethink our institutions. Yet a sufficiently large breakthrough compels us to reconsider where we stand and what is still possible. It forces us to use our imagination again. It helps renew the culture.</p>
<p>We often hear claims that artificial intelligence will consume vast amounts of energy and water in the coming years. It is true that data centers, which host AI workloads along with many other computing tasks, rely on water for cooling.</p>
<p>But let’s look at the actual water numbers. In 2023, U.S. data centers directly consumed roughly 17.4 billion gallons of water—a figure that could potentially double or quadruple by 2028 as demand grows. By comparison, American golf courses use more than 500 billion gallons every year for irrigation, often in arid regions where this usage is widely criticized as wasteful. Even if data-center water demand were to grow exponentially, it would take decades to reach the scale of golf-course irrigation.</p>
<p>On the energy side, data centers are indeed taking a larger share of electricity demand. According to the International Energy Agency’s latest analysis, they consumed approximately 415 TWh in 2024—about 1.5% of global electricity consumption. This is projected to more than double to around 945 TWh by 2030 (just under 3% of global electricity). However, even this rapid growth accounts for less than 10% (roughly 8%) of the total expected increase in worldwide electricity demand through 2030. Data centers are therefore not the main driver of the much larger rise in overall energy use.</p>
<p>If we let engineers in Australia, Canada, or Argentina innovate freely, we will surely see fantastic developments.</p>
<p>You might also have heard about the possibility that ChatGPT might decide to kill us all. Nobody can predict the future, but you are surely more likely to be killed by cancer than by a rogue AI. And AI might help you with your cancer.</p>
<p>We always have a choice. Nations can try to regulate AI out of existence. We can set up new government bodies to prevent the application of AI. This will surely dampen the productivity gains and marginalize some nations economically.</p>
<p>The European Union showed it could be done. By some reports, Europeans make more money by fining American software companies than by building their own innovation enterprises. Countries like Canada have economies dominated by finance, mining and oil (with a side of Shopify).</p>
<p>If you are already well off, stopping innovation sounds good. It does not if you are trying to get started.</p>
<p>AI is likely to help young people who need it so much. They, more than any other group, will find it easier to occupy the new jobs, start the new businesses.</p>
<p>If you are a politician and you want to lose the vote of young people: make it difficult to use AI. It will crater your credibility.</p>
<p>It is time to renew our prosperity. It is time to create new exciting jobs.</p>
<p>&nbsp;</p>
<p>References:</p>
<p>Wynne, M. A., &amp; Derr, L. (2025, June 24). <a href="https://www.dallasfed.org/research/economics/2025/0624">Advances in AI will boost productivity, living standards over time</a>. Federal Reserve Bank of Dallas.</p>
<p>Fraser Institute. (2025, December 16). <a href="https://www.fraserinstitute.org/commentary/canadas-recent-economic-growth-performance-has-been-awful">Canada&#8217;s recent economic growth performance has been awful</a>.</p>
<p>DemandSage. (2026, January 9). <a href="https://www.demandsage.com/ai-in-education-statistics">75 AI in education statistics 2026</a> (Global trends &amp; facts).</p>
<p>MIT Technology Review. (2026, January 21). <a href="https://www.technologyreview.com/2026/01/21/1131366/rethinking-ais-future-in-an-augmented-workplace">Rethinking AI&#8217;s future in an augmented workplace</a>.</p>
<p>Davis, J. H. (2025). <a href="https://www.amazon.com/-/fr/Coming-into-View-Megatrends-Investments/dp/139435908X/">Coming into view: How AI and other megatrends will shape your investments</a>. Wiley.</p>
<p>Choi, J. H., &amp; Xie, C. (2025, June 26). <a href="https://www.gsb.stanford.edu/insights/ai-reshaping-accounting-jobs-doing-boring-stuff">AI is reshaping accounting jobs by doing the boring stuff</a>. Stanford Graduate School of Business.</p>
<p>International Energy Agency. (n.d.). <a href="https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai">Energy demand from AI</a>.</p>
<p>University of Colorado Anschutz Medical Campus. (2025, May 19). <a href="https://news.cuanschutz.edu/news-stories/real-talk-about-ai-and-advancing-cancer-treatments">Real talk about AI and advancing cancer treatments</a>.</p>
<p>International Energy Agency. (2025). <a href="https://www.iea.org/reports/global-energy-review-2025">Global energy review 2025</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/15/ai-igniting-the-spark-to-end-stagnation/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>The cost of a function call</title>
		<link>https://lemire.me/blog/2026/02/08/the-cost-of-a-function-call/</link>
					<comments>https://lemire.me/blog/2026/02/08/the-cost-of-a-function-call/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 08 Feb 2026 20:11:32 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22485</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-08-a-15.07.17-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />When programming, we chain functions together. Function A calls function B. And so forth. You do not have to program this way, you could write an entire program using a single function. It would be a fun exercise to write a non-trivial program using a single function&#8230; as long as you delegate the code writing &#8230; <a href="https://lemire.me/blog/2026/02/08/the-cost-of-a-function-call/" class="more-link">Continue reading <span class="screen-reader-text">The cost of a function call</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-08-a-15.07.17-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>When programming, we chain functions together. Function A calls function B. And so forth.</p>
<p>You do not have to program this way: you could write an entire program using a single function. It would be a fun exercise to write a non-trivial program using a single function&#8230; as long as you delegate the code writing to AI, because human beings quickly struggle with long functions.</p>
<p>A key compiler optimization is &#8216;inlining&#8217;: the compiler takes your function definition and it tries to substitute it at the call location. It is conceptually quite simple. Consider the following example where the function <code>add3</code> calls the function <code>add</code>.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">x</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">y</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">x</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">y</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>

<span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">add3</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">x</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">y</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">z</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">x</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">y</span><span style="color: #000; font-weight: bold;">),</span> <span style="color: #000;">z</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>You can manually inline the call as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">add3</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">x</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">y</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">z</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">x</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">y</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">z</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>A function call is reasonably cheap performance-wise, but not free. If the function takes non-trivial parameters, you might need to save and restore them on the stack, so you get extra loads and stores. You need to jump into the function, and then jump out at the end. And depending on the calling convention on your system, and the type of instructions you are using, there are extra instructions at the beginning and at the end.</p>
<p>If a function is sufficiently simple, such as my <code>add</code> function, it should always be inlined when performance is critical. Let us examine a concrete example. Let me sum the integers in an array.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">x</span> <span style="color: #ce5c00; font-weight: bold;">:</span> <span style="color: #000;">numbers</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
  <span style="color: #000;">sum</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">add</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">sum</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">x</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
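<p>To compare a real call against an inlined one, both variants must exist in the same program. Here is a minimal sketch, assuming GCC or Clang: the <code>noinline</code> and <code>always_inline</code> attributes are compiler extensions, and the names <code>add_regular</code>, <code>add_inline</code>, <code>sum_regular</code> and <code>sum_inline</code> are hypothetical. The benchmark linked at the end of this post may be set up differently.</p>

```cpp
#include <vector>

// Assumption: GCC/Clang attribute syntax. Other compilers need a different spelling.
// The compiler is told never to inline this function, so every use is a real call.
__attribute__((noinline)) int add_regular(int x, int y) { return x + y; }

// The compiler is told to always substitute the body at the call site.
__attribute__((always_inline)) inline int add_inline(int x, int y) { return x + y; }

// Sums the values with one function call per element.
int sum_regular(const std::vector<int>& numbers) {
  int sum = 0;
  for (int x : numbers) {
    sum = add_regular(sum, x);
  }
  return sum;
}

// Same loop, but the addition is inlined, which leaves the compiler free
// to optimize the whole loop further.
int sum_inline(const std::vector<int>& numbers) {
  int sum = 0;
  for (int x : numbers) {
    sum = add_inline(sum, x);
  }
  return sum;
}
```

<p>Both functions return the same sum; only the generated code differs.</p>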
<p>I am using my MacBook (M4 processor with LLVM).</p>
<table>
<thead>
<tr>
<th>function</th>
<th>ns/int</th>
</tr>
</thead>
<tbody>
<tr>
<td>regular</td>
<td>0.7</td>
</tr>
<tr>
<td>inline</td>
<td>0.03</td>
</tr>
</tbody>
</table>
<p>Wow. The inline version is over 20 times faster.</p>
<p>Let us try to see what is happening. The call site of the &#8216;add&#8217; function is just a straight loop with a call to the function.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">ldr</span>    <span style="color: #000;">w1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">x19</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #0000cf; font-weight: bold;">#0</span><span style="color: #000;">x4</span>
<span style="color: #000;">bl</span>     <span style="color: #0000cf; font-weight: bold;">0x100021740</span>    <span style="color: #8f5902; font-style: italic;">; add(int, int)</span>
<span style="color: #000;">cmp</span>    <span style="color: #000;">x19</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">x20</span>
<span style="color: #000;">b.ne</span>   <span style="color: #0000cf; font-weight: bold;">0x100001368</span>    <span style="color: #8f5902; font-style: italic;">; &lt;+28&gt;</span>
</code></pre>
</div>
<p>The function itself is as cheap as it can be: just two instructions.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">add</span>    <span style="color: #000;">w0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">w1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">w0</span>
<span style="color: #000;">ret</span>
</code></pre>
</div>
<p>So, we spend 6 instructions for each addition. It takes about 3 cycles per addition.</p>
<p>What about the inline function?</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">ldp</span>    <span style="color: #000;">q4</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">q5</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">x12</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">#-0</span><span style="color: #000;">x20</span><span style="color: #000; font-weight: bold;">]</span>
<span style="color: #000;">ldp</span>    <span style="color: #000;">q6</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">q7</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">x12</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #0000cf; font-weight: bold;">#0</span><span style="color: #000;">x40</span>
<span style="color: #000;">add.4s</span> <span style="color: #000;">v0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v4</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v0</span>
<span style="color: #000;">add.4s</span> <span style="color: #000;">v1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v5</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v1</span>
<span style="color: #000;">add.4s</span> <span style="color: #000;">v2</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v6</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v2</span>
<span style="color: #000;">add.4s</span> <span style="color: #000;">v3</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v7</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">v3</span>
<span style="color: #000;">subs</span>   <span style="color: #000;">x13</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">x13</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">#0</span><span style="color: #000;">x10</span>
<span style="color: #000;">b.ne</span>   <span style="color: #0000cf; font-weight: bold;">0x1000013fc</span>    <span style="color: #8f5902; font-style: italic;">; &lt;+104&gt;</span>
</code></pre>
</div>
<p>It is entirely different. The compiler has converted the addition to advanced (SIMD) instructions processing blocks of 16 integers using 8 instructions. So we are down to half an instruction per integer (from 6 instructions). So we use 12 times fewer instructions. On top of having fewer instructions, the processor is able to retire more instructions per cycle, for a massive performance boost.</p>
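<p>In portable C++, the vectorized loop corresponds roughly to keeping 16 independent partial sums (four registers of four 32-bit lanes each) and combining them at the end. The following is only an illustrative sketch of that idea, with a hypothetical function name; it is not what the compiler literally emits. It accumulates in unsigned arithmetic so that wraparound is well defined.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sums n 32-bit integers using 16 independent partial sums,
// mimicking four SIMD registers of four lanes each.
std::int32_t sum_16_lanes(const std::int32_t* data, std::size_t n) {
  std::uint32_t lanes[16] = {};
  std::size_t i = 0;
  // Main loop: one block of 16 integers per iteration.
  for (; i + 16 <= n; i += 16) {
    for (std::size_t j = 0; j < 16; j++) {
      lanes[j] += static_cast<std::uint32_t>(data[i + j]);
    }
  }
  // Combine the partial sums.
  std::uint32_t total = 0;
  for (std::uint32_t lane : lanes) {
    total += lane;
  }
  // Tail: the remaining (fewer than 16) integers.
  for (; i < n; i++) {
    total += static_cast<std::uint32_t>(data[i]);
  }
  return static_cast<std::int32_t>(total);
}
```

<p>Because the partial sums do not depend on each other, the processor can execute several of the additions in the same cycle.</p>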
<p>What if we prevented the compiler from using these fancy instructions while still inlining? We still get a significant performance boost (about 10x faster).</p>
<table>
<thead>
<tr>
<th>function</th>
<th>ns/int</th>
</tr>
</thead>
<tbody>
<tr>
<td>regular</td>
<td>0.7</td>
</tr>
<tr>
<td>inline</td>
<td>0.03</td>
</tr>
<tr>
<td>inline (no SIMD)</td>
<td>0.07</td>
</tr>
</tbody>
</table>
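<p>With Clang, one way to get an inlined but non-vectorized loop is a loop pragma; passing <code>-fno-vectorize -fno-slp-vectorize</code> on the command line is another. The sketch below assumes Clang and uses a hypothetical function name. It is not necessarily how the numbers above were obtained, and other compilers ignore the pragma (possibly with a warning).</p>

```cpp
#include <vector>

inline int add(int x, int y) { return x + y; }

// Hypothetical variant: the call to add can still be inlined, but Clang is
// asked not to vectorize or interleave the loop.
int sum_no_simd(const std::vector<int>& numbers) {
  int sum = 0;
#pragma clang loop vectorize(disable) interleave(disable)
  for (int x : numbers) {
    sum = add(sum, x);
  }
  return sum;
}
```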
<p>Ok. But the <code>add</code> function is a bit extreme. We know it should always be inlined. What about something less trivial, like a function that counts the number of spaces in a string?</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">count_spaces</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">string_view</span> <span style="color: #000;">sv</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">count</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #000;">c</span> <span style="color: #ce5c00; font-weight: bold;">:</span> <span style="color: #000;">sv</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">c</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #4e9a06;">' '</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000;">count</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">count</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>If the string is reasonably long, then the overhead of the function call should be negligible.<br />
Let us pass a string of 1000 characters.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>ns/string</th>
</tr>
</thead>
<tbody>
<tr>
<td>regular</td>
<td>111</td>
</tr>
<tr>
<td>inline</td>
<td>115</td>
</tr>
</tbody>
</table>
<p>The inline version is not only not faster, but it is even slightly slower. I am not sure why.</p>
<p>What if I use short strings (say between 0 and 6 characters)? Then the inline function is measurably faster.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>ns/string</th>
</tr>
</thead>
<tbody>
<tr>
<td>regular</td>
<td>1.6</td>
</tr>
<tr>
<td>inline</td>
<td>1.0</td>
</tr>
</tbody>
</table>
<p>Takeaways:</p>
<ol>
<li>Short and simple functions should be inlined when possible if performance is a concern. The benefits can be impressive.</li>
<li>For functions that can be fast or slow, the decision as to whether to inline or not depends on the input. For string processing functions, the size of the string may determine whether inlining is necessary for best performance.</li>
</ol>
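<p>Since the answer depends on the input, it is worth measuring with your own data. Here is a minimal timing sketch using <code>std::chrono</code>; the <code>BenchResult</code> type and the <code>bench_count_spaces</code> function are hypothetical names, and a serious benchmark would repeat the measurement and guard more carefully against compiler optimizations.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

size_t count_spaces(std::string_view sv) {
  size_t count = 0;
  for (char c : sv) {
    if (c == ' ') ++count;
  }
  return count;
}

// Hypothetical result type: average time per call and a checksum.
struct BenchResult {
  double ns_per_string;
  size_t total_spaces;
};

// Calls count_spaces on every input, 'repeats' times, and reports the average
// time per call. Returning the total keeps the compiler from discarding the work.
BenchResult bench_count_spaces(const std::vector<std::string>& inputs,
                               size_t repeats) {
  size_t total = 0;
  auto start = std::chrono::steady_clock::now();
  for (size_t r = 0; r < repeats; r++) {
    for (const std::string& s : inputs) {
      total += count_spaces(s);
    }
  }
  auto stop = std::chrono::steady_clock::now();
  double ns = std::chrono::duration<double, std::nano>(stop - start).count();
  size_t calls = repeats * inputs.size();
  return BenchResult{calls == 0 ? 0.0 : ns / static_cast<double>(calls), total};
}
```

<p>Running it once with long strings and once with short strings, for both the regular and the inlined build, reproduces the kind of comparison shown in the tables above.</p>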
<p><em>Note</em>: <a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2026/02/08/benchmark">My source code is available</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/08/the-cost-of-a-function-call/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>Converting data to hexadecimal outputs quickly</title>
		<link>https://lemire.me/blog/2026/02/02/converting-data-to-hexadecimal-outputs-quickly/</link>
					<comments>https://lemire.me/blog/2026/02/02/converting-data-to-hexadecimal-outputs-quickly/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Mon, 02 Feb 2026 15:52:27 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22476</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-02-a-10.48.59-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Given any string of bytes, you can convert it to an hexadecimal string by mapping the least significant and the most significant 4 bits of byte to characters in 01...9A...F. There are more efficient techniques like base64, that map 3 bytes to 4 characters. However, hexadecimal outputs are easier to understand and often sufficiently concise. &#8230; <a href="https://lemire.me/blog/2026/02/02/converting-data-to-hexadecimal-outputs-quickly/" class="more-link">Continue reading <span class="screen-reader-text">Converting data to hexadecimal outputs quickly</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-02-a-10.48.59-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Given any string of bytes, you can convert it to a hexadecimal string by mapping the least significant and the most significant 4 bits of each byte to characters in <code>01...9A...F</code>. There are more efficient techniques like base64, which maps 3 bytes to 4 characters. However, hexadecimal outputs are easier to understand and often sufficiently concise.</p>
<p>A simple function to do the conversion using a short lookup table is as follows:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">static</span> <span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #000;">hex</span><span style="color: #000; font-weight: bold;">[]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #4e9a06;">"0123456789abcdef"</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">dlen</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">uint8_t</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">src</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">];</span>
    <span style="color: #000;">dst</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">hex</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">&gt;&gt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">];</span>
    <span style="color: #000;">dst</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">hex</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">&amp;</span> <span style="color: #0000cf; font-weight: bold;">15</span><span style="color: #000; font-weight: bold;">];</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>This code snippet implements a straightforward byte-to-hexadecimal string conversion loop in C++. It iterates over an input byte array (<code>src</code>), processing one byte at a time using index <code>i</code>, while simultaneously building the output string in <code>dst</code> with index <code>k</code> that advances twice as fast (by 2) since each input byte produces two hexadecimal characters. For each byte, it extracts the value as an unsigned 8-bit integer (<code>val</code>), then isolates the high 4 bits (via right shift by 4) and low 4 bits (via bitwise AND with 15) to index into a static lookup table (<code>hex</code>) containing the characters &#8216;0&#8217; through &#8216;9&#8217; and &#8216;a&#8217; through &#8216;f&#8217;. The loop continues until <code>k</code> reaches the expected output length (<code>dlen</code>), which should be twice the input length, ensuring all bytes are converted without bounds errors.</p>
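<p>For readers who want to try it, the loop can be wrapped in a self-contained function. This is a minimal sketch: the function name <code>to_hex_table</code> and the use of <code>std::string</code> for the output are assumptions made for illustration, and the loop bound is expressed with the input length rather than <code>dlen</code>.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical wrapper around the lookup-table loop: converts slen bytes
// into a lowercase hexadecimal string of length 2 * slen.
std::string to_hex_table(const std::uint8_t* src, std::size_t slen) {
  static const char hex[] = "0123456789abcdef";
  std::string dst(2 * slen, '\0');
  for (std::size_t i = 0, k = 0; i < slen; i += 1, k += 2) {
    std::uint8_t val = src[i];
    dst[k + 0] = hex[val >> 4];  // high 4 bits
    dst[k + 1] = hex[val & 15];  // low 4 bits
  }
  return dst;
}
```

<p>For example, the three bytes <code>0x00</code>, <code>0xff</code>, <code>0x1a</code> become the six characters <code>00ff1a</code>.</p>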
<p>This lookup table approach is used in the popular Node.js JavaScript runtime. Skovoroda recently proposed to replace this lookup table approach with an arithmetic version.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #000;">nibble</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">uint8_t</span> <span style="color: #000;">x</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span> <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">x</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #4e9a06;">'0'</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000; font-weight: bold;">((</span><span style="color: #000;">x</span> <span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #0000cf; font-weight: bold;">9</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #0000cf; font-weight: bold;">39</span><span style="color: #000; font-weight: bold;">);</span> <span style="color: #000; font-weight: bold;">}</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">dlen</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">uint8_t</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">src</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">];</span>
    <span style="color: #000;">dst</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">nibble</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">&gt;&gt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">dst</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">k</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">nibble</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">&amp;</span> <span style="color: #0000cf; font-weight: bold;">15</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>Surprisingly maybe, this approach is much faster and uses far fewer instructions. At first glance, this result might be puzzling. A table lookup is cheap, and the new <code>nibble</code> function seemingly does more work.</p>
<p>The trick that Skovoroda relies upon is that compilers are smart: they will &#8216;autovectorize&#8217; such number crunching functions (if you are lucky). That is, instead of using regular instructions that process byte values, they will use SIMD instructions that process 16 bytes at once or more.</p>
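<p>The arithmetic version can be checked in isolation with a self-contained sketch. The wrapper name <code>to_hex_arithmetic</code> and the <code>std::string</code> output are assumptions for illustration. The constant 39 is the gap between <code>'9' + 1</code> and <code>'a'</code> in ASCII, so values from 10 to 15 land on <code>'a'</code> to <code>'f'</code>. Whether a given compiler actually vectorizes this loop depends on the compiler and its flags.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Branch-free conversion of a 4-bit value to its lowercase hexadecimal digit.
// For x <= 9 the result is '0' + x; for x >= 10, adding 39 reaches 'a' + (x - 10).
inline char nibble(std::uint8_t x) {
  return static_cast<char>(x + '0' + ((x > 9) * 39));
}

// Hypothetical wrapper: same output as a lookup table, but using only
// arithmetic, which gives the compiler a chance to use SIMD instructions.
std::string to_hex_arithmetic(const std::uint8_t* src, std::size_t slen) {
  std::string dst(2 * slen, '\0');
  for (std::size_t i = 0, k = 0; i < slen; i += 1, k += 2) {
    std::uint8_t val = src[i];
    dst[k + 0] = nibble(val >> 4);
    dst[k + 1] = nibble(val & 15);
  }
  return dst;
}
```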
<p>Of course, instead of relying on the compiler, you can manually invoke SIMD instructions through SIMD intrinsic functions. Let us assume that you have an ARM processor (e.g., on Apple Silicon). Then you can process blocks of 32 bytes as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">maxv</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">slen</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">slen</span><span style="color: #ce5c00; font-weight: bold;">%</span><span style="color: #0000cf; font-weight: bold;">32</span><span style="color: #000; font-weight: bold;">));</span>
<span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">maxv</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #0000cf; font-weight: bold;">32</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">val1</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vld1q_u8</span><span style="color: #000; font-weight: bold;">((</span><span style="color: #204a87; font-weight: bold;">uint8_t</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000; font-weight: bold;">)</span><span style="color: #000;">src</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">val2</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vld1q_u8</span><span style="color: #000; font-weight: bold;">((</span><span style="color: #204a87; font-weight: bold;">uint8_t</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000; font-weight: bold;">)</span><span style="color: #000;">src</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">16</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">high1</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vshrq_n_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">low1</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vandq_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">vdupq_n_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">15</span><span style="color: #000; font-weight: bold;">));</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">high2</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vshrq_n_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val2</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">low2</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vandq_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val2</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">vdupq_n_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">15</span><span style="color: #000; font-weight: bold;">));</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">high_chars1</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vqtbl1q_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">table</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">high1</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">low_chars1</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vqtbl1q_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">table</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">low1</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">high_chars2</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vqtbl1q_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">table</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">high2</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16_t</span> <span style="color: #000;">low_chars2</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">vqtbl1q_u8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">table</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">low2</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">uint8x16x2_t</span> <span style="color: #000;">zipped1</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #000;">high_chars1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">low_chars1</span><span style="color: #000; font-weight: bold;">};</span>
    <span style="color: #000;">uint8x16x2_t</span> <span style="color: #000;">zipped2</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #000;">high_chars2</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">low_chars2</span><span style="color: #000; font-weight: bold;">};</span>
    <span style="color: #000;">vst2q_u8</span><span style="color: #000; font-weight: bold;">((</span><span style="color: #204a87; font-weight: bold;">uint8_t</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000; font-weight: bold;">)</span><span style="color: #000;">dst</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">zipped1</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">vst2q_u8</span><span style="color: #000; font-weight: bold;">((</span><span style="color: #204a87; font-weight: bold;">uint8_t</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000; font-weight: bold;">)</span><span style="color: #000;">dst</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #0000cf; font-weight: bold;">2</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #0000cf; font-weight: bold;">32</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">zipped2</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>This SIMD code leverages ARM NEON intrinsics to accelerate hexadecimal encoding by processing 32 input bytes per iteration. It begins by loading two 16-byte vectors (<code>val1</code> and <code>val2</code>) from the source array using <code>vld1q_u8</code>. For each vector, it extracts the high nibbles (via right shift by 4 with <code>vshrq_n_u8</code>) and low nibbles (via bitwise AND with 15 using <code>vandq_u8</code> and <code>vdupq_n_u8</code>). The nibbles are then used as indices into a pre-loaded hex table via <code>vqtbl1q_u8</code> to fetch the corresponding ASCII characters. Finally, each pair of high and low character vectors is grouped into a <code>uint8x16x2_t</code> and written with <code>vst2q_u8</code>, which interleaves the two vectors as it stores them, so each 16-byte input vector produces 32 output bytes.</p>
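<p>To make the data flow concrete, here is a minimal scalar sketch in Python of the same three steps (nibble split, table lookup, interleaved store). It is only a model of what the NEON code does, not a fast implementation. The function name <code>hex_encode_block</code> and the lowercase table contents are assumptions, since the table definition is not shown above.</p>

```python
# Assumed table contents: the 16 lowercase hexadecimal digits.
HEX_TABLE = b"0123456789abcdef"


def hex_encode_block(block: bytes) -> bytes:
    # High nibbles: same effect as a right shift by 4 (vshrq_n_u8).
    highs = [b >> 4 for b in block]
    # Low nibbles: same effect as a bitwise AND with 15 (vandq_u8).
    lows = [b & 15 for b in block]
    # Table lookups (vqtbl1q_u8): nibble value to ASCII character.
    high_chars = bytes(HEX_TABLE[h] for h in highs)
    low_chars = bytes(HEX_TABLE[v] for v in lows)
    # Interleaved store (vst2q_u8): high0, low0, high1, low1, ...
    out = bytearray(2 * len(block))
    out[0::2] = high_chars
    out[1::2] = low_chars
    return bytes(out)


block = bytes(range(32))
encoded = hex_encode_block(block)
print(encoded[:8])
```

<p>The interleaving step is the part that <code>vst2q_u8</code> does for free while storing, which is why the NEON version needs no separate zip instruction.</p>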
<p>You could do similar work on other systems like x64. The same code with AVX-512 for recent Intel and AMD processors would probably be insanely efficient.</p>
<p>Benchmarking these implementations on a dataset of 10,000 random bytes reveals significant performance differences. The basic lookup table version achieves around 3 GB/s, while the arithmetic version, benefiting from compiler autovectorization, reaches 23 GB/s. The manual SIMD NEON versions push performance further: I reach 42 GB/s in my tests.</p>
<table>
<thead>
<tr>
<th>method</th>
<th>speed</th>
<th>instructions per byte</th>
</tr>
</thead>
<tbody>
<tr>
<td>table</td>
<td>3.1 GB/s</td>
<td>9</td>
</tr>
<tr>
<td>Skovoroda</td>
<td>23 GB/s</td>
<td>0.75</td>
</tr>
<tr>
<td>intrinsics</td>
<td>42 GB/s</td>
<td>0.69</td>
</tr>
</tbody>
</table>
<p>One lesson is that intuition can be a poor guide when trying to assess performance.</p>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2026/02/02">My source code is available.</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/02/converting-data-to-hexadecimal-outputs-quickly/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
			</item>
		<item>
		<title>Converting floats to strings quickly</title>
		<link>https://lemire.me/blog/2026/02/01/converting-floats-to-strings-quickly/</link>
					<comments>https://lemire.me/blog/2026/02/01/converting-floats-to-strings-quickly/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 01 Feb 2026 15:23:25 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22472</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-01-a-10.32.13-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />When serializing data to JSON, CSV or when logging, we convert numbers to strings. Floating-point numbers are stored in binary, but we need them as decimal strings. The first formally published algorithm is Steele and White&#8217;s Dragon schemes (specifically Dragon2) in 1990. Since then, faster methods have emerged: Grisu3, Ryū, Schubfach, Grisu-Exact, and Dragonbox. In &#8230; <a href="https://lemire.me/blog/2026/02/01/converting-floats-to-strings-quickly/" class="more-link">Continue reading <span class="screen-reader-text">Converting floats to strings quickly</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/02/Capture-decran-le-2026-02-01-a-10.32.13-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>When serializing data to JSON, CSV or when logging, we convert numbers to strings. Floating-point numbers are stored in binary, but we need them as decimal strings. The first formally published algorithm is Steele and White&#8217;s Dragon schemes (specifically Dragon2) in 1990. Since then, faster methods have emerged: Grisu3, Ryū, Schubfach, Grisu-Exact, and Dragonbox. In C++17, we have a standard function called <code>std::to_chars</code> for this purpose. A common objective is to generate the shortest strings while still being able to uniquely identify the original number.</p>
<p>We recently published <a href="https://onlinelibrary.wiley.com/doi/epdf/10.1002/spe.70056">Converting Binary Floating-Point Numbers to Shortest Decimal Strings</a>. We examine the full conversion, from the floating-point number to the string. In practice, the conversion involves two steps: we take the number and compute the significand and the power of 10 (step 1) and then we generate the string (step 2). E.g., for the number pi, you might need to compute <code>31415927</code> and <code>-7</code> (step 1) before generating the string <code>3.1415927</code>. The string generation requires placing the dot at the right location and switching to the exponential notation when needed. The generation of the string is relatively cheap and was probably a negligible cost for older schemes, but as the software got faster, it has become a more important component (using 20% to 35% of the time).</p>
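<p>As an illustration of step 2 only, here is a minimal Python sketch that takes a decimal significand and a power of 10 and places the dot or switches to the exponential notation. The function name <code>format_decimal</code> and the thresholds for switching notation are hypothetical choices made for this example; they are not taken from any of the libraries discussed, and the sketch assumes a positive significand.</p>

```python
def format_decimal(significand: int, exponent: int) -> str:
    """Render significand * 10**exponent as a decimal string (step 2 only).

    Assumes a positive significand. The thresholds below are illustrative.
    """
    digits = str(significand)
    # Number of digits that belong before the decimal point.
    point = len(digits) + exponent
    if point == len(digits):
        return digits  # exponent is zero: plain integer
    if 0 < point < len(digits):
        return digits[:point] + "." + digits[point:]
    if -4 < point <= 0:
        # Small value: leading "0." followed by padding zeros.
        return "0." + "0" * (-point) + digits
    # Otherwise use scientific notation with a sign and two exponent digits.
    mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
    return f"{mantissa}e{point - 1:+03d}"


print(format_decimal(31415927, -7))
print(format_decimal(11, -5))
print(format_decimal(11, -10))
```

<p>Even this toy version has several branches and string concatenations, which gives some intuition for why string generation is no longer negligible once step 1 is fast.</p>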
<p>The results vary quite a bit depending on the numbers being converted. But we find that two implementations tend to do best: Dragonbox by Jeon and Schubfach by Giulietti. The Ryū implementation by Adams is close behind or just as fast. All of these techniques are about 10 times faster than the original Dragon 4 from 1990. A tenfold performance gain over three decades is equivalent to a gain of about 8% per year, entirely due to better implementations and algorithms.</p>
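<p>The 8% figure is a compound growth rate: we look for the yearly factor that, multiplied by itself 30 times, gives 10. A short Python check, assuming exactly 30 years and an exactly tenfold gain (the helper name <code>annual_rate</code> is made up for this example):</p>

```python
def annual_rate(total_factor: float, years: float) -> float:
    # Solve (1 + rate) ** years == total_factor for rate.
    return total_factor ** (1.0 / years) - 1.0


rate = annual_rate(10.0, 30)
print(f"{rate:.1%} per year")
```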
<p><a href="http://lemire.me/blog/wp-content/uploads/2026/02/result.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-22473" src="http://lemire.me/blog/wp-content/uploads/2026/02/result.png" alt="" width="752" height="758" srcset="https://lemire.me/blog/wp-content/uploads/2026/02/result.png 752w, https://lemire.me/blog/wp-content/uploads/2026/02/result-298x300.png 298w, https://lemire.me/blog/wp-content/uploads/2026/02/result-150x150.png 150w" sizes="auto, (max-width: 752px) 100vw, 752px" /></a></p>
<p>Efficient algorithms use between 200 and 350 instructions for each string generated. We find that the standard function <code>std::to_chars</code> under Linux uses more instructions than needed (up to nearly twice as many). So there is room to improve common implementations. Using the popular C++ library <code>fmt</code> is slightly less efficient.</p>
<p>A fun fact is that we found that none of the available functions generate the shortest possible string. The <code>std::to_chars</code> C++ function renders the number 0.00011 as 0.00011 (7 characters), while the shorter scientific form 1.1e-4 (6 characters) would do. But, by convention, when switching to the scientific notation, it is required to pad the exponent to two digits (so 1.1e-04, which is also 7 characters). Beyond this technicality, we found that no implementation always generates the shortest string.</p>
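<p>The 0.00011 example can be checked quickly. The following Python snippet (an illustration in Python, not the C++ code from the paper) verifies that the three spellings parse back to the same floating-point number and compares their lengths.</p>

```python
value = 0.00011
candidates = ["0.00011", "1.1e-4", "1.1e-04"]

for text in candidates:
    # Each candidate must round-trip to exactly the same binary64 value.
    assert float(text) == value

lengths = {text: len(text) for text in candidates}
print(lengths)
```

<p>All three strings identify the same number, so the unpadded scientific form is strictly shorter, and only the two-digit exponent convention brings it back to a tie.</p>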
<p>All our code, datasets, and raw results are open-source. The benchmarking suite is at https://github.com/fastfloat/float_serialization_benchmark, test data at https://github.com/fastfloat/float-data.</p>
<p><em>Reference</em>: <a href="https://onlinelibrary.wiley.com/doi/epdf/10.1002/spe.70056">Converting Binary Floating-Point Numbers to Shortest Decimal Strings: An Experimental Review</a>, Software: Practice and Experience (to appear)</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/02/01/converting-floats-to-strings-quickly/feed/</wfw:commentRss>
			<slash:comments>10</slash:comments>
		
		
			</item>
		<item>
		<title>Optimizing Python scripts with AI</title>
		<link>https://lemire.me/blog/2026/01/25/optimizing-python-scripts-with-ai/</link>
					<comments>https://lemire.me/blog/2026/01/25/optimizing-python-scripts-with-ai/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 25 Jan 2026 23:19:12 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22466</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Capture-decran-le-2026-01-25-a-18.08.36-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />One of the first steps we take when we want to optimize software is to look at profiling data. Software profilers are tools that try to identify where your software spends its time. Though the exact approach can vary, a typical profiler samples your software (steps it at regular intervals) and collects statistics. If your &#8230; <a href="https://lemire.me/blog/2026/01/25/optimizing-python-scripts-with-ai/" class="more-link">Continue reading <span class="screen-reader-text">Optimizing Python scripts with AI</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Capture-decran-le-2026-01-25-a-18.08.36-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>One of the first steps we take when we want to optimize software is to look at profiling data. Software profilers are tools that try to identify where your software spends its time. Though the exact approach can vary, a typical profiler samples your software (stops it at regular intervals) and collects statistics. If your software is routinely stopped in a given function, this function is likely using a lot of time. In turn, it might be where you should put your optimization efforts.</p>
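<p>To illustrate the sampling idea, here is a toy sampling profiler in Python. It is a sketch under simplifying assumptions, not how production profilers are built: a background thread peeks at the frame currently executing in the calling thread every millisecond or so (through CPython's <code>sys._current_frames</code>) and counts function names. The names <code>sample_stacks</code>, <code>busy_work</code> and <code>profile</code> are made up for the example.</p>

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, counts, stop_event, interval=0.001):
    # Periodically record which function the target thread is executing.
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_work(duration):
    # Hypothetical hot function: spin for `duration` seconds.
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def profile(func, *args):
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), counts, stop_event),
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return counts


counts = profile(busy_work, 0.3)
print(counts.most_common(3))
```

<p>The function that shows up in most samples is the one where the program was most often "caught", which is the statistic a sampling profiler reports.</p>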
<p>Matteo Collina recently <a href="https://github.com/platformatic/pprof-to-md">shared with me his work on feeding profiler data</a> for software optimization purposes in JavaScript. Essentially, Matteo takes the profiling data, and prepares it in a way that an AI can comprehend. The insight is simple but intriguing: tell an AI how it can capture profiling data and then let it optimize your code, possibly by repeatedly profiling the code. The idea is not original since AI tools will, on their own, figure out that they can get profiling data.</p>
<p>How well does it work? I had to try it.</p>
<h2>Case 1. Code amalgamation script</h2>
<p>For the simdutf software library, <a href="https://raw.githubusercontent.com/simdutf/simdutf/refs/heads/master/singleheader/amalgamate.py">we use an amalgamation script</a>: it collects all of the C++ files on disk, does some shallow parsing and glues them together according to some rules.</p>
<p>I first asked the AI to optimize the script without access to profiling data. It immediately added a file cache. The script repeatedly loads the same files from disk (the script is a bit complex). This saved about 20% of the running time.</p>
<p>Specifically, the AI replaced this naive code&#8230;</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">def</span> <span style="color: #000;">read_file</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #204a87; font-weight: bold;">with</span> <span style="color: #204a87;">open</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #4e9a06;">'r'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #204a87; font-weight: bold;">as</span> <span style="color: #000;">f</span><span style="color: #000; font-weight: bold;">:</span>
        <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">line</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">f</span><span style="color: #000; font-weight: bold;">:</span>
            <span style="color: #204a87; font-weight: bold;">yield</span> <span style="color: #000;">line</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">rstrip</span><span style="color: #000; font-weight: bold;">()</span>
</code></pre>
</div>
<p>by this version with caching&#8230;</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">def</span> <span style="color: #000;">read_file</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000;">file</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">file_cache</span><span style="color: #000; font-weight: bold;">:</span>
        <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">line</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">file_cache</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">]:</span>
            <span style="color: #204a87; font-weight: bold;">yield</span> <span style="color: #000;">line</span>
    <span style="color: #204a87; font-weight: bold;">else</span><span style="color: #000; font-weight: bold;">:</span>
        <span style="color: #000;">lines</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">[]</span>
        <span style="color: #204a87; font-weight: bold;">with</span> <span style="color: #204a87;">open</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #4e9a06;">'r'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #204a87; font-weight: bold;">as</span> <span style="color: #000;">f</span><span style="color: #000; font-weight: bold;">:</span>
            <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">line</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">f</span><span style="color: #000; font-weight: bold;">:</span>
                <span style="color: #000;">line</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">line</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">rstrip</span><span style="color: #000; font-weight: bold;">()</span>
                <span style="color: #000;">lines</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">append</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">line</span><span style="color: #000; font-weight: bold;">)</span>
                <span style="color: #204a87; font-weight: bold;">yield</span> <span style="color: #000;">line</span>
        <span style="color: #000;">file_cache</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">lines</span>
</code></pre>
</div>
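<p>To see the cache at work, here is a self-contained sketch that wraps the cached version in a small experiment. The module-level <code>file_cache</code> dictionary and the temporary file are assumptions made for the demonstration: the file is deleted after the first read, so the second read can only succeed if it is served from memory.</p>

```python
import os
import tempfile

# Assumed to live at module level in the script.
file_cache = {}


def read_file(file):
    if file in file_cache:
        # Cache hit: replay the stored lines without touching the disk.
        yield from file_cache[file]
    else:
        lines = []
        with open(file, 'r') as f:
            for line in f:
                line = line.rstrip()
                lines.append(line)
                yield line
        # Stored only once the file has been read to the end.
        file_cache[file] = lines


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h")
    with open(path, "w") as f:
        f.write("#pragma once\nint x;\n")
    first = list(read_file(path))
    os.remove(path)  # the file is gone...
    second = list(read_file(path))  # ...but the cache still answers

print(first == second)
```

<p>One design detail worth noting: because the function is a generator, the cache entry is written only if the caller consumes every line. A caller that stops early leaves the file uncached.</p>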
<p>Could the AI do better with profiling data? I instructed it to run the Python profiler: <code>python -m cProfile -s cumtime myprogram.py</code>. It found two additional optimizations:</p>
<p>1. It precompiled the regular expressions (<code>re.compile</code>). It replaced</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>  <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000;">re</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">match</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">'.*generic/.*.h'</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #8f5902; font-style: italic;"># ...</span>
</code></pre>
</div>
<p>by</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000;">generic_pattern</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">match</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">file</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #8f5902; font-style: italic;"># ...</span>
</code></pre>
</div>
<p>where elsewhere in the code, we have&#8230;</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">generic_pattern</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">re</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">compile</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">r'.*generic/.*\.h'</span><span style="color: #000; font-weight: bold;">)</span></code></pre>
</div>
<p>2. Instead of repeatedly calling <code>re.sub</code> to do a regular expression substitution, it filtered the strings by checking for the presence of a keyword in the string first.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #4e9a06;">'SIMDUTF_IMPLEMENTATION'</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">line</span><span style="color: #000; font-weight: bold;">:</span> <span style="color: #8f5902; font-style: italic;"># This IF is the optimization</span>
  <span style="color: #204a87;">print</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">uses_simdutf_implementation</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">sub</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">context</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">current_implementation</span><span style="color: #ce5c00; font-weight: bold;">+</span><span style="color: #4e9a06;">"\\1"</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">line</span><span style="color: #000; font-weight: bold;">),</span> <span style="color: #000;">file</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #000;">fid</span><span style="color: #000; font-weight: bold;">)</span>
<span style="color: #204a87; font-weight: bold;">else</span><span style="color: #000; font-weight: bold;">:</span>
  <span style="color: #204a87;">print</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">line</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">file</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #000;">fid</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #8f5902; font-style: italic;"># Fast path</span>
</code></pre>
</div>
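<p>Both changes can be tried in isolation with the sketch below. The substitution pattern <code>uses_simdutf_implementation</code> is defined elsewhere in the real script, so the regular expression used for it here is a hypothetical stand-in, as is the helper <code>rewrite_line</code>. Note also that the precompiled pattern escapes the dot before <code>h</code>, so it is slightly stricter than the original string.</p>

```python
import re

# Compiled once, at module level, instead of on every call.
generic_pattern = re.compile(r'.*generic/.*\.h')
# Hypothetical stand-in for the script's substitution pattern.
uses_simdutf_implementation = re.compile(r'SIMDUTF_IMPLEMENTATION(\w*)')


def rewrite_line(line: str, implementation: str) -> str:
    # Cheap substring test first; the regex runs only on candidate lines.
    if 'SIMDUTF_IMPLEMENTATION' in line:
        return uses_simdutf_implementation.sub(implementation + "\\1", line)
    return line  # fast path: nothing to substitute


print(rewrite_line("namespace SIMDUTF_IMPLEMENTATION {", "haswell"))
print(rewrite_line("int x;", "haswell"))
# The unescaped original accepts any character before the final 'h'.
print(bool(re.match('.*generic/.*.h', "src/generic/utf8_h")))
print(bool(generic_pattern.match("src/generic/utf8_h")))
```

<p>The pre-filter is safe only because every possible match of the pattern must contain the literal keyword. With a pattern that can match without it, the fast path would silently skip substitutions.</p>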
<p>These two optimizations could probably have been arrived at by looking at the code directly, and I cannot be certain that they were driven by the profiling data. But I can tell that they do appear in the profile data.</p>
<p>Unfortunately, the low-hanging fruit, caching the file access, represented the bulk of the gain. The AI was not able to further optimize the code. So the profiling data did not help much.</p>
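<p>For readers who want to reproduce this kind of experiment, the profiler can also be driven from within Python rather than from the command line. The sketch below uses the standard <code>cProfile</code> and <code>pstats</code> modules; the workload <code>slow_match</code> is a hypothetical function invented for the example, loosely modeled on the uncompiled regular expression above.</p>

```python
import cProfile
import io
import pstats
import re


def slow_match(lines):
    # Hypothetical workload: an uncompiled pattern matched on every line.
    return sum(1 for line in lines if re.match('.*generic/.*.h', line))


profiler = cProfile.Profile()
profiler.enable()
slow_match(["src/generic/utf8.h"] * 10000)
profiler.disable()

# Same ordering as `-s cumtime` on the command line, top 5 entries only.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumtime").print_stats(5)
report = stream.getvalue()
print(report)
```

<p>A text report of this kind is what an AI tool gets to read when told to run the profiler.</p>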
<h2>Case 2: Check Link Script</h2>
<p>When I design online courses, I often use a lot of links. These links break over time. So I have a simple Python script that goes through all the links, and verifies them.</p>
<p>I first asked my AI to optimize the code. It did the same regex trick, compiling the regular expression. It created a thread pool and made the script asynchronous.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">with</span> <span style="color: #000;">concurrent</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">futures</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">ThreadPoolExecutor</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">max_workers</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #0000cf; font-weight: bold;">10</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #204a87; font-weight: bold;">as</span> <span style="color: #000;">executor</span><span style="color: #000; font-weight: bold;">:</span>
    <span style="color: #000;">url_results</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #000;">url</span><span style="color: #000; font-weight: bold;">:</span> <span style="color: #000;">executor</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">submit</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">check_url</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">url</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">url</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">urls_to_check</span><span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">url</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">future</span> <span style="color: #204a87; font-weight: bold;">in</span> <span style="color: #000;">url_results</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">items</span><span style="color: #000; font-weight: bold;">():</span>
        <span style="color: #000;">url_cache</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">url</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">future</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #000;">result</span><span style="color: #000; font-weight: bold;">()</span>
</code></pre>
</div>
<p>This parallelization more than doubled the speed of the script.</p>
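<p>The reason threads help here is that link checking is dominated by waiting on the network. The following sketch simulates this with a made-up <code>check_url</code> that merely sleeps for 50 ms (an assumption standing in for a real HTTP request) and compares a sequential loop with a thread pool.</p>

```python
import concurrent.futures
import time


def check_url(url: str) -> bool:
    # Hypothetical stand-in for a network request: just wait a little.
    time.sleep(0.05)
    return url.startswith("https://")


urls_to_check = [f"https://example.com/page{i}" for i in range(10)]

start = time.perf_counter()
sequential = {url: check_url(url) for url in urls_to_check}
sequential_time = time.perf_counter() - start

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    futures = {url: executor.submit(check_url, url) for url in urls_to_check}
    threaded = {url: future.result() for url, future in futures.items()}
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

<p>With real links, the gain depends on how slow the servers are and on the number of workers, so the speedup will differ from this idealized case.</p>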
<p>It cached the URL checks in an interesting way, using <code>functools</code>:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">from</span> <span style="color: #000;">functools</span> <span style="color: #204a87; font-weight: bold;">import</span> <span style="color: #000;">lru_cache</span>

<span style="color: #5c35cc; font-weight: bold;">@lru_cache</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">maxsize</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #204a87; font-weight: bold;">None</span><span style="color: #000; font-weight: bold;">)</span>
<span style="color: #204a87; font-weight: bold;">def</span> <span style="color: #000;">check</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">link</span><span style="color: #000; font-weight: bold;">):</span>
    <span style="color: #8f5902; font-style: italic;"># ...</span>
</code></pre>
</div>
<p>I did not know about this nice trick. It proved useless in my context because the same link rarely appears more than once.</p>
<p>I then started again, and told it to use the profiler. It did much the same thing, except for the optimization of the regular expression.</p>
<p>As far as I can tell, all optimizations were in vain except for the multithreading. And it could have done this part without the profiling data.</p>
<h2>Conclusion so far</h2>
<p>The Python scripts I tried were not heavily optimized, as their performance was not critical. They are relatively simple.</p>
<p>For the amalgamation, I got a 20% performance gain for &#8216;free&#8217; thanks to the file caching. The link checker is going to be faster now that it is multithreaded. Both optimizations are valid and useful, and will make my life marginally better.</p>
<p>In neither case was I able to discern benefits due to the profiler data. I was initially hoping to get the AI busy optimizing the code in a loop, continuously running the profiler, but it did not happen in these simple cases. The AI optimized code segments that contributed little to the running time as per the profiler data.</p>
<p>To be fair, profiling data is often of limited use. The real problems are often architectural and not related to narrow bottlenecks. Even when there are identifiable bottlenecks, a simple profiling run can fail to make them clearly identifiable. Further, profilers become more useful as the code base grows, while my test cases are tiny.</p>
<p>Overall, I expect that the main reason for my relative failure is that I did not have the right use cases. I think that collecting profiling data and asking an AI to have a look might be a reasonable first step at this point.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/01/25/optimizing-python-scripts-with-ai/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
			</item>
		<item>
		<title>A new way to call C from Java: how fast is it?</title>
		<link>https://lemire.me/blog/2026/01/17/a-new-way-to-call-c-from-java-how-fast-is-it/</link>
					<comments>https://lemire.me/blog/2026/01/17/a-new-way-to-call-c-from-java-how-fast-is-it/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 17 Jan 2026 23:44:38 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22445</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Capture-decran-le-2026-01-17-a-18.44.19-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Irrespective of your programming language of choice, calling C functions is often a necessity. For the longest time, the only standard way to call C was the Java Native Interface (JNI). But it was so painful that few dared to do it. I have heard it said that it was deliberately painful so that people &#8230; <a href="https://lemire.me/blog/2026/01/17/a-new-way-to-call-c-from-java-how-fast-is-it/" class="more-link">Continue reading <span class="screen-reader-text">A new way to call C from Java: how fast is it?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Capture-decran-le-2026-01-17-a-18.44.19-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Irrespective of your programming language of choice, calling C functions is often a necessity. For the longest time, the only standard way to call C from Java was the Java Native Interface (JNI). But it was so painful that few dared to do it. I have heard it said that it was deliberately painful so that people would be enticed to use pure Java as much as possible.</p>
<p>Since Java 22, there is a new approach called the Foreign Function &amp; Memory API in <code>java.lang.foreign</code>. Let me go through it step by step.</p>
<p>You need a <code>Linker</code> and a <code>SymbolLookup</code> instance from which you will build a <code>MethodHandle</code> that will capture the native function you want to call.</p>
<p>The linker is easy:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">Linker</span> <span style="color: #000;">linker</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">Linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">nativeLinker</span><span style="color: #000; font-weight: bold;">();</span>
</code></pre>
</div>
<p>To load the <code>SymbolLookup</code> instance for your library (called <code>mylibrary</code>), you may do so as follows:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">System</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">loadLibrary</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"mylibrary"</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000;">SymbolLookup</span> <span style="color: #000;">lookup</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">SymbolLookup</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">loaderLookup</span><span style="color: #000; font-weight: bold;">();</span>
</code></pre>
</div>
<p>The native library file should be on your <code>java.library.path</code>, or somewhere on the default library paths. (You can pass it to your java executable as <code>-Djava.library.path=something</code>.)</p>
<p>Alternatively, you can use <code>SymbolLookup.libraryLookup</code> or other means of loading<br />
the library, but <code>System.loadLibrary</code> should work well enough.</p>
<p>Once you have the lookup, you can grab the address of a function like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">lookup</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">find</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"myfunction"</span><span style="color: #000; font-weight: bold;">)</span>
</code></pre>
</div>
<p>This returns an <code>Optional&lt;MemorySegment&gt;</code>. You can grab the <code>MemorySegment</code> like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">MemorySegment</span> <span style="color: #000;">mem</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">lookup</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">find</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"myfunction"</span><span style="color: #000; font-weight: bold;">).</span><span style="color: #c4a000;">orElseThrow</span><span style="color: #000; font-weight: bold;">();</span>
</code></pre>
</div>
<p>Once you have your <code>MemorySegment</code>, you can pass it to your <code>linker</code> to get a <code>MethodHandle</code> which is close to a callable function:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code> <span style="color: #000;">MethodHandle</span> <span style="color: #000;">myfunc</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">downcallHandle</span><span style="color: #000; font-weight: bold;">(</span>
     <span style="color: #000;">mem</span><span style="color: #000; font-weight: bold;">,</span>
     <span style="color: #000;">functiondescr</span>
 <span style="color: #000; font-weight: bold;">);</span>
</code></pre>
</div>
<p>The <code>functiondescr</code> must describe the returned value and the function parameters that your function takes. If you pass a pointer and get back a <code>long</code> value, you might proceed as follows:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code> <span style="color: #000;">MethodHandle</span> <span style="color: #000;">myfunc</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">downcallHandle</span><span style="color: #000; font-weight: bold;">(</span>
     <span style="color: #000;">mem</span><span style="color: #000; font-weight: bold;">,</span>
     <span style="color: #000;">FunctionDescriptor</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">of</span><span style="color: #000; font-weight: bold;">(</span>
        <span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">JAVA_LONG</span><span style="color: #000; font-weight: bold;">,</span>
        <span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">ADDRESS</span>
    <span style="color: #000; font-weight: bold;">)</span>
 <span style="color: #000; font-weight: bold;">);</span>
</code></pre>
</div>
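<p>That is, the first argument of <code>FunctionDescriptor.of</code> describes the returned value, and the remaining arguments describe the function parameters.</p>
<p>For functions returning nothing, you use <code>FunctionDescriptor.ofVoid</code>.</p>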
<p>The <code>MethodHandle</code> can be called almost like a normal Java function:<br />
<code>myfunc.invokeExact(parameters)</code>. Its declared return type is <code>Object</code>, so a cast is necessary at the call site. With <code>invokeExact</code>, the cast must match the return type of the function descriptor exactly: if the function returns a <code>long</code>, cast to <code>long</code> and not to <code>Long</code>.</p>
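<p>The casting rule can be checked without any native code. The following is a minimal sketch in which a handle to <code>Math.abs(long)</code> stands in for a downcall handle of type <code>(long)long</code>; the class and method names are made up for the illustration.</p>

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Illustration only: Math.abs(long) plays the role of a native function
// taking a long and returning a long.
final class InvokeExactDemo {
    static MethodHandle absHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                Math.class, "abs", MethodType.methodType(long.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // invokeExact: the cast and the argument types must match (long)long exactly.
    static long exactCall(long x) {
        try {
            return (long) absHandle().invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Casting to the boxed type changes the call-site type to (long)Long,
    // which invokeExact rejects at run time.
    static boolean boxedCastRejected() {
        try {
            Long boxed = (Long) absHandle().invokeExact(-5L);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // invoke is more lenient: it widens the int argument and boxes the result.
    static long lenientCall(int x) {
        try {
            return (Long) absHandle().invoke(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(exactCall(-5L));
        System.out.println(boxedCastRejected());
        System.out.println(lenientCall(-7));
    }
}
```

<p>The lenient <code>invoke</code> is convenient, but <code>invokeExact</code> is generally the better choice on a hot path because no type adaptation is needed.</p>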
<p class="article-editor-paragraph article-editor-content__has-focus">It is a bit painful, but thankfully, there is a tool called <a href="https://github.com/openjdk/jextract">jextract</a> that can automate this task. It generates Java bindings from native library headers.</p>
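<p>For reference, the manual steps above can be assembled into one small program. This is a minimal sketch, not the code from my library: it calls <code>strlen</code> from the C standard library through <code>linker.defaultLookup()</code> so that no custom library needs to be built, and it assumes Java 22 or better and a platform where <code>size_t</code> is 64 bits wide. The class and method names are hypothetical.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Sketch (Java 22+): linker, lookup, descriptor, handle, call.
final class StrlenDowncall {
    static long nativeStrlen(String text) {
        Linker linker = Linker.nativeLinker();
        // C signature: size_t strlen(const char *s)
        // Assumption: size_t maps to a 64-bit long on this platform.
        MethodHandle strlen = linker.downcallHandle(
            linker.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string to native memory as a null-terminated UTF-8 string.
            MemorySegment cString = arena.allocateFrom(text);
            // The cast to long must match the descriptor's return layout.
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("Hello")); // 5 bytes before the terminator
    }
}
```

<p>You can run it directly with <code>java --enable-native-access=ALL-UNNAMED StrlenDowncall.java</code>; without the flag, recent Java versions print a warning about restricted methods.</p>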
<p>You can allocate C data structures from Java that you can pass to your native code by using an <code>Arena</code>. Let us say that you want to create an instance of a structure with the following layout:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">MemoryLayout</span> <span style="color: #000;">mystruct</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">MemoryLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">structLayout</span><span style="color: #000; font-weight: bold;">(</span>
        <span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">JAVA_LONG</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">withName</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"age"</span><span style="color: #000; font-weight: bold;">),</span>
        <span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">JAVA_INT</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">withName</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"friends"</span><span style="color: #000; font-weight: bold;">));</span>
</code></pre>
</div>
<p>You could do it in this manner:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">MemorySegment</span> <span style="color: #000;">myseg</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">arena</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">allocate</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">mystruct</span><span style="color: #000; font-weight: bold;">);</span>
</code></pre>
</div>
<p>You can then pass <code>myseg</code> as a pointer to a data structure in C.</p>
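<p>Before passing the segment to C, you typically fill in its fields from Java. The following sketch reuses the layout above and computes the field offsets from the layout names; the class and method names are made up for the illustration, and it assumes Java 22 or better.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Sketch (Java 22+): write and read the fields of a struct-like segment.
final class StructFields {
    static final MemoryLayout MYSTRUCT = MemoryLayout.structLayout(
        ValueLayout.JAVA_LONG.withName("age"),
        ValueLayout.JAVA_INT.withName("friends"));

    // Byte offsets derived from the layout: 0 for "age", 8 for "friends".
    static final long AGE_OFFSET =
        MYSTRUCT.byteOffset(PathElement.groupElement("age"));
    static final long FRIENDS_OFFSET =
        MYSTRUCT.byteOffset(PathElement.groupElement("friends"));

    // Writes both fields into a freshly allocated segment and reads them back.
    static long[] roundTrip(long age, int friends) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment myseg = arena.allocate(MYSTRUCT);
            myseg.set(ValueLayout.JAVA_LONG, AGE_OFFSET, age);
            myseg.set(ValueLayout.JAVA_INT, FRIENDS_OFFSET, friends);
            return new long[] {
                myseg.get(ValueLayout.JAVA_LONG, AGE_OFFSET),
                myseg.get(ValueLayout.JAVA_INT, FRIENDS_OFFSET)
            };
        }
    }

    public static void main(String[] args) {
        long[] fields = roundTrip(42L, 7);
        System.out.println(fields[0] + " " + fields[1]);
    }
}
```

<p>Note that this layout occupies 12 bytes, whereas a C compiler would typically pad the matching struct to 16 bytes. If the C side reads or writes whole structs, adding <code>MemoryLayout.paddingLayout(4)</code> at the end of the layout is the safer choice.</p>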
<p>You often get an arena with a <code>try</code> clause like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">try</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">Arena</span> <span style="color: #000;">arena</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">Arena</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">ofConfined</span><span style="color: #000; font-weight: bold;">())</span> <span style="color: #000; font-weight: bold;">{</span>
       <span style="color: #8f5902; font-style: italic;">//</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>There are several types of arenas: confined, shared, automatic and global. A confined arena is accessible from a single thread, whereas shared, automatic and global arenas are accessible from several threads. Confined and shared arenas are managed explicitly: their memory is released when you close the arena. The memory of an automatic arena is released by the Java garbage collector once it becomes unreachable, and the memory of the global arena is never released.</p>
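<p>The difference in lifetimes is easy to observe. Here is a small sketch, with hypothetical class and method names and assuming Java 22 or better: a segment from a confined arena becomes unusable once the arena is closed, and an automatic arena refuses to be closed by hand.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Sketch (Java 22+): explicit versus garbage-collected arena lifetimes.
final class ArenaLifetime {
    // A segment allocated from a confined arena cannot be used after close().
    static boolean accessAfterCloseFails() {
        Arena arena = Arena.ofConfined();
        MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
        segment.set(ValueLayout.JAVA_INT, 0, 42);
        arena.close(); // the native memory is released here
        try {
            segment.get(ValueLayout.JAVA_INT, 0);
            return false;
        } catch (IllegalStateException e) {
            return true; // the segment is no longer alive
        }
    }

    // An automatic arena is released by the garbage collector, never by hand.
    static boolean autoArenaCannotBeClosed() {
        try {
            Arena.ofAuto().close();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(accessAfterCloseFails());
        System.out.println(autoArenaCannotBeClosed());
    }
}
```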
<p>So, it is fairly complicated but manageable. Is it fast? To find out, I call from Java a C library I wrote with support for binary fuse filters. They are a fast alternative to Bloom filters.</p>
<p>You don&#8217;t need to know what any of this means, however. Keep in mind that I wrote a Java library called <a href="https://github.com/FastFilter/jfusebin">jfusebin</a> which calls a C library. Then I also have a pure <a href="https://github.com/FastFilter/fastfilter_java">Java implementation</a> and I can compare the speed.</p>
<p>I should first point out that even if calling the C function did not include any overhead, it might still be slower because the Java compiler is unlikely to inline a native function. However, if you have a pure Java function, and it is relatively small, it can get inlined and you get all sorts of nice optimizations like constant folding and so forth.</p>
<p>Thus I may overestimate the overhead. But that&#8217;s ok. I just want a ballpark measure.</p>
<p>In my benchmark, I check for the presence of a key in a set. I have one million keys in the filter. I can ask whether a key is not present in the filter.</p>
<p>I find that the library calling C can issue 44 million calls per second using the 8-bit binary fuse filter. I reach about 400 million calls per second using the pure Java implementation.</p>
<table>
<thead>
<tr>
<th>method</th>
<th>time per query in nanoseconds</th>
</tr>
</thead>
<tbody>
<tr>
<td>Java-to-C</td>
<td>22.7 ns</td>
</tr>
<tr>
<td>Pure Java</td>
<td>2.5 ns</td>
</tr>
</tbody>
</table>
<p>Thus I measure an overhead of about 20 ns per C function call from Java using a MacBook (M4 processor).</p>
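<p>The conversion from calls per second to time per call is simple arithmetic. The short sketch below redoes it with the rounded throughput figures quoted above; the class and method names are made up.</p>

```java
import java.util.Locale;

// Arithmetic only: convert throughput (calls per second) to nanoseconds per call.
final class CallOverhead {
    static double nanosPerCall(double callsPerSecond) {
        return 1e9 / callsPerSecond;
    }

    public static void main(String[] args) {
        double viaC = nanosPerCall(44e6);      // about 22.7 ns
        double pureJava = nanosPerCall(400e6); // 2.5 ns
        double overhead = viaC - pureJava;     // about 20 ns
        System.out.println(String.format(Locale.ROOT,
            "Java-to-C: %.1f ns, pure Java: %.1f ns, difference: %.1f ns",
            viaC, pureJava, overhead));
    }
}
```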
<p>We can do slightly better by marking the functions that are expected to be short running as <em>critical</em>. You achieve this result by passing an option to the <code>linker.downcallHandle</code> call.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">binary_fuse8_contain</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">downcallHandle</span><span style="color: #000; font-weight: bold;">(</span>
    <span style="color: #000;">lookup</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">find</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"xfuse_binary_fuse8_contain"</span><span style="color: #000; font-weight: bold;">).</span><span style="color: #c4a000;">orElseThrow</span><span style="color: #000; font-weight: bold;">(),</span>
    <span style="color: #000;">binary_fuse8_contain_desc</span><span style="color: #000; font-weight: bold;">,</span>
    <span style="color: #000;">Linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">Option</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">critical</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">false</span><span style="color: #000; font-weight: bold;">)</span>
<span style="color: #000; font-weight: bold;">);</span>
</code></pre>
</div>
<p>You save about 15% of the running time in my case.</p>
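<p>If you want to try the effect of the option on your own machine, a crude timing loop is enough for a ballpark figure. The sketch below is not my benchmark: it times <code>strlen</code> from the C standard library with and without <code>Linker.Option.critical(false)</code>, assumes Java 22 or better and a 64-bit <code>size_t</code>, and uses hypothetical names. A proper harness such as JMH would give more reliable numbers.</p>

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Sketch (Java 22+): rough cost per call of a plain and a critical downcall.
final class CriticalTiming {
    // Handles are kept in static final fields so that the JIT can treat them as constants.
    static final MethodHandle PLAIN = strlenHandle();
    static final MethodHandle CRITICAL = strlenHandle(Linker.Option.critical(false));

    static MethodHandle strlenHandle(Linker.Option... options) {
        Linker linker = Linker.nativeLinker();
        return linker.downcallHandle(
            linker.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS),
            options);
    }

    // Returns the average time per call in nanoseconds (no warm-up, so only a rough figure).
    static double nanosPerCall(boolean critical, int calls) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom("Hello");
            long sink = 0;
            long start = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                sink += critical
                    ? (long) CRITICAL.invokeExact(cString)
                    : (long) PLAIN.invokeExact(cString);
            }
            long elapsed = System.nanoTime() - start;
            // Sanity check: every call should have returned 5.
            if (sink != 5L * calls) throw new IllegalStateException("unexpected strlen result");
            return (double) elapsed / calls;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        int calls = 10_000_000;
        nanosPerCall(false, calls); // warm-up passes, results discarded
        nanosPerCall(true, calls);
        System.out.println("plain:    " + nanosPerCall(false, calls) + " ns");
        System.out.println("critical: " + nanosPerCall(true, calls) + " ns");
    }
}
```

<p>The numbers will differ from mine since the function, the machine and the methodology all differ.</p>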
<table>
<thead>
<tr>
<th>method</th>
<th>time per query in nanoseconds</th>
</tr>
</thead>
<tbody>
<tr>
<td>Java-to-C</td>
<td>22.7 ns</td>
</tr>
<tr>
<td>Java-to-C (critical)</td>
<td>19.5 ns</td>
</tr>
<tr>
<td>Pure Java</td>
<td>2.5 ns</td>
</tr>
</tbody>
</table>
<p>Obviously, in my case, because the pure Java library is so fast, the 20 ns overhead is too much. But it is otherwise a reasonable overhead.</p>
<p>I did not compare with the old approach (JNI), but other folks did and they found that the new foreign function approach can be <span data-offset-key="226ip-1-0"><a class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3 r-1inkyih r-rjixqe r-1ddef8g r-tjvw6i r-1loqt21" role="link" href="https://github.com/zakgof/java-native-benchmark" target="_blank" rel="noopener noreferrer nofollow">measurably faster</a></span> (e.g., 50% faster). In particular, it has been reported that calling a Java function from C is now relatively fast: I have not tested this functionality myself.</p>
<p>One of the cool features of the new interface is that you can pass data directly from the Java heap to your C function with relative ease.</p>
<p>Suppose you have the following C function:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">sum_array</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span><span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">count</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">sum</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">for</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">count</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #000;">sum</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">];</span>
    <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">sum</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>And you want the following Java array to be passed to C <strong>without a copy</strong>:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">int</span><span style="color: #ce5c00; font-weight: bold;">[]</span> <span style="color: #000;">javaArray</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #0000cf; font-weight: bold;">10</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">20</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">30</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">40</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">50</span><span style="color: #000; font-weight: bold;">};</span>
</code></pre>
</div>
<p>It is as simple as the following code.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">System</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">loadLibrary</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"sum"</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000;">Linker</span> <span style="color: #000;">linker</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">Linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">nativeLinker</span><span style="color: #000; font-weight: bold;">();</span>
<span style="color: #000;">SymbolLookup</span> <span style="color: #000;">lookup</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">SymbolLookup</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">loaderLookup</span><span style="color: #000; font-weight: bold;">();</span>
<span style="color: #000;">MemorySegment</span> <span style="color: #000;">sumAddress</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">lookup</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">find</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"sum_array"</span><span style="color: #000; font-weight: bold;">).</span><span style="color: #c4a000;">orElseThrow</span><span style="color: #000; font-weight: bold;">();</span>

<span style="color: #8f5902; font-style: italic;">// C Signature: int sum_array(int* data, int count)</span>
<span style="color: #000;">MethodHandle</span> <span style="color: #000;">sumArray</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">downcallHandle</span><span style="color: #000; font-weight: bold;">(</span>
    <span style="color: #000;">sumAddress</span><span style="color: #000; font-weight: bold;">,</span>
    <span style="color: #000;">FunctionDescriptor</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">of</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">JAVA_INT</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">ADDRESS</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">ValueLayout</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">JAVA_INT</span><span style="color: #000; font-weight: bold;">),</span>
    <span style="color: #000;">Linker</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">Option</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">critical</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">true</span><span style="color: #000; font-weight: bold;">)</span>
<span style="color: #000; font-weight: bold;">);</span>

<span style="color: #204a87; font-weight: bold;">int</span><span style="color: #ce5c00; font-weight: bold;">[]</span> <span style="color: #000;">javaArray</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">{</span><span style="color: #0000cf; font-weight: bold;">10</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">20</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">30</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">40</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">50</span><span style="color: #000; font-weight: bold;">};</span>

<span style="color: #204a87; font-weight: bold;">try</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">Arena</span> <span style="color: #000;">arena</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">Arena</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">ofConfined</span><span style="color: #000; font-weight: bold;">())</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">MemorySegment</span> <span style="color: #000;">heapSegment</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">MemorySegment</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">ofArray</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">javaArray</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">result</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000;">sumArray</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">invoke</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">heapSegment</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">javaArray</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">length</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000;">System</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">out</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #c4a000;">println</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #4e9a06;">"The sum from C is: "</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000;">result</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2026/01/17/example">I created a complete example</a> in a few minutes. One trick is to make sure that java finds the native library. If it is not at a standard library path, you can specify the location with <code>-Djava.library.path</code> like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>java -Djava.library.path<span style="color: #ce5c00; font-weight: bold;">=</span>target -cp target/classes IntArrayExample
</code></pre>
</div>
<p><strong>Further reading.</strong> <a href="https://bazlur.ca/2025/12/14/when-does-javas-foreign-function-memory-api-actually-make-sense/">When Does Java’s Foreign Function &amp; Memory API Actually Make Sense?</a> by A N M Bazlur Rahman.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/01/17/a-new-way-to-call-c-from-java-how-fast-is-it/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>How stagnant is CPU technology?</title>
		<link>https://lemire.me/blog/2026/01/14/how-stagnant-is-cpu-technology/</link>
					<comments>https://lemire.me/blog/2026/01/14/how-stagnant-is-cpu-technology/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Wed, 14 Jan 2026 14:52:39 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22451</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/G-iCzw6XgAAkLsj-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Sometimes, people tell me that there is no more progress in CPU performance. Consider these three processors which had comparable prices at release time. The AMD Ryzen 7 9800X3D (Zen 5, with up to 5.3 GHz boost) was released in 2024. The AMD Ryzen 7 7800X3D (Zen 4, with up to 5.1 GHz boost) was &#8230; <a href="https://lemire.me/blog/2026/01/14/how-stagnant-is-cpu-technology/" class="more-link">Continue reading <span class="screen-reader-text">How stagnant is CPU technology?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/G-iCzw6XgAAkLsj-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Sometimes, people tell me that there is no more progress in CPU performance.</p>
<p>Consider these three processors which had comparable prices at release time.</p>
<ol>
<li>The AMD Ryzen 7 9800X3D (Zen 5, with up to 5.3 GHz boost) was released in 2024.</li>
<li>The AMD Ryzen 7 7800X3D (Zen 4, with up to 5.1 GHz boost) was released in 2023.</li>
<li>The AMD Ryzen 7 5800X3D (Zen 3, with 3.4 GHz base) was released in 2022.</li>
</ol>
<p>Let us consider their results on the <a href="https://openbenchmarking.org/test/pts/simdjson&amp;eval=55ded682564c03f676d77dbc1f8de67e27c7aadc#metrics">PartialTweets open benchmark</a> (JSON parsing). It is a single-core benchmark.</p>
<table>
<tbody>
<tr>
<td>2024 processor</td>
<td>12.7 GB/s</td>
</tr>
<tr>
<td>2023 processor</td>
<td>9 GB/s</td>
</tr>
<tr>
<td>2022 processor</td>
<td>5.2 GB/s</td>
</tr>
</tbody>
</table>
<p>In two years, on this benchmark, AMD more than doubled the performance for the same cost.</p>
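<p>As a rough check of the arithmetic, the speedups implied by the table can be computed directly. The sketch below is only an illustration: the function name <code>speedup</code> is hypothetical, and the throughput figures are the ones quoted in the table.</p>

```cpp
#include <cassert>

// Hypothetical helper: ratio between two throughputs expressed in GB/s.
// Figures from the table: 12.7 (2024), 9.0 (2023), 5.2 (2022).
double speedup(double newer_gbps, double older_gbps) {
  assert(older_gbps > 0.0);
  return newer_gbps / older_gbps;
}

// speedup(9.0, 5.2)  is about 1.73 (2022 to 2023)
// speedup(12.7, 9.0) is about 1.41 (2023 to 2024)
// speedup(12.7, 5.2) is about 2.44 (2022 to 2024)
```

<p>The two yearly gains multiply to give the two-year gain, which is why the overall factor exceeds two.</p>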
<p>So processor performance is indeed going up, sometimes dramatically so, but not all of our software can benefit from the improvements. As software developers, we must track the trends and adapt our software accordingly. Unfortunately, it is hard work and it requires expertise. In the case of this benchmark, the simdjson library is designed to benefit from better processor features.</p>
<p>Not all software can easily run much faster on new processors, and genuine progress is difficult.</p>
<p>Let us be ambitious. Let us move forward!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/01/14/how-stagnant-is-cpu-technology/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>What I Got Wrong About &#8220;Hard Work&#8221; in My 20s</title>
		<link>https://lemire.me/blog/2026/01/08/what-i-got-wrong-about-hard-work-in-my-20s/</link>
					<comments>https://lemire.me/blog/2026/01/08/what-i-got-wrong-about-hard-work-in-my-20s/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 08 Jan 2026 00:39:36 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22447</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Capture-decran-le-2026-01-07-a-19.39.15-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />When I was younger, in my 20s, I assumed that everyone was working &#8220;hard,&#8221; meaning a solid 35 hours of work a week. Especially, say, university professors and professional engineers. I&#8217;d feel terribly guilty when I would be messing around, playing video games on a workday. Today I realize that most people become very adept &#8230; <a href="https://lemire.me/blog/2026/01/08/what-i-got-wrong-about-hard-work-in-my-20s/" class="more-link">Continue reading <span class="screen-reader-text">What I Got Wrong About &#8220;Hard Work&#8221; in My 20s</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Capture-decran-le-2026-01-07-a-19.39.15-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>When I was younger, in my 20s, I assumed that everyone was working &#8220;hard,&#8221; meaning a solid 35 hours of work a week. Especially, say, university professors and professional engineers. I&#8217;d feel terribly guilty when I would be messing around, playing video games on a workday.</p>
<p>Today I realize that most people become very adept at avoiding actual work. And the people you think are working really hard are often just very good at focusing on what is externally visible. They show up to the right meetings but unashamedly avoid the hard work.</p>
<p>It ends up being visible to the people &#8220;who know.&#8221; Why? Because working hard is how you acquire actual expertise. And lack of actual expertise ends up being visible&#8230; but only to those who have the relevant expertise.</p>
<p>And the effect compounds. The difference between someone who has honed their skills for 20 years and someone who has merely shown up to the right meetings becomes enormous. And so, we end up with huge competency gaps between people who are in their 30s, 40s, 50s. It becomes night and day.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/01/08/what-i-got-wrong-about-hard-work-in-my-20s/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>A bit of glass and freedom is all you need</title>
		<link>https://lemire.me/blog/2026/01/07/a-bit-of-glass-and-freedom-is-all-you-need/</link>
					<comments>https://lemire.me/blog/2026/01/07/a-bit-of-glass-and-freedom-is-all-you-need/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Wed, 07 Jan 2026 00:47:23 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22437</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Unknown-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Galileo Galilei was the OpenAI of his time. He helped establish modern science by emphasizing experimentation as the primary means to uncover natural truths. To this end, he built his own telescopes. He revealed to the world the moons of Jupiter, thereby changing forever how we viewed the cosmos.   How was Galileo able to &#8230; <a href="https://lemire.me/blog/2026/01/07/a-bit-of-glass-and-freedom-is-all-you-need/" class="more-link">Continue reading <span class="screen-reader-text">A bit of glass and freedom is all you need</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/Unknown-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><div data-rbd-draggable-context-id="14" data-rbd-draggable-id="3lg85">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="3lg85-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="3lg85-0-0"><span data-offset-key="3lg85-0-0">Galileo Galilei was the OpenAI of his time. He helped establish modern science by emphasizing experimentation as the primary means to uncover natural truths. To this end, he built his own telescopes. He revealed to the world the moons of Jupiter, thereby changing forever how we viewed the cosmos.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="afarn">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="afarn-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="afarn-0-0"><span data-offset-key="afarn-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="6qbds">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="6qbds-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="6qbds-0-0"><span data-offset-key="6qbds-0-0">How was Galileo able to design better telescopes than others? If you have ever been to Venice, you may know that it was famous for its glassmakers. There is a small island nearby (Murano) where there are still glassmakers. Further, Venice had some of the best merchants of Europe, so they could export their glass worldwide.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="53jsd">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="53jsd-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="53jsd-0-0"><span data-offset-key="53jsd-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="cdp89">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="cdp89-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="cdp89-0-0"><span data-offset-key="cdp89-0-0">This is an important lesson as to what drives innovation. It is not a linear process. Living near people making fancy glasses could be the key you need.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="70gq6">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="70gq6-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="70gq6-0-0"><span data-offset-key="70gq6-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="1u12p">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="1u12p-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="1u12p-0-0"><span data-offset-key="1u12p-0-0">A common misconception portrays Galileo as persecuted solely for advocating heliocentrism, the idea that Earth orbits the Sun. In reality, he spent most of his career challenging established doctrines and thrived under Church patronage. Galileo overturned the widespread belief that heavier objects fall faster, and this achievement, if nothing else, brought him greater fame. When he gathered strong evidence for heliocentrism, he initially faced only cautions rather than outright condemnation.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="1mt2m">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="1mt2m-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="1mt2m-0-0"><span data-offset-key="1mt2m-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="6hhla">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="6hhla-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="6hhla-0-0"><span data-offset-key="6hhla-0-0">Pope Urban VIII had personally permitted Galileo to discuss heliocentrism as a hypothesis and even requested that his own arguments on the matter be included. However, Galileo placed these papal views in the mouth of Simplicio, a character portrayed as intellectually inadequate in defending the traditional geocentric position. This was widely interpreted as a mockery.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="4q8bd">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="4q8bd-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="4q8bd-0-0"><span data-offset-key="4q8bd-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="f1nr">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="f1nr-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="f1nr-0-0"><span data-offset-key="f1nr-0-0">Galileo was sentenced to house arrest, during which he continued productive work. The ban on his Copernican writings applied mainly within Catholic territories, allowing their dissemination elsewhere in Europe.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="d1gss">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="d1gss-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="d1gss-0-0"><span data-offset-key="d1gss-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="5f2np">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="5f2np-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="5f2np-0-0"><span data-offset-key="5f2np-0-0">Thus another important element that made Galileo possible was the relative freedom he enjoyed.</span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="65qs3">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="65qs3-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="65qs3-0-0"><span data-offset-key="65qs3-0-0"> </span></div>
</div>
</div>
<div data-rbd-draggable-context-id="14" data-rbd-draggable-id="7sne8">
<div class="" data-block="true" data-editor="fb7vq" data-offset-key="7sne8-0-0">
<div class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr" data-offset-key="7sne8-0-0"><span data-offset-key="7sne8-0-0">You want to innovate? Don&#8217;t live solely in the world of ideas. Don&#8217;t be shy about mixing with commercial interests. And make sure to have a bit of freedom.</span></div>
</div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/01/07/a-bit-of-glass-and-freedom-is-all-you-need/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Technology is culture</title>
		<link>https://lemire.me/blog/2026/01/01/technology-is-culture/</link>
					<comments>https://lemire.me/blog/2026/01/01/technology-is-culture/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 01 Jan 2026 14:03:25 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22425</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/IMG_1492-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />We are experiencing one of the most significant technological breakthroughs of the last few decades. Call it what you will: AI, generative AI, large language models&#8230; But where does it come from? Academics will tell you that it stems from decades of mathematical efforts on campus. But think about it: if this were the best &#8230; <a href="https://lemire.me/blog/2026/01/01/technology-is-culture/" class="more-link">Continue reading <span class="screen-reader-text">Technology is culture</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2026/01/IMG_1492-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>We are experiencing one of the most significant technological breakthroughs of the last few decades. Call it what you will: AI, generative AI, large language models&#8230;</p>
<p>But where does it come from? Academics will tell you that it stems from decades of mathematical efforts on campus. But think about it: if this were the best model to explain what happened, where would the current breakthroughs have occurred? They would have happened on campus first, then propagated to industry. That’s the linear model of innovation—a rather indefensible one.</p>
<p>Technology is culture. Technological progress does not follow a path from the blackboard of a middle-aged MIT professor to your desk, via a corporation.</p>
<p>So what is the cultural background? Of course, there is hacker culture and the way hackers won a culture war in the 1980s by becoming cool enough to have a seat at the table.</p>
<p>But closer to us… I believe there are two main roots. The first is gaming. Gamers wanted photorealistic, high-performance games. They built powerful machines capable of solving linear algebra problems at very high speeds.</p>
<p>Powerful computing alone, however, does you no good if you want to build an AI. That’s where web culture came in. Everything was networked, published, republished. Web nerds helped build the greatest library the world had ever seen.<br />
These two cultures came together to generate the current revolution.</p>
<p>If you like my model, I submit that it has a few interesting consequences. The most immediate one is that if you want to understand how and where technological progress happens, you have to look at cultural drivers—not at what professors at MIT are publishing.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2026/01/01/technology-is-culture/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>The culture war that we won</title>
		<link>https://lemire.me/blog/2025/12/31/the-culture-war-that-we-won/</link>
					<comments>https://lemire.me/blog/2025/12/31/the-culture-war-that-we-won/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Wed, 31 Dec 2025 15:26:25 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22419</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/G9gXfT1XgAEsoUQ-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Culture wars are real. They occur when a dominant culture faces a serious challenge. But unless you pay close attention, you might miss them entirely. As a kid, I was a “nerd.” I read a lot and spent hours on my computer. I devoured science and technology magazines. I taught myself programming. “Great!” you might &#8230; <a href="https://lemire.me/blog/2025/12/31/the-culture-war-that-we-won/" class="more-link">Continue reading <span class="screen-reader-text">The culture war that we won</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/G9gXfT1XgAEsoUQ-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">Culture wars are real. They occur when a dominant culture faces a serious challenge. But unless you pay close attention, you might miss them entirely. As a kid, I was a “nerd.” I read a lot and spent hours on my computer. I devoured science and technology magazines. I taught myself programming. “Great!” you might think. Not at all. This was not valued where and when I grew up. Computers were seen as toys. A kid who spent a lot of time on a computer was viewed as obsessed with a rather dull plaything. We had a computer club, but it was essentially a gathering of “social rejects.” No one looked up to us. Working with computers carried no prestige. Dungeons &amp; Dragons was outright “dangerous”—you had to hide any interest in such games. The 1983 movie WarGames stands out precisely because the computer-obsessed kid gets the girl and saves the world. Bill Gates was becoming famous around that time, but this marked only the beginning of a decade-long culture war in which hacker culture gradually rose to dominance.</span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">Today, most people can speak the language of hackers. It did not have to turn out this way, and it did not unfold identically everywhere. The status of hacker culture is high in the United States, but it remains lower in many other places even now. Even so, in many organizations today, even in the United States, the “computer people” are stored in the basement. We do not let them out too often. They are not “people persons.” So the culture war was won by the hackers; the victory is undeniable. But as with all wars, the result is more nuanced than one might think. Many would like nothing more than to send the computer people back to the bottom of the prestige ladder.</span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">Salaries are a good indicator of prestige. In the USA, in Australia and in Switzerland, “computer people” have high salaries and relatively high status. In the UK as a whole? Not so much. I bet you do better as a “financial analyst” over there. </span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">What is worth watching is the effect that “AI” will have on the status battles. In some sense, building software that can do financial, political and legal analysis is the latest weapon in the arsenal of the computer people. Many despair about what AI might do to software developers: I recommend looking at it in the context of the hacker culture war.</span></p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/31/the-culture-war-that-we-won/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>By how much does your memory allocator overallocate?</title>
		<link>https://lemire.me/blog/2025/12/30/by-how-much-does-your-memory-allocator-overallocates/</link>
					<comments>https://lemire.me/blog/2025/12/30/by-how-much-does-your-memory-allocator-overallocates/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Tue, 30 Dec 2025 19:15:55 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22414</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/plot-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />How much virtual memory does the following C++ expression allocate on the heap? new char[4096] The answer is at least 4 kibibytes but surely more. Firstly, each heap memory allocation requires some memory to keep track of what has been allocated. You are likely using 8 bytes or so of overhead that your program cannot &#8230; <a href="https://lemire.me/blog/2025/12/30/by-how-much-does-your-memory-allocator-overallocates/" class="more-link">Continue reading <span class="screen-reader-text">By how much does your memory allocator overallocate?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/plot-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>How much virtual memory does the following C++ expression allocate on the heap?</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">new</span> <span style="color: #204a87; font-weight: bold;">char</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">4096</span><span style="color: #000; font-weight: bold;">]</span>
</code></pre>
</div>
<p>The answer is <em>at least 4 kibibytes</em> but surely more.</p>
<p>Firstly, each heap memory allocation requires some memory to keep track of what has been allocated. You are likely using 8 bytes or so of overhead that your program cannot access.</p>
<p>Secondly, the memory allocator may allocate a bit more than the 4096 bytes you requested. On a Linux machine, I found that it would allocate 4104 bytes, so 8 extra bytes that are usable by your program. You can check this value by calling <code>malloc_usable_size</code> under Linux.</p>
<p>Thus, overall, you may end up with an extra 16 bytes allocated when you requested 4096 bytes. It is an overhead of about 0.4%. You are basically <em>wasting</em> a byte for every 256 bytes that you allocate.</p>
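<p>To query the usable size yourself, a minimal sketch follows. It assumes glibc on Linux (<code>malloc_usable_size</code> from <code>malloc.h</code>) or macOS (<code>malloc_size</code> from <code>malloc/malloc.h</code>); the helper name <code>usable_bytes</code> is hypothetical. It calls <code>malloc</code> directly rather than <code>new</code>, because the C++ standard does not guarantee that <code>new</code> is backed by <code>malloc</code>, even though it usually is.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(__APPLE__)
#include <malloc/malloc.h> // malloc_size
#else
#include <malloc.h> // malloc_usable_size (glibc; assumed available)
#endif

// Hypothetical helper: allocates `requested` bytes with malloc and reports
// how many bytes the allocator says are usable in that block.
// Returns 0 if the allocation fails.
std::size_t usable_bytes(std::size_t requested) {
  void *p = std::malloc(requested);
  if (p == nullptr) {
    return 0;
  }
#if defined(__APPLE__)
  std::size_t usable = malloc_size(p);
#else
  std::size_t usable = malloc_usable_size(p);
#endif
  // The allocator never hands back less than what was requested.
  assert(usable >= requested);
  std::free(p);
  return usable;
}
```

<p>Calling <code>usable_bytes(4096)</code> returns the usable size for a 4096-byte request. The exact value depends on the allocator and its version, so treat any specific number as an observation on one system.</p>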
<p>But that is not the worst possible case. On macOS, let us consider the following line of code.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">new</span> <span style="color: #204a87; font-weight: bold;">char</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #0000cf; font-weight: bold;">3585</span><span style="color: #000; font-weight: bold;">]</span>
</code></pre>
</div>
<p>The system reports an allocation of 4096 bytes: a 14% overhead. What is happening is that macOS rounds up the memory allocation to the next multiple of 512 bytes for moderately small allocations. If you try allocating even larger memory blocks, it starts rounding up even more.</p>
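<p>The rounding described above can be modeled with a few lines of arithmetic. This is only a sketch of the behavior as described, not the actual macOS allocator; the names <code>round_up</code> and <code>overhead_percent</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of `boundary` (n itself if already aligned).
std::size_t round_up(std::size_t n, std::size_t boundary) {
  assert(boundary > 0);
  return ((n + boundary - 1) / boundary) * boundary;
}

// Extra bytes allocated, as a percentage of the requested size.
double overhead_percent(std::size_t requested, std::size_t allocated) {
  assert(requested > 0 && allocated >= requested);
  return 100.0 * static_cast<double>(allocated - requested) /
         static_cast<double>(requested);
}
```

<p>With a 512-byte boundary, <code>round_up(3585, 512)</code> gives 4096, and <code>overhead_percent(3585, 4096)</code> is a little over 14%, matching the figure reported above. The worst case for a given boundary is a request that is one byte past a multiple of it.</p>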
<p><a href="http://lemire.me/blog/wp-content/uploads/2025/12/plot.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-22415" src="http://lemire.me/blog/wp-content/uploads/2025/12/plot.png" alt="" width="878" height="535" srcset="https://lemire.me/blog/wp-content/uploads/2025/12/plot.png 878w, https://lemire.me/blog/wp-content/uploads/2025/12/plot-300x183.png 300w, https://lemire.me/blog/wp-content/uploads/2025/12/plot-768x468.png 768w" sizes="auto, (max-width: 878px) 100vw, 878px" /></a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/30/by-how-much-does-your-memory-allocator-overallocates/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Freedom from incompetence</title>
		<link>https://lemire.me/blog/2025/12/29/freedom-from-incompetence/</link>
					<comments>https://lemire.me/blog/2025/12/29/freedom-from-incompetence/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Mon, 29 Dec 2025 14:41:00 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22411</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/G9RgcZXW8AASe2S-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Many people say that they crave more freedom. But what do we mean by “freedom”? Being free from constraints? Is that what we mean? Would you feel “freer” if you could walk outside in your underwear? It is almost surely not what you mean by “freedom.” I submit to you that it is almost always &#8230; <a href="https://lemire.me/blog/2025/12/29/freedom-from-incompetence/" class="more-link">Continue reading <span class="screen-reader-text">Freedom from incompetence</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/G9RgcZXW8AASe2S-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Many people say that they crave more freedom.<br />
But what do we mean by “freedom”?<br />
Being free from constraints? Is that what we mean? Would you feel “freer” if you could walk outside in your underwear?<br />
It is almost surely not what you mean by “freedom.”<br />
I submit to you that it is almost always the case that if you are frustrated at work by your lack of freedom, the actual problem is competence.<br />
Imagine two scenarios.</p>
<ul>
<li>Scenario A: You work for a highly directive boss. You are constantly accountable for what you do. But everyone around you is highly competent. You need to wear a jacket and a tie, but you work in the best team in the world.</li>
<li>Scenario B: You work in a context where you hardly know who your boss is. You come to work in your underwear. However, everyone is incompetent. You work with the least competent team in the world.</li>
</ul>
<p>Assuming that the salary is the same, which job do you prefer?<br />
I cannot answer for you, but most people I know prefer Scenario A.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/29/freedom-from-incompetence/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
			</item>
		<item>
		<title>Don&#8217;t be so eager to rewrite your code</title>
		<link>https://lemire.me/blog/2025/12/28/dont-be-so-eager-to-rewrite-your-code/</link>
					<comments>https://lemire.me/blog/2025/12/28/dont-be-so-eager-to-rewrite-your-code/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 28 Dec 2025 03:02:44 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22399</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-27-a-22.02.19-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />I used to always want to rewrite my code. Maybe even use another programming language. « If only I could rewrite my code, it would be so much better now. » If you maintain software projects, you see it all the time. Someone new comes along and they want to start rewriting everything. They always &#8230; <a href="https://lemire.me/blog/2025/12/28/dont-be-so-eager-to-rewrite-your-code/" class="more-link">Continue reading <span class="screen-reader-text">Don&#8217;t be so eager to rewrite your code</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-27-a-22.02.19-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">I used to always want to rewrite my code. Maybe even use another programming language. « If only I could rewrite my code, it would be so much better now. » </span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">If you maintain software projects, you see it all the time. Someone new comes along and they want to start rewriting everything. They always have subjective arguments: it is going to be more maintainable or safer or just more elegant. </span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">If your code is battle-tested… then the correct instinct is to be conservative and keep your current code. Sometimes you need to rewrite your code: you made a mistake or must change your architecture. </span><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">But most times, the old code is fine and investing time in updating your current code is better than starting anew. </span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">The great intellectual Robin Hanson </span><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">argues that software ages. One of his arguments is that software engineers say that it does. That’s what engineers feel but whether it is true is another matter. </span></p>
<p><span class="css-1jxf684 r-bcqeeo r-1ttztb7 r-qvutc0 r-poiln3">« Before Borland’s new spreadsheet for Windows shipped, Philippe Kahn, the colorful founder of Borland, was quoted a lot in the press bragging about how Quattro Pro would be much better than Microsoft Excel, because it was written from scratch. All new source code! As if source code rusted. The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material? » (Joel Spolsky)</span></p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/28/dont-be-so-eager-to-rewrite-your-code/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Parsing IP addresses quickly (portably, without SIMD magic)</title>
		<link>https://lemire.me/blog/2025/12/27/parsing-ip-addresses-quickly-portably-without-simd-magic/</link>
					<comments>https://lemire.me/blog/2025/12/27/parsing-ip-addresses-quickly-portably-without-simd-magic/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 27 Dec 2025 23:39:57 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22393</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-27-a-18.33.45-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Most programmers are familiar with IP addresses. They take the form of four numbers between 0 and 255 separated by dots: 192.168.0.1. In some sense, it is a convoluted way to represent a 32-bit integer. The modern version of an IP address is IPv6 which is usually surrounded by square brackets. It is less common &#8230; <a href="https://lemire.me/blog/2025/12/27/parsing-ip-addresses-quickly-portably-without-simd-magic/" class="more-link">Continue reading <span class="screen-reader-text">Parsing IP addresses quickly (portably, without SIMD magic)</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-27-a-18.33.45-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Most programmers are familiar with IP addresses. They take the form<br />
of four numbers between 0 and 255 separated by dots: <code>192.168.0.1</code>.<br />
In some sense, it is a convoluted way to represent a 32-bit integer.<br />
The <em>modern</em> version of an IP address is IPv6, which is usually surrounded<br />
by square brackets when it appears in a URL. It is less common in my experience.</p>
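<p>To make the 32-bit view concrete, here is a minimal sketch, assuming the usual convention that the first number occupies the most significant byte. The helper name <code>ipv4_to_u32</code> is hypothetical and purely illustrative.</p>

```cpp
#include <cstdint>

// Pack four octets (most significant first) into one 32-bit integer.
// The name ipv4_to_u32 is hypothetical, chosen for illustration only.
constexpr std::uint32_t ipv4_to_u32(std::uint8_t a, std::uint8_t b,
                                    std::uint8_t c, std::uint8_t d) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(b) << 16) |
           (std::uint32_t(c) << 8) | std::uint32_t(d);
}

// 192.168.0.1 is 0xC0A80001 in hexadecimal, or 3232235521 in decimal.
static_assert(ipv4_to_u32(192, 168, 0, 1) == 0xC0A80001u, "unexpected packing");
static_assert(ipv4_to_u32(192, 168, 0, 1) == 3232235521u, "unexpected packing");
```

<p>The checks run at compile time, so the file only compiles if the packing matches the stated values.</p>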
<p>Using fancy techniques, <a href="https://lemire.me/blog/2023/06/08/parsing-ip-addresses-crazily-fast/">you can parse IP addresses with as few as 50 instructions</a>. It is a bit complicated and not necessarily portable.</p>
<p>What if you want high speed without too much work or a specialized library? You can try to roll your own. But since I am a civilized programmer, I just asked my favorite AI to write it for me.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #8f5902; font-style: italic;">// Parse an IPv4 address starting at 'p'.</span>
<span style="color: #8f5902; font-style: italic;">// p : start pointer, pend: end of the string</span>
<span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">expected</span><span style="color: #ce5c00; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">uint32_t</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">parse_error</span><span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #000;">parse_manual</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">pend</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">uint32_t</span> <span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">while</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">uint32_t</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">start</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">p</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #204a87; font-weight: bold;">while</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&gt;=</span> <span style="color: #4e9a06;">'0'</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;=</span> <span style="color: #4e9a06;">'9'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
            <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #0000cf; font-weight: bold;">10</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #4e9a06;">'0'</span><span style="color: #000; font-weight: bold;">);</span>
            <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #0000cf; font-weight: bold;">255</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
            <span style="color: #000; font-weight: bold;">}</span>
            <span style="color: #000;">p</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #000; font-weight: bold;">}</span>
        <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #000;">start</span> <span style="color: #ce5c00; font-weight: bold;">||</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #000;">start</span> <span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #0000cf; font-weight: bold;">1</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">start</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #4e9a06;">'0'</span><span style="color: #000; font-weight: bold;">))</span> <span style="color: #000; font-weight: bold;">{</span>
            <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
        <span style="color: #000; font-weight: bold;">}</span>
        <span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">&lt;&lt;</span> <span style="color: #0000cf; font-weight: bold;">8</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #ce5c00; font-weight: bold;">|</span> <span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #000;">octets</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">;</span>

        <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
            <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">||</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">!=</span> <span style="color: #4e9a06;">'.'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
            <span style="color: #000; font-weight: bold;">}</span>
            <span style="color: #000;">p</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #8f5902; font-style: italic;">// Skip dot</span>
        <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #0000cf; font-weight: bold;">4</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #000;">pend</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">ip</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #000; font-weight: bold;">}</span> <span style="color: #204a87; font-weight: bold;">else</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000; font-weight: bold;">}</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>It was immediately clear to me that this function was not as fast as it could be. I then asked the AI to improve the result by using the fact that each number is made of between one and three digits. I got the following reasonable function.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">expected</span><span style="color: #ce5c00; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">uint32_t</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">parse_error</span><span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #000;">parse_manual_unrolled</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">pend</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">uint32_t</span> <span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">while</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">uint32_t</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&gt;=</span> <span style="color: #4e9a06;">'0'</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;=</span> <span style="color: #4e9a06;">'9'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
            <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span><span style="color: #ce5c00; font-weight: bold;">++</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #4e9a06;">'0'</span><span style="color: #000; font-weight: bold;">);</span>
            <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&gt;=</span> <span style="color: #4e9a06;">'0'</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;=</span> <span style="color: #4e9a06;">'9'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                <span style="color: #8f5902; font-style: italic;">// A leading zero followed by another digit (such as 01) is rejected.</span>
                <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                    <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
                <span style="color: #000; font-weight: bold;">}</span>
                <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #0000cf; font-weight: bold;">10</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span><span style="color: #ce5c00; font-weight: bold;">++</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #4e9a06;">'0'</span><span style="color: #000; font-weight: bold;">);</span>
                <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&gt;=</span> <span style="color: #4e9a06;">'0'</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">&lt;=</span> <span style="color: #4e9a06;">'9'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                    <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">*</span> <span style="color: #0000cf; font-weight: bold;">10</span> <span style="color: #ce5c00; font-weight: bold;">+</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span><span style="color: #ce5c00; font-weight: bold;">++</span> <span style="color: #ce5c00; font-weight: bold;">-</span> <span style="color: #4e9a06;">'0'</span><span style="color: #000; font-weight: bold;">);</span>
                    <span style="color: #8f5902; font-style: italic;">// Only a three-digit value can exceed 255.</span>
                    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">val</span> <span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #0000cf; font-weight: bold;">255</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                        <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
                    <span style="color: #000; font-weight: bold;">}</span>
                <span style="color: #000; font-weight: bold;">}</span>
            <span style="color: #000; font-weight: bold;">}</span>
        <span style="color: #000; font-weight: bold;">}</span> <span style="color: #204a87; font-weight: bold;">else</span> <span style="color: #000; font-weight: bold;">{</span>
            <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">parse_error</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
        <span style="color: #000; font-weight: bold;">}</span>
        <span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">&lt;&lt;</span> <span style="color: #0000cf; font-weight: bold;">8</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #ce5c00; font-weight: bold;">|</span> <span style="color: #000;">val</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #000;">octets</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">;</span>
        <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
            <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">||</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">!=</span> <span style="color: #4e9a06;">'.'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
                <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
            <span style="color: #000; font-weight: bold;">}</span>
            <span style="color: #000;">p</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #8f5902; font-style: italic;">// Skip the dot</span>
        <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">octets</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #0000cf; font-weight: bold;">4</span> <span style="color: #ce5c00; font-weight: bold;">&amp;&amp;</span> <span style="color: #000;">p</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #000;">pend</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">ip</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #000; font-weight: bold;">}</span> <span style="color: #204a87; font-weight: bold;">else</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000; font-weight: bold;">}</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>Nice work, AI!</p>
<p>Since C++17, we have standard functions to parse numbers (<code>std::from_chars</code>, in the <code>&lt;charconv&gt;</code> header), which can significantly simplify the code.</p>
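<p>As a minimal sketch of how <code>std::from_chars</code> behaves on a single number, consider the following helper. The name <code>parse_octet</code> and the use of <code>std::optional</code> are assumptions made for illustration; they are not part of the code in this post.</p>

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse an entire string as one octet (0 to 255).
// Returns std::nullopt when the text is not entirely a valid octet.
std::optional<std::uint8_t> parse_octet(std::string_view s) {
    std::uint8_t value = 0;
    const char *first = s.data();
    const char *last = s.data() + s.size();
    auto r = std::from_chars(first, last, value);
    // r.ec is std::errc() on success, std::errc::invalid_argument when no
    // digit was found, and std::errc::result_out_of_range when the number
    // does not fit in 8 bits. r.ptr points just past the consumed digits.
    if (r.ec != std::errc() || r.ptr != last) {
        return std::nullopt;
    }
    return value;
}

// Small self-check that a caller may invoke from main().
inline void check_parse_octet() {
    assert(parse_octet("255").has_value());
    assert(!parse_octet("256").has_value()); // out of range for 8 bits
    assert(!parse_octet("1.").has_value());  // trailing character left over
}
```

<p>One behavior to keep in mind: <code>std::from_chars</code> accepts leading zeros, so this sketch treats <code>007</code> as 7, whereas the hand-written parsers above reject it. Applied once per number, the same call yields a function along the following lines.</p>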
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">expected</span><span style="color: #ce5c00; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">uint32_t</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">parse_error</span><span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #000;">parse_ip</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">p</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">pend</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
  <span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">current</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">p</span><span style="color: #000; font-weight: bold;">;</span>
  <span style="color: #204a87; font-weight: bold;">uint32_t</span> <span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
  <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">int</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">uint8_t</span> <span style="color: #000;">value</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">auto</span> <span style="color: #000;">r</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">from_chars</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">current</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">pend</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">value</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">r</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">ec</span> <span style="color: #ce5c00; font-weight: bold;">!=</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">errc</span><span style="color: #000; font-weight: bold;">())</span> <span style="color: #000; font-weight: bold;">{</span>
      <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
    <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #000;">current</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">r</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">ptr</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">ip</span> <span style="color: #ce5c00; font-weight: bold;">&lt;&lt;</span> <span style="color: #0000cf; font-weight: bold;">8</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #ce5c00; font-weight: bold;">|</span> <span style="color: #000;">value</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
      <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">current</span> <span style="color: #ce5c00; font-weight: bold;">==</span> <span style="color: #000;">pend</span> <span style="color: #ce5c00; font-weight: bold;">||</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">current</span><span style="color: #ce5c00; font-weight: bold;">++</span> <span style="color: #ce5c00; font-weight: bold;">!=</span> <span style="color: #4e9a06;">'.'</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
        <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">std</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">unexpected</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">invalid_format</span><span style="color: #000; font-weight: bold;">);</span>
      <span style="color: #000; font-weight: bold;">}</span>
    <span style="color: #000; font-weight: bold;">}</span>
  <span style="color: #000; font-weight: bold;">}</span>
  <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">ip</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>You can also use the <a href="https://github.com/fastfloat/fast_float">fast_float library</a> as a substitute for <code>std::from_chars</code>. The latest version of fast_float has faster 8-bit integer parsing thanks to Shikhar Soni (with a fix by Pavel Novikov).</p>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2025/12/27/benchmark_ip_parsing">I wrote a benchmark for this problem</a>. Let us first consider the results using an Apple M4 processor (4.5 GHz) with LLVM 17.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>instructions/ip</th>
<th>ns/ip</th>
</tr>
</thead>
<tbody>
<tr>
<td>manual</td>
<td>185</td>
<td>6.2</td>
</tr>
<tr>
<td>manual (unrolled)</td>
<td>114</td>
<td>3.3</td>
</tr>
<tr>
<td>from_chars</td>
<td>381</td>
<td>14</td>
</tr>
<tr>
<td>fast_float</td>
<td>181</td>
<td>7.2</td>
</tr>
</tbody>
</table>
<p>Let us next try with an Intel Ice Lake processor (3.2 GHz) using GCC 12.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>instructions/ip</th>
<th>ns/ip</th>
</tr>
</thead>
<tbody>
<tr>
<td>manual</td>
<td>219</td>
<td>30</td>
</tr>
<tr>
<td>manual (unrolled)</td>
<td>154</td>
<td>24</td>
</tr>
<tr>
<td>from_chars</td>
<td>220</td>
<td>29</td>
</tr>
<tr>
<td>fast_float</td>
<td>211</td>
<td>18</td>
</tr>
</tbody>
</table>
<p>And finally, let us try with a Chinese Loongson 3A6000 processor (2.5 GHz) using LLVM 21.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>instructions/ip</th>
<th>ns/ip</th>
</tr>
</thead>
<tbody>
<tr>
<td>manual</td>
<td>187</td>
<td>29</td>
</tr>
<tr>
<td>manual (unrolled)</td>
<td>109</td>
<td>21</td>
</tr>
<tr>
<td>from_chars</td>
<td>191</td>
<td>39</td>
</tr>
<tr>
<td>fast_float</td>
<td>193</td>
<td>27</td>
</tr>
</tbody>
</table>
<p>The optimization work on the fast_float library paid off. The difference is especially striking on the x64 processor.</p>
<p>What is also interesting in my little experiment is that I was able to get the AI to produce faster code with relatively little effort on my part. I did have to &#8216;guide&#8217; the AI. Does that mean that I can retire? Not yet. But I am happy that I can more quickly get good reference baselines, which allows me to better focus my work where it matters.</p>
<p><em>Reference</em>: <a href="https://github.com/fastfloat/fast_float">The fast_float C++ library</a> is a fast number parsing library that is part of GCC and major web browsers.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/27/parsing-ip-addresses-quickly-portably-without-simd-magic/feed/</wfw:commentRss>
			<slash:comments>10</slash:comments>
		
		
			</item>
		<item>
		<title>Performance trick : optimistic vs pessimistic checks</title>
		<link>https://lemire.me/blog/2025/12/20/performance-trick-optimistic-vs-pessimistic-checks/</link>
					<comments>https://lemire.me/blog/2025/12/20/performance-trick-optimistic-vs-pessimistic-checks/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 20 Dec 2025 23:26:09 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22386</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-20-a-18.25.46-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Strings in programming are often represented as arrays of 8-bit words. The string is ASCII if and only if all 8-bit words have their most significant bit unset. In other words, the byte values must be no larger than 127 (or 0x7F in hexadecimal). A decent C function to check that the string is ASCII &#8230; <a href="https://lemire.me/blog/2025/12/20/performance-trick-optimistic-vs-pessimistic-checks/" class="more-link">Continue reading <span class="screen-reader-text">Performance trick : optimistic vs pessimistic checks</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-20-a-18.25.46-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Strings in programming are often represented as arrays of 8-bit words. The string is ASCII if and only if all 8-bit words have their most significant bit unset. In other words, the byte values must be no larger than 127 (or <code>0x7F</code> in hexadecimal).</p>
<p>A decent C function to check that the string is ASCII is as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">bool</span> <span style="color: #000;">is_ascii_pessimistic</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
  <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #204a87; font-weight: bold;">if</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">static_cast</span><span style="color: #ce5c00; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">unsigned</span> <span style="color: #204a87; font-weight: bold;">char</span><span style="color: #ce5c00; font-weight: bold;">&gt;</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">])</span> <span style="color: #ce5c00; font-weight: bold;">&gt;</span> <span style="color: #0000cf; font-weight: bold;">0x7F</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
      <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #204a87;">false</span><span style="color: #000; font-weight: bold;">;</span>
    <span style="color: #000; font-weight: bold;">}</span>
  <span style="color: #000; font-weight: bold;">}</span>
  <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #204a87;">true</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>We go over each character, compare it with <code>0x7F</code>, and continue if the value is no larger than <code>0x7F</code>. Once you have scanned the entire string and all tests have passed, you know that your string is ASCII.</p>
<p>Notice how I called this function <code>pessimistic</code>. What do I mean? I mean that it expects, in some sense, that it will find some non-ASCII character. If so, the best option is to immediately return and not scan the whole string.</p>
<p>What if you expect the string to almost always be ASCII? An alternative then is to effectively do a bitwise OR reduction of the string: you OR all characters together and you check just once that the result is bounded by <code>0x7F</code>. If any character has its most significant bit set, then the bitwise OR of all characters will also have its most significant bit set. So you might write your function as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">bool</span> <span style="color: #000;">is_ascii_optimistic</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">const</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #ce5c00; font-weight: bold;">*</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
  <span style="color: #204a87; font-weight: bold;">unsigned</span> <span style="color: #204a87; font-weight: bold;">char</span> <span style="color: #000;">result</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span>
  <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">length</span><span style="color: #000; font-weight: bold;">;</span> <span style="color: #000;">i</span><span style="color: #ce5c00; font-weight: bold;">++</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">result</span> <span style="color: #ce5c00; font-weight: bold;">|=</span> <span style="color: #000;">static_cast</span><span style="color: #ce5c00; font-weight: bold;">&lt;</span><span style="color: #204a87; font-weight: bold;">unsigned</span> <span style="color: #204a87; font-weight: bold;">char</span><span style="color: #ce5c00; font-weight: bold;">&gt;</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">]);</span>
  <span style="color: #000; font-weight: bold;">}</span>
  <span style="color: #204a87; font-weight: bold;">return</span> <span style="color: #000;">result</span> <span style="color: #ce5c00; font-weight: bold;">&lt;=</span> <span style="color: #0000cf; font-weight: bold;">0x7F</span><span style="color: #000; font-weight: bold;">;</span>
<span style="color: #000; font-weight: bold;">}</span>
</code></pre>
</div>
<p>If you have strings that are all pure ASCII, which function will be faster? Maybe surprisingly, the optimistic function might be several times faster. <a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2025/12/20/optimistic">I wrote a benchmark</a> and ran it with GCC 15 on an Intel Ice Lake processor. I get the following results.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>speed</th>
</tr>
</thead>
<tbody>
<tr>
<td>pessimistic</td>
<td> 1.8 GB/s</td>
</tr>
<tr>
<td>optimistic</td>
<td> 13 GB/s</td>
</tr>
</tbody>
</table>
<p>Why is the optimistic function faster? Mostly because the compiler is better able to optimize it: the loop has no early exit, so, among other possibilities, the compiler can use autovectorization to exploit data-level parallelism (e.g., SIMD instructions).</p>
<p>Which function is best depends on your use case.</p>
<p>What if you would prefer a pessimistic function, that is, one that returns early when non-ASCII characters are encountered, but you still want high speed? Then you can use <a href="https://github.com/simdutf/simdutf">a dedicated library like simdutf</a> where we have hand-coded the logic. In simdutf, the pessimistic function is called <code>validate_ascii_with_errors</code>. Your results will vary, but I found that it is about as fast as the optimistic function.</p>
<table>
<thead>
<tr>
<th>function</th>
<th>speed</th>
</tr>
</thead>
<tbody>
<tr>
<td>pessimistic</td>
<td> 1.8 GB/s</td>
</tr>
<tr>
<td>pessimistic (simdutf)</td>
<td> 14 GB/s</td>
</tr>
<tr>
<td>optimistic</td>
<td> 13 GB/s</td>
</tr>
</tbody>
</table>
<p>So it is possible to combine the benefits of pessimism and optimism although it requires a bit of care.</p>
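<p>One way to see how the two styles can be combined, without a dedicated library, is to work block by block: OR the bytes of a block together, then check once per block. The following is only a minimal sketch, written in Go rather than the C++ used above. The name <code>isASCIIBlockwise</code> and the 64-byte block size are assumptions made for illustration, it has not been benchmarked, and it is not how simdutf implements <code>validate_ascii_with_errors</code>.</p>

```go
package main

// blockSize is an arbitrary choice for this sketch; tune it for your data.
const blockSize = 64

// isASCIIBlockwise ORs the bytes of each block together (the optimistic
// inner loop) and checks the accumulated value once per block, so it can
// return early (the pessimistic outer loop) without branching on every byte.
func isASCIIBlockwise(data []byte) bool {
	for len(data) >= blockSize {
		var acc byte
		for _, b := range data[:blockSize] {
			acc |= b
		}
		if acc > 0x7F {
			return false // some byte in this block has its top bit set
		}
		data = data[blockSize:]
	}
	// Remaining tail: fewer than blockSize bytes, checked once at the end.
	var acc byte
	for _, b := range data {
		acc |= b
	}
	return !(acc > 0x7F)
}
```

<p>This sketch only returns a Boolean: it can stop at the first offending block, but it does not report where the offending byte is. How fast it runs will depend on the compiler and on the block size.</p>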
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/20/performance-trick-optimistic-vs-pessimistic-checks/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>JSON-complete data formats and programming languages</title>
		<link>https://lemire.me/blog/2025/12/20/json-complete-data-format-and-programming-languages/</link>
					<comments>https://lemire.me/blog/2025/12/20/json-complete-data-format-and-programming-languages/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 20 Dec 2025 21:24:33 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22382</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-20-a-16.24.01-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Much of the data on the Internet is shared using a simple format called JSON. JSON is made of two composite types (arrays and key-value maps) and a small number of primitive types (64-bit floating-point numbers, strings, null, Booleans). That JSON became ubiquitous despite its simplicity is telling. { "name": "Nova Starlight", "age": 28, "powers": &#8230; <a href="https://lemire.me/blog/2025/12/20/json-complete-data-format-and-programming-languages/" class="more-link">Continue reading <span class="screen-reader-text">JSON-complete data formats and programming languages</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-20-a-16.24.01-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Much of the data on the Internet is shared using a simple format called JSON. JSON is made of two composite types (arrays and key-value maps) and a small number of primitive types (64-bit floating-point numbers, strings, null, Booleans). That JSON became ubiquitous despite its simplicity is telling.</p>
<pre class="code_syntax" style="color: #000000; background: #ffffff;"><span class="line_wrapper"><span style="color: #800080;">{</span></span>
<span class="line_wrapper"> <span style="color: #800000;">"</span><span style="color: #0000e6;">name</span><span style="color: #800000;">"</span><span style="color: #808030;">:</span> <span style="color: #800000;">"</span><span style="color: #0000e6;">Nova Starlight</span><span style="color: #800000;">"</span><span style="color: #808030;">,</span></span>
<span class="line_wrapper"> <span style="color: #800000;">"</span><span style="color: #0000e6;">age</span><span style="color: #800000;">"</span><span style="color: #808030;">:</span> <span style="color: #008c00;">28</span><span style="color: #808030;">,</span></span>
<span class="line_wrapper"> <span style="color: #800000;">"</span><span style="color: #0000e6;">powers</span><span style="color: #800000;">"</span><span style="color: #808030;">:</span> <span style="color: #808030;">[</span><span style="color: #800000;">"</span><span style="color: #0000e6;">telekinesis</span><span style="color: #800000;">"</span><span style="color: #808030;">,</span> <span style="color: #800000;">"</span><span style="color: #0000e6;">flight</span><span style="color: #800000;">"</span><span style="color: #808030;">,</span><span style="color: #800000;">"</span><span style="color: #0000e6;">energy blasts</span><span style="color: #800000;">"</span><span style="color: #808030;">]</span></span>
<span class="line_wrapper"><span style="color: #800080;">}</span></span></pre>
<p>Interestingly, JSON matches closely the data structures provided by default in the popular language Go. Go gives you arrays/slices and maps&#8230; in addition to the standard primitive types. It is a bit more than C which does not provide maps by default. But it is significantly simpler than Java, C++, C#, and many other programming languages where the standard library covers much of the data structures found in textbooks.</p>
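<p>To make the correspondence concrete, here is a small sketch using Go's standard <code>encoding/json</code> package. When asked to decode into an untyped value, it produces maps, slices, <code>float64</code> numbers, strings, Booleans and <code>nil</code>. The helper names <code>decode</code> and <code>kind</code> are hypothetical and chosen for this example.</p>

```go
package main

import "encoding/json"

// decode parses a JSON document into Go's generic built-in types:
// objects become map[string]any, arrays become []any, numbers become
// float64, and strings, Booleans and null become string, bool and nil.
func decode(doc string) (any, error) {
	v := new(any) // pointer to an empty interface value
	err := json.Unmarshal([]byte(doc), v)
	return *v, err
}

// kind names the JSON type that a decoded Go value corresponds to.
func kind(v any) string {
	switch v.(type) {
	case map[string]any:
		return "object"
	case []any:
		return "array"
	case float64:
		return "number"
	case string:
		return "string"
	case bool:
		return "Boolean"
	case nil:
		return "null"
	}
	return "not produced by decode"
}
```

<p>For instance, decoding the document above and calling <code>kind</code> on the value stored under <code>"age"</code> gives <code>"number"</code>, since the integer 28 arrives as a <code>float64</code>.</p>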
<p>There is at least one obvious data structure that is missing in JSON, and in Go: the set. Because objects are supposed to have no duplicate keys, you can implement a set of strings by mapping each key to an arbitrary value like true.</p>
<pre class="code_syntax" style="color: #000000; background: #ffffff;"><span class="line_wrapper"><span style="color: #800080;">{</span><span style="color: #800000;">"</span><span style="color: #0000e6;">element1</span><span style="color: #800000;">"</span><span style="color: #808030;">:</span> true<span style="color: #808030;">,</span> <span style="color: #800000;">"</span><span style="color: #0000e6;">element2</span><span style="color: #800000;">"</span><span style="color: #808030;">:</span> true<span style="color: #800080;">}</span></span></pre>
<p>But I believe that it is a somewhat unusual pattern. Most times, when we mean to represent a set of objects, an array suffices. We just need to handle the duplicates somehow.</p>
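<p>In Go, both options can be sketched in a few lines: a map from strings to <code>true</code> plays the role of the set, and the same map can be used to drop duplicates from an array. The function names <code>toSet</code> and <code>dedupe</code> are assumptions made for this illustration.</p>

```go
package main

// toSet builds a set of strings as a map from each element to true,
// mirroring the JSON object {"element1": true, "element2": true}.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, s := range items {
		set[s] = true
	}
	return set
}

// dedupe returns the items with duplicates removed, keeping the first
// occurrence of each string and preserving the original order.
func dedupe(items []string) []string {
	seen := make(map[string]bool, len(items))
	out := make([]string, 0, len(items))
	for _, s := range items {
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out
}
```

<p>Using <code>map[string]struct{}</code> instead would avoid storing the Boolean, but <code>map[string]bool</code> mirrors the JSON object more directly.</p>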
<p>There have been many attempts at adding more concepts to JSON, more complexity, but none of them have achieved much traction. I believe that it reflects the fact that JSON is good enough as a data format.</p>
<p>I refer to any format that allows you to represent JSON data, such as YAML, as a JSON-complete data format. If it is at least equivalent to JSON, it is rich enough for most problems.</p>
<p>Similarly, I suggest that new programming languages should aim to be JSON-complete: they should provide a map with key-value pairs, arrays, and basic primitive types. In this light, the C and the Pascal programming languages are not JSON-complete.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/20/json-complete-data-format-and-programming-languages/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Multiplying the Shuffle Speed in Go with Batched Shuffling</title>
		<link>https://lemire.me/blog/2025/12/15/multiplying-the-shuffle-speed-in-go-with-batched-shuffling/</link>
					<comments>https://lemire.me/blog/2025/12/15/multiplying-the-shuffle-speed-in-go-with-batched-shuffling/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Mon, 15 Dec 2025 01:42:10 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22369</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Programmers often want to randomly shuffle arrays. Evidently, we want to do so as efficiently as possible. Maybe surprisingly, I found that the performance of random shuffling was not limited by memory bandwidth or latency, but rather by computation. Specifically, it is the computation of the random indexes itself that is slow. Earlier in 2025, &#8230; <a href="https://lemire.me/blog/2025/12/15/multiplying-the-shuffle-speed-in-go-with-batched-shuffling/" class="more-link">Continue reading <span class="screen-reader-text">Multiplying the Shuffle Speed in Go with Batched Shuffling</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Programmers often want to randomly shuffle arrays. Evidently, we want to do so as efficiently as possible. Maybe surprisingly, I found that the performance of random shuffling was not limited by memory bandwidth or latency, but rather by computation. Specifically, it is the computation of the random indexes itself that is slow.</p>
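<p>To see where that computation happens, consider a textbook Fisher-Yates shuffle written against Go's <code>math/rand/v2</code> package. This is a sketch for illustration, not the code of the standard library or of any particular shuffling library; the names <code>fisherYates</code> and <code>newRNG</code> are hypothetical. It draws one bounded random index per element.</p>

```go
package main

import "math/rand/v2"

// newRNG returns a PCG-based generator with fixed seeds, so that runs
// are reproducible.
func newRNG(seed1, seed2 uint64) *rand.Rand {
	return rand.New(rand.NewPCG(seed1, seed2))
}

// fisherYates shuffles data in place. Each loop iteration draws one
// bounded random index with IntN: this per-element index computation
// is the cost discussed in the text.
func fisherYates(rng *rand.Rand, data []int) {
	for i := len(data) - 1; i > 0; i-- {
		j := rng.IntN(i + 1) // uniform over 0, 1, ..., i
		data[i], data[j] = data[j], data[i]
	}
}
```

<p>Each call to <code>IntN</code> must turn raw random bits into an index in a given range. A batched approach, as its name suggests, tries to obtain several such indexes from fewer random draws.</p>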
<p><a href="https://lemire.me/blog/2025/04/06/faster-shuffling-in-go-with-batching/">Earlier in 2025</a>, I reported how you could more than double the speed of a random shuffle in Go using a new algorithm (<a href="https://arxiv.org/pdf/2408.06213">Brackett-Rozinsky and Lemire, 2025</a>). However, I was using custom code that could not serve as a drop-in replacement for the standard Go Shuffle function. I decided to write a <a href="https://github.com/lemire/batchedrand">proper library called <code>batchedrand</code></a>. You can use it just like the <code>math/rand/v2</code> standard library.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">rng</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #000;">batchedrand</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">Rand</span><span style="color: #000; font-weight: bold;">{</span><span style="color: #000;">rand</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">New</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">rand</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">NewPCG</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">))}</span>

<span style="color: #000;">data</span> <span style="color: #ce5c00; font-weight: bold;">:=</span> <span style="color: #000; font-weight: bold;">[]</span><span style="color: #204a87; font-weight: bold;">int</span><span style="color: #000; font-weight: bold;">{</span><span style="color: #0000cf; font-weight: bold;">1</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">2</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">3</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">4</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #0000cf; font-weight: bold;">5</span><span style="color: #000; font-weight: bold;">}</span>
<span style="color: #000;">rng</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">Shuffle</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #204a87;">len</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">),</span> <span style="color: #204a87; font-weight: bold;">func</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">j</span> <span style="color: #204a87; font-weight: bold;">int</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
    <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">j</span><span style="color: #000; font-weight: bold;">]</span> <span style="color: #000; font-weight: bold;">=</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">j</span><span style="color: #000; font-weight: bold;">],</span> <span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">[</span><span style="color: #000;">i</span><span style="color: #000; font-weight: bold;">]</span>
<span style="color: #000; font-weight: bold;">})</span>
</code></pre>
</div>
<p>How fast is it? The standard library provides two generators, PCG and ChaCha8. ChaCha8 should be slower than PCG, because it has better cryptographic guarantees. However, both have somewhat comparable speeds because ChaCha8 is heavily optimized with assembly code in the Go runtime while the PCG implementation is conservative and not focused on speed.<br />
On my Apple M4 processor with Go 1.25, I get the following results. I report the time per input element, not the total time.</p>
<table>
<thead>
<tr>
<th style="text-align: left;">Benchmark</th>
<th style="text-align: right;">Size</th>
<th style="text-align: right;">Batched (ns/item)</th>
<th style="text-align: right;">Standard (ns/item)</th>
<th style="text-align: right;">speedup</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">ChaChaShuffle</td>
<td style="text-align: right;">30</td>
<td style="text-align: right;">1.8</td>
<td style="text-align: right;">4.6</td>
<td style="text-align: right;">2.6</td>
</tr>
<tr>
<td style="text-align: left;">ChaChaShuffle</td>
<td style="text-align: right;">100</td>
<td style="text-align: right;">1.8</td>
<td style="text-align: right;">4.7</td>
<td style="text-align: right;">2.5</td>
</tr>
<tr>
<td style="text-align: left;">ChaChaShuffle</td>
<td style="text-align: right;">500000</td>
<td style="text-align: right;">2.6</td>
<td style="text-align: right;">5.1</td>
<td style="text-align: right;">1.9</td>
</tr>
<tr>
<td style="text-align: left;">PCGShuffle</td>
<td style="text-align: right;">30</td>
<td style="text-align: right;">1.5</td>
<td style="text-align: right;">3.9</td>
<td style="text-align: right;">2.6</td>
</tr>
<tr>
<td style="text-align: left;">PCGShuffle</td>
<td style="text-align: right;">100</td>
<td style="text-align: right;">1.5</td>
<td style="text-align: right;">4.2</td>
<td style="text-align: right;">2.8</td>
</tr>
<tr>
<td style="text-align: left;">PCGShuffle</td>
<td style="text-align: right;">500000</td>
<td style="text-align: right;">1.9</td>
<td style="text-align: right;">3.8</td>
<td style="text-align: right;">2.0</td>
</tr>
</tbody>
</table>
<p>Thus, from tiny to large arrays, the batched approach is two to three times faster. Not bad for a drop-in replacement!</p>
<p>Get the Go library at <a class="article-editor-link article-editor-link" href="https://github.com/lemire/batchedrand" rel="noopener noreferrer">https://github.com/lemire/batchedrand</a></p>
<p><a href="http://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-22370" src="http://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark.png" alt="" width="1000" height="600" srcset="https://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark.png 1000w, https://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark-300x180.png 300w, https://lemire.me/blog/wp-content/uploads/2025/12/pcgshuffle_benchmark-768x461.png 768w" sizes="auto, (max-width: 1000px) 100vw, 1000px" /></a></p>
<p><strong>Further reading:</strong></p>
<ul>
<li>Nevin Brackett-Rozinsky, Daniel Lemire, <a href="https://arxiv.org/pdf/2408.06213">Batched Ranged Random Integer Generation</a>, Software: Practice and Experience 55 (1), 2025</li>
<li>Daniel Lemire, Fast Random Integer Generation in an Interval, ACM Transactions on Modeling and Computer Simulation, Volume 29 Issue 1, February 2019</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/15/multiplying-the-shuffle-speed-in-go-with-batched-shuffling/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Why speed matters</title>
		<link>https://lemire.me/blog/2025/12/05/why-speed-matters/</link>
					<comments>https://lemire.me/blog/2025/12/05/why-speed-matters/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Fri, 05 Dec 2025 19:24:50 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22361</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-06-a-14.08.31-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />The one constant that I have observed in my professional life is that people underestimate the need to move fast. Of course, doing good work takes time. I once spent six months writing a URL parser. But the fact that it took so long is not a feature, it is not a positive, it is &#8230; <a href="https://lemire.me/blog/2025/12/05/why-speed-matters/" class="more-link">Continue reading <span class="screen-reader-text">Why speed matters</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-06-a-14.08.31-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>The one constant that I have observed in my professional life is that people underestimate the need to move fast.</p>
<p>Of course, doing good work takes time. I once spent six months writing a URL parser. But the fact that it took so long is not a feature, it is not a positive, it is a negative.</p>
<p>If everything is slow-moving around you, it is likely not going to be good. To fully make use of your brain, you need to move as close as possible to the speed of your thought.</p>
<p>If I give you two PhD students, one who completed their thesis in two years and one who took eight years… you can be almost certain that the two-year thesis will be much better.</p>
<p>Moving fast does not mean that you complete your projects quickly. Projects have many parts, and getting everything right may take a long time.</p>
<p>Nevertheless, you should move as fast as you can.</p>
<p>For multiple reasons:</p>
<p>1. A common mistake is to spend a lot of time—too much time—on a component of your project that does not matter. I once spent a lot of time building a podcast-like version of a course… only to find out later that students had no interest in the podcast format.</p>
<p>2. You learn by making mistakes. The faster you make mistakes, the faster you learn.</p>
<p>3. Your work degrades, becomes less relevant with time. And if you work slowly, you will be more likely to stick with your slightly obsolete work. You know that professor who spent seven years preparing lecture notes twenty years ago? He is not going to throw them away and start again, as that would be a new seven-year project. So he will keep teaching using aging lecture notes until he retires and someone finally updates the course.</p>
<p>What if you are doing open-heart surgery? Don’t you want someone who spends days preparing and who works slowly? No. You almost surely want the surgeon who does many, many open-heart surgeries. They are very likely to be the best one.</p>
<p>Now stop being so slow. Move!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/05/why-speed-matters/feed/</wfw:commentRss>
			<slash:comments>12</slash:comments>
		
		
			</item>
		<item>
		<title>We see something that works, and then we understand it</title>
		<link>https://lemire.me/blog/2025/12/04/we-see-something-that-works-and-then-we-understand-it/</link>
					<comments>https://lemire.me/blog/2025/12/04/we-see-something-that-works-and-then-we-understand-it/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 04 Dec 2025 15:40:59 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22357</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-04-a-10.40.38-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />“We see something that works, and then we understand it.” (Thomas Dullien) It is a deeper insight than it seems. Young people spend years in school learning the reverse: understanding happens before progress. That is the linear theory of innovation. So Isaac Newton comes up with his three laws of mechanics, and we get a &#8230; <a href="https://lemire.me/blog/2025/12/04/we-see-something-that-works-and-then-we-understand-it/" class="more-link">Continue reading <span class="screen-reader-text">We see something that works, and then we understand it</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-04-a-10.40.38-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>“We see something that works, and then we understand it.” (Thomas Dullien)</p>
<p>It is a deeper insight than it seems.</p>
<p>Young people spend years in school learning the reverse: understanding happens before progress. That is the linear theory of innovation.</p>
<p>So Isaac Newton comes up with his three laws of mechanics, and we get a clockmaking boom. Of course, that’s not what happened: we get the pendulum clock in 1656, then Hooke (1660) and Newton (1665–1666) get to think about forces, speed, motion, and latent energy.</p>
<p>The linear model of innovation makes as much sense as the waterfall model in software engineering. In the waterfall model, you are taught that you first need to design every detail of your software application (e.g., using a language like UML) before you implement it. To this day, half of the information technology staff members at my school are “analysts” whose main job is supposedly to create such designs based on requirements and supervise execution.</p>
<p>Both the linear theory and the waterfall model are forms of thinkism, a term I learned from Kevin Kelly. Thinkism sets aside practice and experience. It is the belief that given a problem, you should just think long and hard about it, and if you spend enough time thinking, you will solve it.</p>
<p>Thinkism works well in school. The teacher gives you all the concepts, then gives you a problem that, by a wonderful coincidence, can be solved just by thinking with the tools the same teacher just gave you.</p>
<p>As a teacher, I can tell you that students get really angry if you put a question on an exam that requires a concept not explicitly covered in class. Of course, if you work as an engineer and you’re stuck on a problem and you tell your boss it cannot be solved with the ideas you learned in college… you’re going to look like a fool.</p>
<p>If you’re still in school, here’s a fact: you will learn as much or more every year of your professional life than you learned during an entire university degree—assuming you have a real engineering job.</p>
<p>Thinkism also works well in other limited domains beyond school. It works well in bureaucratic settings where all the rules are known and you’re expected to apply them without question. There are many jobs where you first learn and then apply. And if you ever encounter new conditions where your training doesn’t directly apply, you’re supposed to report back to your superiors, who will then tell you what to do.</p>
<p>But if you work in research and development, you always begin with incomplete understanding. And most of the time, even if you could read everything ever written about your problem, you still wouldn’t understand enough to solve it. The way you make discoveries is often to either try something that seems sensible, or to observe something that happens to work—maybe your colleague has a practical technique that just works—and then you start thinking about it, formalizing it, putting it into words… and it becomes a discovery.</p>
<p>And the reason it often works this way is that “nobody knows anything.” The world is so complex that even the smartest individual knows only a fraction of what there is to know, and much of what they think they know is slightly wrong—and they don’t know which part is wrong.</p>
<p>So why should you care about how progress happens? You should care because…<br />
1. It gives you a recipe for breakthroughs: spend more time observing and trying new things… and less time thinking abstractly.<br />
2. It tells you to stop expecting an AI to cure all diseases or solve all problems just because it can read all the scholarship and “think” for a very long time. No matter how much an AI “knows,” it is always too little.</p>
<p><strong>Further reading</strong>: Godin, Benoît (2017). Models of innovation: The history of an idea. MIT Press.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/04/we-see-something-that-works-and-then-we-understand-it/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Why don&#8217;t we get more scientific breakthroughs?</title>
		<link>https://lemire.me/blog/2025/12/03/why-dont-we-get-more-scientific-breakthroughs/</link>
					<comments>https://lemire.me/blog/2025/12/03/why-dont-we-get-more-scientific-breakthroughs/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Wed, 03 Dec 2025 20:41:26 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22354</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-03-a-15.40.56-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />It is absolutely clear to me that large language models represent the most significant scientific breakthrough of the past fifty years. The nature of that breakthrough has far-reaching implications for what is happening in science today. And I believe that the entire scientific establishment is refusing to acknowledge it. We often excuse our slow &#8230; <a href="https://lemire.me/blog/2025/12/03/why-dont-we-get-more-scientific-breakthroughs/" class="more-link">Continue reading <span class="screen-reader-text">Why don&#8217;t we get more scientific breakthroughs?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/12/Capture-decran-le-2025-12-03-a-15.40.56-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>It is absolutely clear to me that large language models represent the most significant scientific breakthrough of the past fifty years. The nature of that breakthrough has far-reaching implications for what is happening in science today. And I believe that the entire scientific establishment is refusing to acknowledge it.</p>
<p>We often excuse our slow progress with tired clichés like “all the low-hanging fruit has been picked.” It is an awfully convenient excuse if you run a scientific institution that pretends to lead the world in research&#8212;but in reality is mired in bureaucracy, stagnation and tradition.</p>
<p>A quick look at the world around us tells a different story: progress is possible and even moderately easy, even through the lens of everyday experience. I have been programming in Python for twenty years and even wrote a book about it. Managing dependencies has always been a painful, frustrating process, seemingly unsolvable. The best anyone could manage was to set up a virtual environment. Yes, it was clumsy and awkward, as you know if you have programmed in Python, but that was the state of the art after decades of effort by millions of Python developers. Then, in 2024, a single tool called uv appeared and suddenly made the Python ecosystem feel sane, bringing it in line with the elegance of Go or JavaScript runtimes. In retrospect, the solution seems almost obvious.</p>
<p>NASA has twice the budget of SpaceX. Yet SpaceX has launched more missions to orbit in the past decade than NASA managed in the previous fifty years. The difference is not money; it is culture, agility, and a willingness to embrace new ideas.</p>
<p>Large language models have answered many profound scientific questions, yet one of the deepest concerns the very nature of language itself. For generations, the prevailing view was that human language depends on a vast set of logical rules that the brain applies unconsciously. That rule-based paradigm dominated much of twentieth-century linguistics and even shaped the early web. We spent an entire decade chasing the dream of the Semantic Web, convinced that if we all shared formal, machine-readable metadata, rule engines would deliver web-scale intelligence. Thanks to large language models, we now know that language does not need to be rule-based at all. Verbal intelligence does not need to rely on explicit rules.<br />
It is a tremendous scientific insight that overturns decades of established thinking.</p>
<p>A common objection is that I am conflating engineering with science. Large language models are just engineering. I invite you to examine the history of science more closely. Scientific progress has always depended on the tools we build.</p>
<p>You need a seaworthy boat before you can sail to distant islands, observe wildlife, and formulate the theory of natural selection. Measuring the Earth’s radius with the precision achieved by the ancient Greeks required both sophisticated engineering and non-trivial mathematics. Einstein’s insights into relativity emerged in an era when people routinely experienced relative motion on trains; the phenomenon was staring everyone in the face.</p>
<p>The tidy, linear model of scientific progress—professors thinking deep thoughts in ivory towers, then handing blueprints to engineers—is indefensible. Fast ships and fast trains are not just consequences of scientific discovery; they are also wellsprings of it. Real progress is messy, iterative, and deeply intertwined with the tools we build. Large language models are the latest, most dramatic example of that truth.</p>
<p>So what does it tell us about science? I believe it is telling us that we need to rethink our entire approach to scientific research. We need to embrace agility, experimentation, and a willingness to challenge established paradigms. The bureaucratization of science was a death sentence for progress.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/12/03/why-dont-we-get-more-scientific-breakthroughs/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>How fast can browsers process base64 data?</title>
		<link>https://lemire.me/blog/2025/11/29/how-fast-can-browsers-process-base64-data/</link>
					<comments>https://lemire.me/blog/2025/11/29/how-fast-can-browsers-process-base64-data/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 29 Nov 2025 05:00:03 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22340</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-28-a-23.50.58-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Base64 is a binary-to-text encoding scheme that converts arbitrary binary data (like images, files, or any sequence of bytes) into a safe, printable ASCII string using a 64-character alphabet (A–Z, a–z, 0–9, +, /). Browsers use it in JavaScript to embed binary data directly in code or HTML or to transmit binary data as text. &#8230; <a href="https://lemire.me/blog/2025/11/29/how-fast-can-browsers-process-base64-data/" class="more-link">Continue reading <span class="screen-reader-text">How fast can browsers process base64 data?</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-28-a-23.50.58-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Base64 is a binary-to-text encoding scheme that converts arbitrary binary data (like images, files, or any sequence of bytes) into a safe, printable ASCII string using a 64-character alphabet (A–Z, a–z, 0–9, +, /). Browsers use it in JavaScript to embed binary data directly in code or HTML or to transmit binary data as text.</p>
<p>Browsers recently added convenient and safe functions to process Base64 data: <code>Uint8Array.prototype.toBase64()</code> and <code>Uint8Array.fromBase64()</code>. Though there are several parameters, it comes down to an encoding function and a decoding function. The encoder is a method called on the array itself, whereas the decoder is a static function.</p>
<pre class="code_syntax" style="color: #000000; background: #ffffff;"><span class="line_wrapper">const b64 <span style="color: #808030;">=</span> <span style="color: #400000;">bytes</span><span style="color: #808030;">.</span>toBase64<span style="color: #808030;">(</span><span style="color: #808030;">)</span><span style="color: #808030;">;</span>                 <span style="color: #44aadd;">//</span> string          </span>
<span class="line_wrapper">const recovered <span style="color: #808030;">=</span> Uint8Array<span style="color: #808030;">.</span>fromBase64<span style="color: #808030;">(</span>b64<span style="color: #808030;">)</span><span style="color: #808030;">;</span> <span style="color: #44aadd;">//</span> Uint8Array </span></pre>
<p>When encoding, Base64 takes 24 bits (three bytes) from the input at a time. These 24 bits are divided into four 6-bit segments, and each 6-bit value (ranging from 0 to 63) is mapped to a specific character from the Base64 alphabet: the first 26 characters are the uppercase letters A-Z, the next 26 are the lowercase letters a-z, then come the digits 0-9, followed by + and / for the values 62 and 63. The equals sign = is used as padding when the input length is not a multiple of 3 bytes.</p>
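<p>The mapping can be sketched in a few lines of JavaScript. This is only an illustration of the scheme described above, not how browsers implement it: the names <code>ALPHABET</code> and <code>encodeBase64</code> are made up for the example, and the loop makes no attempt at being fast.</p>

```javascript
// Illustrative sketch only: the native browser functions are far faster.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a Uint8Array (or an array of byte values) into a padded Base64 string.
function encodeBase64(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i);
    // Pack up to three bytes into one 24-bit group; missing bytes count as zero.
    const group =
      (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    // Split the group into four 6-bit values (0 to 63) and look each one up.
    out += ALPHABET[(group >> 18) & 63];
    out += ALPHABET[(group >> 12) & 63];
    // Positions that hold no input bits become '=' padding.
    out += remaining > 1 ? ALPHABET[(group >> 6) & 63] : "=";
    out += remaining > 2 ? ALPHABET[group & 63] : "=";
  }
  return out;
}

console.log(encodeBase64(new TextEncoder().encode("Man")));
```

<p>Three input bytes always become four output characters, which is why the encoded text is about a third larger than the binary data.</p>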
<p>How fast can they be?</p>
<p>Suppose that you consumed 3 bytes and produced 4 bytes per CPU cycle. At 4.5 GHz, you would encode to base64 at 13.5 GB/s. We expect lower performance going in the other direction. When encoding, any input is valid: any binary data will do. However, when decoding, we must handle errors and skip spaces.</p>
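<p>The back-of-the-envelope estimate is simply bytes per cycle multiplied by cycles per second. A minimal sketch, where <code>throughputGBps</code> is a hypothetical helper and the 4.5 GHz clock is the assumed figure from the text:</p>

```javascript
// Throughput estimate: bytes handled per cycle times the clock frequency.
function throughputGBps(bytesPerCycle, clockGHz) {
  // 1 GHz is 1e9 cycles per second and 1 GB is 1e9 bytes, so the factors cancel.
  return bytesPerCycle * clockGHz;
}

const inputRate = throughputGBps(3, 4.5);  // binary data consumed
const outputRate = throughputGBps(4, 4.5); // base64 text produced
console.log(inputRate, outputRate);
```

<p>Measured with respect to the binary data, that is 13.5 GB/s; measured with respect to the base64 text, the same hypothetical encoder would produce 18 GB/s.</p>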
<p><a href="https://simdutf.github.io/browserbase64/">I wrote an in-browser benchmark</a>. You can try it out in your favorite browser.</p>
<p>I decided to try it out on my Apple M4 processor, to see how fast the various browsers are. I use blocks of 64 KiB. The speed is measured with respect to the binary data.</p>
<table>
<tbody>
<tr>
<th>browser</th>
<th>encoding speed</th>
<th>decoding speed</th>
</tr>
<tr>
<td>Safari</td>
<td>17 GB/s</td>
<td>9.4 GB/s</td>
</tr>
<tr>
<td>SigmaOS</td>
<td>17 GB/s</td>
<td>9.4 GB/s</td>
</tr>
<tr>
<td>Chrome</td>
<td>19 GB/s</td>
<td>4.6 GB/s</td>
</tr>
<tr>
<td>Edge</td>
<td>19 GB/s</td>
<td>4.6 GB/s</td>
</tr>
<tr>
<td>Brave</td>
<td>19 GB/s</td>
<td>4.6 GB/s</td>
</tr>
<tr>
<td>Servo</td>
<td>0.34 GB/s</td>
<td>0.40 GB/s</td>
</tr>
<tr>
<td>Firefox (old)</td>
<td>0.34 GB/s</td>
<td>0.40 GB/s</td>
</tr>
<tr>
<td>Firefox (version 146)</td>
<td>2.2 GB/s</td>
<td>2.6 GB/s</td>
</tr>
</tbody>
</table>
<p><a href="http://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-scaled.png"><img loading="lazy" decoding="async" class="alignnone size-large wp-image-22349" src="http://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-1024x424.png" alt="" width="660" height="273" srcset="https://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-1024x424.png 1024w, https://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-300x124.png 300w, https://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-768x318.png 768w, https://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-1536x636.png 1536w, https://lemire.me/blog/wp-content/uploads/2025/11/browser_encoding_decoding_speed-2048x848.png 2048w" sizes="auto, (max-width: 660px) 100vw, 660px" /></a></p>
<p>Safari seems to have slightly slower encoding speed than the Chromium browsers (Chrome, Edge, Brave); however, it is about twice as fast at decoding. Servo and Firefox have similarly poor performance, with the unexpected result of having faster decoding speed than encoding speed. (Newer versions of Firefox, released after this post, have better performance.) I could have tried other browsers, but most seem to be derivatives of Chromium or WebKit.</p>
<p>For context, the disk of a good laptop can sustain over 3 GB/s of read or write speed. Some high-end laptops have disks that are faster than 5 GB/s. In theory, your Wi-Fi connection might get close to 5 GB/s with Wi-Fi 7. Some Internet providers might get close to providing similar network speeds, although your Internet connection is likely several times slower.</p>
<p>The speeds on most browsers are faster than you might naively guess. They are faster than networks or disks.</p>
<p><strong>Note.</strong> The slower decoding speed on Chromium-based browsers appears to come from the V8 JavaScript engine, which first decodes the string to a temporary buffer before copying from the temporary buffer to the final destination. (See BUILTIN(Uint8ArrayFromBase64) in v8/src/builtins/builtins-typed-array.cc.)</p>
<p><strong>Note.</strong> Denis Palmeiro from Mozilla let me know that upcoming changes in Firefox will improve the performance of the base64 functions. I have since updated the numbers above.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/11/29/how-fast-can-browsers-process-base64-data/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
			</item>
		<item>
		<title>Antifragile Programming and Why AI Won’t Steal Your Job</title>
		<link>https://lemire.me/blog/2025/11/29/antifragile-programming-and-why-ai-wont-steal-your-job/</link>
					<comments>https://lemire.me/blog/2025/11/29/antifragile-programming-and-why-ai-wont-steal-your-job/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sat, 29 Nov 2025 02:39:56 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22337</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/G6dHib4XsAEyBYe-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Whenever I say I dislike debugging and organize my programming habits around avoiding it, there is always pushback: “You must not use a good debugger.” To summarize my view: I want my software to be antifragile (credit to Nassim Taleb for the concept). The longer I work on a codebase, the easier it should become &#8230; <a href="https://lemire.me/blog/2025/11/29/antifragile-programming-and-why-ai-wont-steal-your-job/" class="more-link">Continue reading <span class="screen-reader-text">Antifragile Programming and Why AI Won’t Steal Your Job</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/G6dHib4XsAEyBYe-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Whenever I say I dislike debugging and organize my programming habits around avoiding it, there is always pushback: “You must not use a good debugger.”</p>
<p>To summarize my view: I want my software to be antifragile (credit to Nassim Taleb for the concept). The longer I work on a codebase, the easier it should become to fix bugs.</p>
<p>Each addition to a piece of code can be viewed as a stress. If nothing is done, the code gets slightly worse, harder to maintain, more prone to bugs. Thankfully, you can avoid such an outcome.</p>
<p>That’s not natural. For most developers lacking deep expertise, as the codebase grows, bugs become harder to fix: you chase symptoms through layers of code, hunt heisenbugs that vanish in the debugger, or fix one bug only to create another. The more code you have, the worse it gets. Such code is fragile. Adding a new feature risks breaking old, seemingly unrelated parts.</p>
<p>In my view, the inability to produce antifragile code explains the extreme power-law distribution in programming: most of the code we rely on daily was written by a tiny fraction of all programmers who have mastered antifragility.</p>
<p>How do you reverse this? How do you ensure that the longer you work on the code, the shallower the bugs become?<br />
There are well-known techniques, and adding lots of tests and checks definitely helps. You can write antifragile code without tests or debug-time checks… but you’ll need something functionally equivalent.</p>
<p>Far-reaching prescriptions (“you must use language X, tool Y, method Z”) are usually cargo-cult nonsense. Copying Linus Torvalds’ tools or swearing style won’t guarantee success. But antifragility is not a prescription; it is a desired outcome.</p>
<p>Defensive programming itself is uncontroversial—yet it wasn’t common in the 1980s and still isn’t the default for many today.<br />
Of course, a full defensive approach isn’t always applicable or worth the cost.</p>
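<p>As a small illustration of what a defensive check can look like, consider the sketch below. The function <code>mean</code> is a hypothetical helper invented for this example, not code from any particular project. It validates its input at the boundary, so that a bad call fails immediately with a clear message instead of producing a <code>NaN</code> that surfaces several layers away.</p>

```javascript
// Hypothetical helper, for illustration only.
function mean(values) {
  // Reject bad input where it enters, so the failure points at the real culprit.
  if (!Array.isArray(values) || values.length === 0) {
    throw new TypeError("mean() expects a non-empty array");
  }
  let sum = 0;
  for (const v of values) {
    // Number.isFinite rejects NaN, infinities and anything that is not a number.
    if (!Number.isFinite(v)) {
      throw new RangeError("mean() got a non-finite value: " + String(v));
    }
    sum += v;
  }
  return sum / values.length;
}

console.log(mean([1, 2, 3]));
```

<p>The checks cost a few lines, but every bug they catch is shallow: the error names the function and the offending value.</p>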
<p>For example, if I’m vibe-coding a quick web app with more JavaScript than I care to read, I’ll just run it in the browser’s debugger. It works fine. I’m not using that code to control a pacemaker, and I’m not expecting to be woken up at midnight on Christmas to fix it.</p>
<p>If your program is 500 lines and you’ll run it 20 times a year, antifragility isn’t worth pursuing.</p>
<p>Large language models can generate defensive code, but if you’ve never written defensively yourself and you learn to program primarily with AI assistance, your software will probably remain fragile.</p>
<p>You can add code quickly, but the more you add, the bigger your problems become.</p>
<p>That’s the crux of the matter. Writing code was never the hard part—I could write code at 12, and countless 12-year-olds today can write simple games and apps. In the same way, a 12-year-old can build a doghouse with a hammer, nails, and wood. Getting 80% of the way has always been easy.</p>
<p>Scaling complexity without everything collapsing—that’s the hard part.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/11/29/antifragile-programming-and-why-ai-wont-steal-your-job/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>How good are Chinese CPUs? Benchmarking the Loongson 3A6000</title>
		<link>https://lemire.me/blog/2025/11/23/how-good-are-chinese-cpus-benchmarking-the-loongson-3a6000/</link>
					<comments>https://lemire.me/blog/2025/11/23/how-good-are-chinese-cpus-benchmarking-the-loongson-3a6000/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 23 Nov 2025 23:09:59 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22328</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-23-a-18.06.52-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />I maintain a few widely used libraries that have optimized code paths based on the specific processor being used. We started supporting Loongson processors in recent years, but I did not have access to a Loongson processor until now. To my knowledge, they are not widely distributed in North America. This made it difficult for &#8230; <a href="https://lemire.me/blog/2025/11/23/how-good-are-chinese-cpus-benchmarking-the-loongson-3a6000/" class="more-link">Continue reading <span class="screen-reader-text">How good are Chinese CPUs? Benchmarking the Loongson 3A6000</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-23-a-18.06.52-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>I maintain a few widely used libraries that have optimized code paths based on the specific processor being used. We started supporting Loongson processors in recent years, but I did not have access to a Loongson processor until now. To my knowledge, they are not widely distributed in North America. This made it difficult for me to do any performance tuning. Thankfully, kind people from the Loongson Hobbyists&#8217; Community helped me acquire a small computer with a Loongson processor.</p>
<p>My understanding is that Loongson processors serve to reduce the dependence of China on architectures like x64 and ARM. They use their own proprietary architecture called LoongArch. These processors have two generations of SIMD (single instruction, multiple data) vector extensions designed for parallel processing: LSX and LASX. LSX (Loongson SIMD Extension) provides 128-bit wide vector registers and instructions roughly comparable to ARM NEON or early x64 SSE extensions. LASX (Loongson Advanced SIMD Extension), first appearing in the Loongson 3A5000 (2021), is the 256-bit successor that is somewhat comparable to the AVX/AVX2 extensions present in most x64 (Intel and AMD) processors.</p>
<p>The LoongArch architecture is not yet universally supported. You can run most of Linux (Debian), but Visual Studio Code cannot ssh into a LoongArch system, although there is community support in VSCodium. However, recent versions of the GCC and LLVM compilers support LoongArch.</p>
<p>My Loongson-3A6000 processor supports both LASX and LSX. However, I do not know how to do runtime dispatching under LoongArch: checking whether LASX is supported as the program is running and switching on LASX support dynamically. I can force the compiler to use LASX (by compiling with <code>-march=native</code>) but my early experiments show that LASX routines are no faster than LSX routines&#8230; possibly a sign of poor optimization on our part.</p>
<p>I decided to run some tests to see how this Chinese processor compares with a relatively recent Intel processor (Ice Lake). The comparison is not meant to be fair. The Ice Lake processor is somewhat older but it is an expensive server-class processor. Further, the code that I am using is likely to have been tuned for x64 processors much more than for Loongson processors. I am also not trying to be exhaustive: I just want a broad idea.</p>
<p>Let us first consider number parsing. My test is reproducible.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">git</span> <span style="color: #000;">clone</span> <span style="color: #000;">https</span><span style="color: #000; font-weight: bold;">:</span><span style="color: #8f5902; font-style: italic;">//github.com/lemire/simple_fastfloat_benchmark.git</span>
<span style="color: #000;">cd</span> <span style="color: #000;">simple_fastfloat_benchmark</span>
<span style="color: #000;">cmake</span> <span style="color: #ce5c00; font-weight: bold;">-</span><span style="color: #000;">B</span> <span style="color: #000;">build</span> 
<span style="color: #000;">cmake</span> <span style="color: #ce5c00; font-weight: bold;">--</span><span style="color: #000;">build</span> <span style="color: #000;">build</span>
<span style="color: #000; font-weight: bold;">.</span><span style="color: #ce5c00; font-weight: bold;">/</span><span style="color: #000;">build</span><span style="color: #ce5c00; font-weight: bold;">/</span><span style="color: #000;">benchmarks</span><span style="color: #ce5c00; font-weight: bold;">/</span><span style="color: #000;">benchmark</span> <span style="color: #a40000; border: 1px solid #EF2929;">#</span> <span style="color: #000;">use</span> <span style="color: #000;">sudo</span> <span style="color: #204a87; font-weight: bold;">for</span> <span style="color: #000;">perf</span> <span style="color: #000;">counters</span>
</code></pre>
</div>
<p>This will parse random numbers. I focus on the fast_float results. I use GCC 15 in both instances.</p>
<table>
<thead>
<tr>
<th>processor</th>
<th>instructions/float</th>
<th>ins/cycle</th>
<th>GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Loongson-3A6000</td>
<td>377</td>
<td>4.92</td>
<td>2.50</td>
</tr>
<tr>
<td>Xeon Gold 6338</td>
<td>295</td>
<td>5.07</td>
<td>3.19</td>
</tr>
</tbody>
</table>
<p>So the Loongson-3A6000 retires about as many instructions per cycle as the Intel processor. However, it requires more instructions and its clock frequency is lower. So the Intel processor wins this round.</p>
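<p>To make the comparison concrete, the three columns can be combined into an estimated time per parsed number: instructions per float divided by the product of instructions per cycle and frequency. The sketch below does this arithmetic with the values from the table; the helper name <code>ns_per_float</code> is made up for illustration, and the results are estimates derived from the rounded table entries, not separate measurements.</p>

```cpp
// Estimated nanoseconds per parsed float:
// instructions / (instructions per cycle * cycles per nanosecond).
// A frequency expressed in GHz is the same as cycles per nanosecond.
constexpr double ns_per_float(double ins_per_float, double ins_per_cycle,
                              double ghz) {
  return ins_per_float / (ins_per_cycle * ghz);
}

// Values taken from the fast_float table.
constexpr double loongson_ns = ns_per_float(377, 4.92, 2.50); // about 30.7 ns
constexpr double xeon_ns = ns_per_float(295, 5.07, 3.19);     // about 18.2 ns
```

<p>By this estimate, the Intel processor parses a number in a bit under 60% of the time needed by the Loongson processor.</p>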
<p>What if we replace the fast_float function with abseil&#8217;s number parser (from Google)? I find that both processors are entirely comparable, except for the clock frequency.</p>
<table>
<thead>
<tr>
<th>processor</th>
<th>instructions/float</th>
<th>ins/cycle</th>
<th>GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Loongson-3A6000</td>
<td>562</td>
<td>4.42</td>
<td>2.50</td>
</tr>
<tr>
<td>Xeon Gold 6338</td>
<td>571</td>
<td>5.08</td>
<td>3.19</td>
</tr>
</tbody>
</table>
<p>Intel still wins due to the higher frequency, but by a narrower margin.</p>
<p>I wanted to test the Loongson processor on SIMD intensive tasks. So I used the simdutf library to do some string transcoding.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>git clone https://github.com/simdutf/simdutf.git
cd simdutf
cmake -B build -D SIMDUTF_BENCHMARKS=ON
cmake --build build --target benchmark
./build/benchmarks/benchmark -P utf8_to_utf16le -F README.md 
<span style="color: #000080; font-weight: bold;">#</span> use sudo for perf counters
</code></pre>
</div>
<p>My results are as follows, depending on which instructions are used. The Intel processor has three options (128-bit with SSSE3, 256-bit with AVX2 and 512-bit with AVX-512) while the Loongson processor has two options (128-bit with LSX and 256-bit with LASX).</p>
<table>
<thead>
<tr>
<th>processor</th>
<th>ins/byte</th>
<th>ins/cycle</th>
<th>GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Loongson-3A6000 (LSX)</td>
<td>0.562</td>
<td>2.633</td>
<td>2.50</td>
</tr>
<tr>
<td>Loongson-3A6000 (LASX)</td>
<td>0.390</td>
<td>1.549</td>
<td>2.50</td>
</tr>
<tr>
<td>Xeon Gold 6338 (SSSE3)</td>
<td>0.617</td>
<td>5.07</td>
<td>3.236</td>
</tr>
<tr>
<td>Xeon Gold 6338 (AVX2)</td>
<td>0.364</td>
<td>2.625</td>
<td>3.19</td>
</tr>
<tr>
<td>Xeon Gold 6338 (AVX-512)</td>
<td>0.271</td>
<td>1.657</td>
<td>3.127</td>
</tr>
</tbody>
</table>
<p>Roughly speaking, the Loongson transcodes a simple ASCII file (the README.md file) at 10 GB/s whereas the Intel processor does it slightly faster than 20 GB/s.</p>
<p>Overall, I find these results quite good for the Loongson processor.</p>
<p><a href="https://chipsandcheese.com/p/loongson-3a6000-a-star-among-chinese-cpus">The folks at Chips and Cheese have a more extensive review</a>. They put the Chinese processor somewhere between the first AMD Zen processors and the AMD Zen 2 processors on a per core basis. The AMD Zen 2 processors power current gaming consoles such as the PlayStation 5. Chips and Cheese concluded &#8220;Engineers at Loongson have a lot to be proud of&#8221;: I agree.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/11/23/how-good-are-chinese-cpus-benchmarking-the-loongson-3a6000/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
			</item>
		<item>
		<title>AMD vs. Intel: a Unicode benchmark</title>
		<link>https://lemire.me/blog/2025/11/16/amd-vs-intel-a-unicode-benchmark/</link>
					<comments>https://lemire.me/blog/2025/11/16/amd-vs-intel-a-unicode-benchmark/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 16 Nov 2025 01:04:16 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22309</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-15-a-20.08.20-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />Roughly speaking, our processors come in two types, the ARM processors found in your phone and the x64 processors made by Intel and AMD. The best server processors used to be made by Intel. Increasingly, Intel is struggling to keep up. Recently, Amazon has made available the latest AMD microarchitecture (Zen 5). Specifically, if you &#8230; <a href="https://lemire.me/blog/2025/11/16/amd-vs-intel-a-unicode-benchmark/" class="more-link">Continue reading <span class="screen-reader-text">AMD vs. Intel: a Unicode benchmark</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-15-a-20.08.20-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>Roughly speaking, our processors come in two types, the ARM processors found in your phone and the x64 processors made by Intel and AMD. The best server processors used to be made by Intel. Increasingly, Intel is struggling to keep up.</p>
<p>Recently, Amazon has made available the latest AMD microarchitecture (Zen 5). Specifically, if you start an <code>r8a</code> instance, you get an AMD EPYC 9R45 processor. The Intel counterpart (<code>r8i</code>) has an Intel Xeon 6975P-C processor. This Intel processor is from the Granite Rapids family (2024).</p>
<p>Michael Larabel at Phoronix has a couple of articles on the new AMD processors. One of them is entitled <a href="https://www.phoronix.com/review/ec2-m8a-amd-epyc-turin">AMD EPYC 9005 Brings Incredible Performance</a>. The article is well worth reading. He finds that, compared with the prior AMD processor (with a Zen 4 microarchitecture), the AMD EPYC 9R45 is 1.6 times faster. In a <a href="https://www.phoronix.com/review/aws-m8a-m8g-m8i-benchmarks">second article</a>, Michael compares the AMD processor with the corresponding Intel processor. He finds that the AMD processor is 1.6 times faster than the Intel processor.</p>
<p>I decided to take them out for a spin. I happened to be working on a new release of the <a href="https://github.com/simdutf/simdutf">simdutf library</a>. The simdutf library allows fast transcoding between UTF-8, UTF-16, and UTF-32 encodings, among other features. It is used by major browsers and JavaScript runtimes like Node.js or Bun. A common operation that matters is the conversion from UTF-16 to UTF-8. Internally, JavaScript relies on UTF-16, thus most characters use 2 bytes, whereas the Internet defaults to UTF-8 where characters can use between 1 and 4 bytes.</p>
<p>UTF-16 is a variable-length Unicode encoding that represents most common characters using a single 16-bit code unit (values from <code>0x0000</code> to <code>0xd7ff</code> and <code>0xe000</code> to <code>0xffff</code>), but extends to the full Unicode range beyond <code>U+FFFF</code> by using surrogate pairs: a high surrogate (<code>0xd800</code> to <code>0xdbff</code>) followed by a low surrogate (<code>0xdc00</code> to <code>0xdfff</code>), which together encode a single supplementary character which maps to four UTF-8 bytes. Thus we may consider that each element of a surrogate pair counts for two bytes in UTF-8. A non-surrogate code unit in the range <code>0x0000</code> to <code>0x007f</code> (ASCII) becomes one byte, <code>0x0080</code> to <code>0x07ff</code> becomes two bytes, and <code>0x0800</code> to <code>0xffff</code> (excluding surrogates) becomes three bytes.</p>
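<p>These rules can be written down as a plain scalar loop. The sketch below is only a reference for the counting rule, not the library's implementation; the function name <code>utf8_length_from_utf16_scalar</code> is hypothetical, and the code assumes that the input is valid UTF-16 (surrogates appear in well-formed pairs).</p>

```cpp
#include <cstddef>

// Scalar reference: number of UTF-8 bytes needed to encode a UTF-16 input.
// Assumes valid input, so that each half of a surrogate pair can be
// counted as two bytes (four bytes for the pair).
inline std::size_t utf8_length_from_utf16_scalar(const char16_t* in,
                                                 std::size_t len) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; i++) {
    const char16_t c = in[i];
    if (c <= 0x007f) {
      count += 1; // ASCII
    } else if (c <= 0x07ff) {
      count += 2;
    } else if (c >= 0xd800 && c <= 0xdfff) {
      count += 2; // one half of a surrogate pair
    } else {
      count += 3;
    }
  }
  return count;
}
```

<p>For instance, the five code units <code>0x0041</code>, <code>0x00e9</code>, <code>0x20ac</code>, <code>0xd83d</code>, <code>0xde00</code> need 1 + 2 + 3 + 2 + 2 = 10 bytes in UTF-8.</p>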
<p>My benchmark code first determines how much output memory is required and then it does the transcoding.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #204a87; font-weight: bold;">size_t</span> <span style="color: #000;">utf8_length</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">simdutf</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">utf8_length_from_utf16</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">(),</span> <span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">size</span><span style="color: #000; font-weight: bold;">());</span>
<span style="color: #204a87; font-weight: bold;">if</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">buffer</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">size</span><span style="color: #000; font-weight: bold;">()</span> <span style="color: #ce5c00; font-weight: bold;">&lt;</span> <span style="color: #000;">utf8_length</span><span style="color: #000; font-weight: bold;">)</span> <span style="color: #000; font-weight: bold;">{</span>
  <span style="color: #000;">buffer</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">resize</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">utf8_length</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000; font-weight: bold;">}</span>
<span style="color: #000;">simdutf</span><span style="color: #ce5c00; font-weight: bold;">::</span><span style="color: #000;">convert_utf16_to_utf8</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">(),</span> <span style="color: #000;">str</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">size</span><span style="color: #000; font-weight: bold;">(),</span> <span style="color: #000;">buffer</span><span style="color: #000; font-weight: bold;">.</span><span style="color: #000;">data</span><span style="color: #000; font-weight: bold;">());</span>
</code></pre>
</div>
<p>The transcoding code on a recent processor is not trivial (<a href="https://arxiv.org/pdf/2212.05098">Clausecker and Lemire, 2023</a>). However, the computation of the UTF-8 length from the UTF-16 data is a bit simpler.</p>
<p>These Intel and AMD processors support AVX-512 instructions: they are instructions that can operate on up to 64-byte registers compared to the 64-bit registers we normally use. It is an instance of SIMD: single instruction on multiple data. With AVX-512, you can load and process 32 UTF-16 units at once. Our main routine looks as follows.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #000;">__m512i</span> <span style="color: #000;">input</span> <span style="color: #ce5c00; font-weight: bold;">=</span>
<span style="color: #000;">_mm512_loadu_si512</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">in</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000;">__mmask32</span> <span style="color: #000;">is_surrogate</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #000;">_mm512_cmpeq_epi16_mask</span><span style="color: #000; font-weight: bold;">(</span>
<span style="color: #000;">_mm512_and_si512</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">input</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">_mm512_set1_epi16</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0xf800</span><span style="color: #000; font-weight: bold;">)),</span>
<span style="color: #000;">_mm512_set1_epi16</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0xd800</span><span style="color: #000; font-weight: bold;">));</span>
<span style="color: #000;">__mmask32</span> <span style="color: #000;">c0</span> <span style="color: #ce5c00; font-weight: bold;">=</span>
<span style="color: #000;">_mm512_test_epi16_mask</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">input</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">_mm512_set1_epi16</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0xff80</span><span style="color: #000; font-weight: bold;">));</span>
<span style="color: #000;">__mmask32</span> <span style="color: #000;">c1</span> <span style="color: #ce5c00; font-weight: bold;">=</span>
<span style="color: #000;">_mm512_test_epi16_mask</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">input</span><span style="color: #000; font-weight: bold;">,</span> <span style="color: #000;">_mm512_set1_epi16</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0xf800</span><span style="color: #000; font-weight: bold;">));</span>
<span style="color: #000;">count</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #000;">count_ones32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">c0</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000;">count</span> <span style="color: #ce5c00; font-weight: bold;">+=</span> <span style="color: #000;">count_ones32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">c1</span><span style="color: #000; font-weight: bold;">);</span>
<span style="color: #000;">count</span> <span style="color: #ce5c00; font-weight: bold;">-=</span> <span style="color: #000;">count_ones32</span><span style="color: #000; font-weight: bold;">(</span><span style="color: #000;">is_surrogate</span><span style="color: #000; font-weight: bold;">);</span>
</code></pre>
</div>
<p>The code processes a 512-bit vector of UTF-16 code units loaded from memory using <code>_mm512_loadu_si512</code>. It then identifies surrogate code units by first applying a bitwise AND (<code>_mm512_and_si512</code>) to mask each code unit with <code>0xf800</code>, retaining only the top five bits, and comparing the result (<code>_mm512_cmpeq_epi16_mask</code>) against <code>0xd800</code>; this produces a 32-bit mask where bits are set for any code unit in the surrogate range (<code>0xd800</code> to <code>0xdfff</code>), indicating potential UTF-16 surrogate pairs that should not contribute extra length in UTF-8. Next, we check (<code>_mm512_test_epi16_mask</code>) each code unit against a mask of <code>0xff80</code> using a bitwise test, setting bits in <code>c0</code> for any code unit that is not ASCII. Similarly, another <code>_mm512_test_epi16_mask</code> call against <code>0xf800</code> sets bits in <code>c1</code> for code units at or above <code>0x0800</code>, which require 3 bytes in UTF-8 unless they are surrogates. Finally, the code accumulates into a counter the number of set bits in <code>c0</code> and <code>c1</code>, then subtracts the popcount of the surrogate mask, so that each surrogate counts for two bytes. Overall, we can process 32 UTF-16 units using about a dozen instructions. (Credit to Wojciech Muła for the insightful design and also to Yagiz Nizipli for helping me with related optimizations.)</p>
<p>A large Amazon instance with the AMD processor cost $0.13892/hour, while the one with the Intel processor cost $0.15976/hour. I initiated both instances with Amazon Linux. I then ran the following commands in the shell.</p>
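<p>The same arithmetic can be checked one code unit at a time with ordinary integer operations. The sketch below is a scalar analogue of the three masks, not the AVX-512 routine itself. It assumes a baseline of one byte per code unit, which the excerpt above does not show, and the function names are hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Branch-free scalar analogue of the mask arithmetic for one code unit.
// Assumption: every code unit contributes a baseline of one byte.
inline std::size_t utf8_bytes_for_unit(std::uint16_t u) {
  const std::size_t c0 = (u & 0xff80) != 0;                // not ASCII
  const std::size_t c1 = (u & 0xf800) != 0;                // 0x0800 or above
  const std::size_t is_surrogate = (u & 0xf800) == 0xd800; // 0xd800..0xdfff
  return 1 + c0 + c1 - is_surrogate;
}

// Sums the per-unit byte counts over a whole buffer.
inline std::size_t utf8_length_masks(const std::uint16_t* in,
                                     std::size_t len) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; i++) {
    count += utf8_bytes_for_unit(in[i]);
  }
  return count;
}
```

<p>An ASCII unit yields 1, a unit between <code>0x0080</code> and <code>0x07ff</code> yields 2, a non-surrogate unit at or above <code>0x0800</code> yields 3, and a surrogate yields 3 - 1 = 2, so that a pair adds up to four bytes.</p>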
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>sudo yum install cmake git gcc
sudo dnf install gcc14 gcc14-c++
git clone https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog.git
<span style="color: #204a87;">cd</span> Code-used-on-Daniel-Lemire-s-blog/2025/11/15
<span style="color: #000;">CXX</span><span style="color: #ce5c00; font-weight: bold;">=</span>gcc14-g++ cmake -B build
cmake --build build
./build/benchmark
</code></pre>
</div>
<p>I get the following results.</p>
<table>
<thead>
<tr>
<th>Processor</th>
<th>GB/s</th>
<th>GHz</th>
<th>Ins/Byte</th>
<th>Ins/Cycle</th>
</tr>
</thead>
<tbody>
<tr>
<td>AMD</td>
<td>11</td>
<td>4.5</td>
<td>1.7</td>
<td>4.0</td>
</tr>
<tr>
<td>Intel</td>
<td>6</td>
<td>3.9</td>
<td>1.7</td>
<td>2.6</td>
</tr>
</tbody>
</table>
<p>The benchmark results show that the AMD processor delivers nearly double the throughput of the Intel processor in UTF-16 to UTF-8 transcoding (10.53 GB/s versus 5.96 GB/s), aided in part by its higher operating frequency. Both systems require the same 1.71 instructions per byte, but AMD achieves markedly higher instructions per cycle (3.98 i/c versus 2.64 i/c), demonstrating superior execution efficiency within the AVX-512 pipeline. One of the reasons has to do with the number of execution units. The AMD processor has four units capable of doing compute on 512-bit registers while Intel is typically limited to only two such execution units.</p>
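<p>The columns are tied together by a simple relation: throughput in GB/s is roughly the frequency (GHz) times the instructions per cycle, divided by the instructions per byte. The sketch below redoes this arithmetic with the figures quoted above; the helper name <code>gb_per_s</code> is made up for illustration, and since the frequencies are rounded, the estimates only approximately match the measured 10.53 GB/s and 5.96 GB/s.</p>

```cpp
// Estimated throughput in GB/s (bytes per nanosecond):
// (cycles per ns) * (instructions per cycle) / (instructions per byte).
constexpr double gb_per_s(double ghz, double ins_per_cycle,
                          double ins_per_byte) {
  return ghz * ins_per_cycle / ins_per_byte;
}

// Figures quoted in the text and table.
constexpr double amd_gbs = gb_per_s(4.5, 3.98, 1.71);   // about 10.5 GB/s
constexpr double intel_gbs = gb_per_s(3.9, 2.64, 1.71); // about 6.0 GB/s
```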
<p>My benchmark is narrower than Larabel&#8217;s, and it helps show that AMD has a large advantage over Intel when using AVX-512 instructions. It is especially remarkable given that Intel invented AVX-512 and AMD was late in supporting it. One might say that AMD is beating Intel at its own game.</p>
<p><a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2025/11/15">My benchmarking code is available.</a></p>
<p><em>Further reading</em>: Robert Clausecker, Daniel Lemire, <a href="https://arxiv.org/pdf/2212.05098">Transcoding Unicode Characters with AVX-512 Instructions</a>, Software: Practice and Experience 53 (12), 2023.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/11/16/amd-vs-intel-a-unicode-benchmark/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Automated Equality Checks in C++ with Reflection (C++26)</title>
		<link>https://lemire.me/blog/2025/11/09/automated-equality-checks-in-c-with-reflection-c26/</link>
					<comments>https://lemire.me/blog/2025/11/09/automated-equality-checks-in-c-with-reflection-c26/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 09 Nov 2025 22:36:55 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22300</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-09-a-17.32.57-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />In C++, comparing two objects for equality is straightforward when they are simple types like integers or strings. But what about complex, nested structures? You may have to implement the comparison (operator==) manually for each class, which is error-prone and tedious. Consider a person class. class person { public: person(std::string n, int a) : name(n), &#8230; <a href="https://lemire.me/blog/2025/11/09/automated-equality-checks-in-c-with-reflection-c26/" class="more-link">Continue reading <span class="screen-reader-text">Automated Equality Checks in C++ with Reflection (C++26)</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/11/Capture-decran-le-2025-11-09-a-17.32.57-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>In C++, comparing two objects for equality is straightforward when they are simple types like integers or strings. But what about complex, nested structures? You may have to implement the comparison (<code>operator==</code>) manually for each class, which is error-prone and tedious.</p>
<p>Consider a <code>person</code> class.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #008000; font-weight: bold;">class</span> <span style="color: #00f; font-weight: bold;">person</span> {
<span style="color: #008000; font-weight: bold;">public</span><span style="color: #666;">:</span>
    person(std<span style="color: #666;">::</span>string n, <span style="color: #b00040;">int</span> a) <span style="color: #666;">:</span> name(n), age(a) {}
<span style="color: #008000; font-weight: bold;">private</span><span style="color: #666;">:</span>
    std<span style="color: #666;">::</span>string name;
    <span style="color: #b00040;">int</span> age;
    std<span style="color: #666;">::</span>vector<span style="color: #666;">&lt;</span>hobby<span style="color: #666;">&gt;</span> hobbies;
    std<span style="color: #666;">::</span>optional<span style="color: #666;">&lt;</span><span style="color: #b00040;">uint64_t</span><span style="color: #666;">&gt;</span> salary;
};
</code></pre>
</div>
<p>To compare two <code>person</code> objects, we need to check if their names, ages, hobbies, and salaries match. Hobbies are a vector of <code>hobby</code> objects, each with a name. Salaries are optional. Manually writing <code>operator==</code> for this would involve checking each field, and if <code>hobby</code> changes, you would have to update it.</p>
<p>Without reflection, you would have to write something ugly of the sort if <a href="https://en.cppreference.com/w/cpp/language/default_comparisons.html">you don&#8217;t want to rely on C++ default comparisons</a>.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #b00040;">bool</span> <span style="color: #008000; font-weight: bold;">operator</span><span style="color: #666;">==</span>(<span style="color: #008000; font-weight: bold;">const</span> person<span style="color: #666;">&amp;</span> a, <span style="color: #008000; font-weight: bold;">const</span> person<span style="color: #666;">&amp;</span> b) {
    <span style="color: #008000; font-weight: bold;">return</span> a.name <span style="color: #666;">==</span> b.name <span style="color: #666;">&amp;&amp;</span> a.age <span style="color: #666;">==</span> b.age <span style="color: #666;">&amp;&amp;</span> a.hobbies <span style="color: #666;">==</span> b.hobbies <span style="color: #666;">&amp;&amp;</span> a.salary <span style="color: #666;">==</span> b.salary;
}
</code></pre>
</div>
<p>But this assumes <code>hobby</code> has <code>operator==</code>, and <code>std::vector&lt;hobby&gt;</code> has it only if <code>hobby</code> does. If <code>hobby</code> lacks <code>==</code>, compilation fails. Moreover, for large classes, it becomes annoying.</p>
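<p>To illustrate the dependency on <code>hobby</code>, here is a self-contained sketch of the manual approach. It is a simplification, not the code from the full demonstration: <code>person</code> and <code>hobby</code> are plain structs with public members, the comparison uses the common <code>std::tie</code> idiom rather than a chain of <code>&amp;&amp;</code>, and it assumes C++17 for <code>std::optional</code>.</p>

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

struct hobby {
  std::string name;
};

// Without this overload, comparing two std::vector<hobby> does not compile.
inline bool operator==(const hobby& a, const hobby& b) {
  return a.name == b.name;
}

// Simplified stand-in for the person class, with public members for brevity.
struct person {
  std::string name;
  int age;
  std::vector<hobby> hobbies;
  std::optional<std::uint64_t> salary;
};

// Member-by-member comparison: every new field must be added to both tuples.
inline bool operator==(const person& a, const person& b) {
  return std::tie(a.name, a.age, a.hobbies, a.salary) ==
         std::tie(b.name, b.age, b.hobbies, b.salary);
}
```

<p>The maintenance burden is visible even here: adding a member to <code>person</code> without updating both tuples silently leaves it out of the comparison.</p>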
<p>Instead, I wrote a small example that fully automates the process. You just add an operator overload.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #b00040;">bool</span> <span style="color: #008000; font-weight: bold;">operator</span><span style="color: #666;">==</span>(<span style="color: #008000; font-weight: bold;">const</span> person<span style="color: #666;">&amp;</span> a, <span style="color: #008000; font-weight: bold;">const</span> person<span style="color: #666;">&amp;</span> b) {
    <span style="color: #008000; font-weight: bold;">return</span> deep_equal<span style="color: #666;">::</span>compare(a, b);
}
</code></pre>
</div>
<p>The trick is that C++26 allows you to query a type&#8217;s members at compile time and iterate over them. The function <code>std::meta::nonstatic_data_members_of</code> gives us a list of members.</p>
<p>My complete implementation is less than a hundred lines of code, and it comes down to the following lines of code.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #b00040;">bool</span> <span style="color: #00f;">compare_same</span>(<span style="color: #008000; font-weight: bold;">const</span> T<span style="color: #666;">&amp;</span> a, <span style="color: #008000; font-weight: bold;">const</span> T<span style="color: #666;">&amp;</span> b) {
    <span style="color: #008000; font-weight: bold;">template</span> <span style="color: #008000; font-weight: bold;">for</span> (<span style="color: #008000; font-weight: bold;">constexpr</span> <span style="color: #008000; font-weight: bold;">auto</span> mem <span style="color: #666;">:</span> std<span style="color: #666;">::</span>define_static_array(std<span style="color: #666;">::</span>meta<span style="color: #666;">::</span>nonstatic_data_members_of(<span style="color: #666;">^^</span>T, std<span style="color: #666;">::</span>meta<span style="color: #666;">::</span>access_context<span style="color: #666;">::</span>unchecked()))) {
        <span style="color: #008000; font-weight: bold;">if</span> (<span style="color: #666;">!</span>compare_same(a.[<span style="color: #666;">:</span>mem<span style="color: #666;">:</span>], b.[<span style="color: #666;">:</span>mem<span style="color: #666;">:</span>])) {
                <span style="color: #008000; font-weight: bold;">return</span> <span style="color: #008000;">false</span>;
        }
    }
    <span style="color: #008000; font-weight: bold;">return</span> <span style="color: #008000;">true</span>;
}
</code></pre>
</div>
<p>It relies on the reflection operator <code>^^T</code>, which produces a <code>std::meta::info</code> value representing the type itself. Inside the function body, <code>std::meta::nonstatic_data_members_of(^^T, ...)</code> queries all non-static data members of <code>T</code>, returning a vector of reflection values in declaration order. The <code>unchecked</code> access context deliberately bypasses visibility rules, allowing comparison of private and protected members. This reflection vector is then wrapped in <code>std::define_static_array</code>, which materializes it into a static array, making it iterable in a compile-time loop. The <code>template for</code> loop iterates over each member reflection <code>mem</code> at compile time. For each, the splice expression <code>a.[:mem:]</code> directly accesses the corresponding member. The <code>[: :]</code> is more or less the inverse of the reflection operator (<code>^^T</code>): think of it as going into a meta universe with <code>^^</code> and coming back into the standard C++ universe with <code>[: :]</code>.</p>
<p>The approach ensures zero runtime overhead, as all reflection, splicing, and looping are resolved during compilation. As a programmer, it gives you full control on how you want to implement the comparison in your application. <a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2025/11/09">I posted a full demonstration</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/11/09/automated-equality-checks-in-c-with-reflection-c26/feed/</wfw:commentRss>
			<slash:comments>7</slash:comments>
		
		
			</item>
		<item>
		<title>As smart as a PhD</title>
		<link>https://lemire.me/blog/2025/10/30/as-smart-as-a-phd/</link>
					<comments>https://lemire.me/blog/2025/10/30/as-smart-as-a-phd/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Thu, 30 Oct 2025 22:54:44 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22293</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/G4gzGEFX0AA9hJO-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />There is sometimes confusion about what a PhD is. The main signal that you should derive from the fact that someone has a PhD is that they are well suited to the university campus environment. Maybe people who complete a PhD are especially thorough and finish their work to perfection? In computer science, academic projects &#8230; <a href="https://lemire.me/blog/2025/10/30/as-smart-as-a-phd/" class="more-link">Continue reading <span class="screen-reader-text">As smart as a PhD</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/G4gzGEFX0AA9hJO-150x150.jpeg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>There is sometimes confusion about what a PhD is.</p>
<p>The main signal that you should derive from the fact that someone has a PhD is that they are well suited to the university campus environment.</p>
<p>Maybe people who complete a PhD are especially thorough and finish their work to perfection? In computer science, academic projects have a reputation for being of relative quality. Stonebraker, a famous computer scientist, attributes part of his success to his dedication to finishing up the work:</p>
<blockquote><p>The smartest things we ever did, was to then put in the effort to make it really work. (Stonebraker, 2014)</p></blockquote>
<p>Did you know that a lot of people have a PhD?</p>
<p>Over 3% of the population in a country like Switzerland has a PhD. Hence, in a city of 1 million people, you may have 30,000 people with a PhD. In Germany, that would be 15,000 people.</p>
<p>Do PhDs lead to great jobs and higher incomes? Not really. If you do get your PhD and secure a good job and keep it for a long time (so no early retirement), then you can do a bit better than the <em>average person</em> who stopped with a professional degree. But even that is misleading because we don&#8217;t know how the person smart enough to outcompete 100 other PhDs for a prestigious job would have done had they not gone for the PhD.</p>
<p>You know, Joe, who has a PhD and has become a full professor at Prestigious University&#8230; well&#8230; it is likely that Joe has been working week-ends and nights for years. Joe is excessively well connected. Joe was always smarter than anyone else in his classes. Joe can sit down and write a great 20-page scientific essay without ChatGPT and without much effort. Joe can navigate politics better than most.</p>
<p>Sometimes young people think&#8230; well, it is terrible to be job hunting at 22 without experience. Right, but try job hunting at 30 with a PhD and no actual job experience. It is worse! Much worse!</p>
<p>Given these facts, do you really want to be &#8216;as smart as a PhD&#8217;?</p>
<blockquote><p>« In the short run, pursuing a PhD entails substantial opportunity costs. Early-career earnings for PhD graduates are significantly lower than those of individuals with master’s or professional degrees. (&#8230;) Over the lifecycle, earnings do eventually recover but only under specific conditions. The most favourable long-run outcomes are concentrated among those who secure academic employment and remain in full-time work late into life. (&#8230;) However, the structure of this system increasingly resembles a tournament: the payoff remains high for those who reach the top, but the odds of doing so have declined. Our analysis documents that the economic outcomes of recent PhD graduates have worsened over time. The bottom of the earnings distribution has grown more populated, and early-career returns have declined even as aggregate statistics appear stable due to rising returns among older cohorts. » (Benjamin et al., 2025)</p></blockquote>
<p>&nbsp;</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/10/30/as-smart-as-a-phd/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>Flame Graphs in Go</title>
		<link>https://lemire.me/blog/2025/10/26/flame-graphs-in-go/</link>
					<comments>https://lemire.me/blog/2025/10/26/flame-graphs-in-go/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 26 Oct 2025 20:13:34 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22281</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/Capture-decran-le-2025-10-26-a-16.14.04-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />The hardest problem in software performance is often to understand your code and why it might be slow. One approach to this problem is called profiling. Profiling tries to count the time spent in the various functions of your program. It can be difficult to understand the result of profiling. Furthermore, it is more complicated &#8230; <a href="https://lemire.me/blog/2025/10/26/flame-graphs-in-go/" class="more-link">Continue reading <span class="screen-reader-text">Flame Graphs in Go</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/Capture-decran-le-2025-10-26-a-16.14.04-150x150.png" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>The hardest problem in software performance is often to understand your code and why it might be slow. One approach to this problem is called profiling. Profiling tries to count the time spent in the various functions of your program.</p>
<p>It can be difficult to understand the result of profiling. Furthermore, it is more complicated than it seems. For example, how do you count the time spent in a function A that calls another function B? Does the time spent in function B count toward the time spent in function A? Suppose function A executes code for 2 seconds, then calls function B which takes 3 seconds to run, and finally A continues with 1 additional second. The total time for A would be 6 seconds if everything were included, but that does not accurately reflect where the time is truly spent. We might prefer the exclusive or flat time: the time spent only in the body of the function itself, without counting calls to other functions. In our example, for A, that would be 2 + 1 = 3 seconds (the time before and after the call to B). For B, it would be 3 seconds. The cumulative time is the total time spent in the function and all the functions it calls, recursively. For A, that would include its own 3 seconds plus the 3 seconds of B, totaling 6 seconds. For B, it would just be its 3 seconds, unless it calls other functions.</p>
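<p>The arithmetic above can be sketched in a few lines of Go. The <code>node</code> type and the <code>cumulative</code> function are hypothetical names chosen for this illustration; they are not part of any profiling library.</p>

```go
package main

import "fmt"

// node is a hypothetical call-tree entry: a function name, the seconds spent
// in its own body (flat time), and the functions it calls.
type node struct {
	name     string
	flat     float64
	children []*node
}

// cumulative returns the flat time of n plus the cumulative time of
// every function it calls, recursively.
func cumulative(n *node) float64 {
	total := n.flat
	for _, c := range n.children {
		total += cumulative(c)
	}
	return total
}

func main() {
	// B runs for 3 seconds; A runs for 2 seconds before and 1 second after calling B.
	b := &node{name: "B", flat: 3}
	a := &node{name: "A", flat: 2 + 1, children: []*node{b}}
	for _, n := range []*node{a, b} {
		fmt.Printf("%s: flat=%.0fs cum=%.0fs\n", n.name, n.flat, cumulative(n))
	}
}
```

<p>With these numbers, A has a flat time of 3 seconds and a cumulative time of 6 seconds, while B has 3 seconds for both.</p>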
<p>Go has built-in support for profiling. It requires two steps.</p>
<ol>
<li>Use <code>runtime/pprof</code> to collect profiles.</li>
<li>Use <code>go tool pprof</code> to review the data.</li>
</ol>
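<p>As a minimal sketch of the first step, a program can call <code>runtime/pprof</code> directly. The helper name <code>profileTo</code>, the input strings and the iteration count are assumptions made for this illustration; only <code>pprof.StartCPUProfile</code> and <code>pprof.StopCPUProfile</code> come from the standard library.</p>

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
	"strconv"
)

// profileTo (hypothetical helper) runs work while recording a CPU profile
// into the file at path.
func profileTo(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run in reverse order: the profile is flushed
	// before the file is closed.
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	// Illustrative inputs, not taken from the benchmark in this post.
	inputs := []string{"3.14159", "1e10", "0.000123", "-42.5"}
	err := profileTo("cpu.prof", func() {
		for i := 0; i < 5000000; i++ {
			_, _ = strconv.ParseFloat(inputs[i%len(inputs)], 64)
		}
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

<p>The resulting <code>cpu.prof</code> file can then be passed to <code>go tool pprof</code> for the second step.</p>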
<p>Let us consider a concrete example. We have a benchmark function parsing repeatedly strings as floating-point numbers.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #008000; font-weight: bold;">func</span> BenchmarkParseFloat(b <span style="color: #666;">*</span>testing.B) {
    <span style="color: #008000; font-weight: bold;">for</span> i <span style="color: #666;">:=</span> <span style="color: #666;">0</span>; i &lt; b.N; i<span style="color: #666;">++</span> {
        idx <span style="color: #666;">:=</span> i <span style="color: #666;">%</span> <span style="color: #008000;">len</span>(floatStrings)
        _, _ = strconv.ParseFloat(floatStrings[idx], <span style="color: #666;">64</span>)
    }
}
</code></pre>
</div>
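<p>The benchmark relies on a <code>floatStrings</code> slice that is not shown. Here is a self-contained sketch, assuming a small hypothetical set of inputs, that runs the same benchmark from a regular program using <code>testing.Benchmark</code>. In practice you would keep the benchmark in a <code>_test.go</code> file and run it with <code>go test</code>.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// floatStrings is a hypothetical input set; the actual content used for
// the measurements in this post is not shown.
var floatStrings = []string{"3.14159", "1e10", "0.000123", "-42.5", "6.02214076e23"}

func BenchmarkParseFloat(b *testing.B) {
	for i := 0; i < b.N; i++ {
		idx := i % len(floatStrings)
		_, _ = strconv.ParseFloat(floatStrings[idx], 64)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	result := testing.Benchmark(BenchmarkParseFloat)
	fmt.Println(result)
}
```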
<p>In effect, we want to profile the <code>strconv.ParseFloat</code> function to see where it is spending its time. We run:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #008000; font-weight: bold;">go</span> test <span style="color: #666;">-</span>bench<span style="color: #666;">=</span>. <span style="color: #666;">-</span>cpuprofile<span style="color: #666;">=</span>cpu.prof    
</code></pre>
</div>
<p>Notice the <code>-cpuprofile=cpu.prof</code> flag which instructs go to dump its profiling data to the file <code>cpu.prof</code>. Go profiles the code using a sampling-based approach, where it periodically interrupts the execution to capture stack traces of running goroutines. It has low overhead—only a small fraction of execution time is spent on sampling—and it works across multiple goroutines and threads.</p>
<p>If you are trying to profile a long-running application, like a server, you can use the net/http/pprof package to request profiling data dynamically using network requests. However, it requires modifying your application accordingly.</p>
<p>We can call <code>go tool pprof</code> on the <code>cpu.prof</code> file to examine the result. The simplest flag is <code>-text</code>: it prints out a summary in the console.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>go <span style="color: #008000; font-weight: bold;">tool</span> pprof <span style="color: #666;">-</span>text cpu<span style="color: #666;">.</span>prof
</code></pre>
</div>
<p>You might see the following output.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>Time<span style="color: #666;">:</span> <span style="color: #666;">2025-10-26</span> <span style="color: #666;">14:35:03</span> EDT
Duration<span style="color: #666;">:</span> <span style="color: #666;">1.31</span>s<span style="color: #666;">,</span> Total samples <span style="color: #666;">=</span> <span style="color: #666;">990</span>ms <span style="color: #666;">(75.45%)</span>
Showing nodes accounting <span style="color: #008000; font-weight: bold;">for</span> <span style="color: #666;">990</span>ms<span style="color: #666;">,</span> <span style="color: #666;">100%</span> of <span style="color: #666;">990</span>ms total
      flat  flat<span style="color: #666;">%</span>   sum<span style="color: #666;">%</span>        cum   cum<span style="color: #666;">%</span>
     <span style="color: #666;">830</span>ms <span style="color: #666;">83.84%</span> <span style="color: #666;">83.84%</span>      <span style="color: #666;">830</span>ms <span style="color: #666;">83.84%</span>  strconv<span style="color: #666;">.</span><span style="color: #687822;">readFloat</span>
      <span style="color: #666;">20</span>ms  <span style="color: #666;">2.02%</span> <span style="color: #666;">92.93%</span>      <span style="color: #666;">900</span>ms <span style="color: #666;">90.91%</span>  strconv<span style="color: #666;">.</span><span style="color: #687822;">atof64</span>
      <span style="color: #666;">20</span>ms  <span style="color: #666;">2.02%</span> <span style="color: #666;">94.95%</span>       <span style="color: #666;">20</span>ms  <span style="color: #666;">2.02%</span>  strconv<span style="color: #666;">.</span><span style="color: #687822;">eiselLemire64</span>
      <span style="color: #666;">20</span>ms  <span style="color: #666;">2.02%</span> <span style="color: #666;">96.97%</span>       <span style="color: #666;">20</span>ms  <span style="color: #666;">2.02%</span>  strconv<span style="color: #666;">.</span><span style="color: #687822;">special</span>
      <span style="color: #666;">10</span>ms  <span style="color: #666;">1.01%</span> <span style="color: #666;">98.99%</span>      <span style="color: #666;">910</span>ms <span style="color: #666;">91.92%</span>  strconv<span style="color: #666;">.</span><span style="color: #687822;">ParseFloat</span>
      <span style="color: #666;">10</span>ms  <span style="color: #666;">1.01%</span>   <span style="color: #666;">100%</span>       <span style="color: #666;">10</span>ms  <span style="color: #666;">1.01%</span>  strconv<span style="color: #666;">.</span><span style="color: #687822;">atof64exact</span>
</code></pre>
</div>
<p>It can be difficult to interpret this result. Thus we want to visualize the result. My favorite technique is the flame graph. It is a useful technique invented by Brendan Gregg.</p>
<p>A flame graph is a stacked bar chart where each bar represents a function call, and the width of the bar indicates the estimated time spent in that function. The time can be replaced by other metrics such as CPU instructions, memory allocations or any other metric that can be matched to a software function. The y-axis represents the call stack depth, with the top being the root of the stack. Colors can be used to make it more elegant.</p>
<p>The name flame graph comes from the shape that often resembles flames, with wider bases indicating more time-consuming functions and narrower tops showing less frequent paths. There are many variations such as flame charts where the x-axis represents time progression, and y-axis shows call stack at each moment. But flame graphs are more common.</p>
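<p>To make the construction concrete, here is a small sketch that merges sampled call stacks into a tree and prints each frame with a bar whose width is its sample count. The <code>frame</code> type and the sample stacks are hypothetical and chosen for illustration; this is not how <code>pprof</code> is implemented.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// frame is one bar of the graph: a function at a given position in the stack.
type frame struct {
	name     string
	samples  int
	children map[string]*frame
	order    []string // child names in first-seen order, for stable output
}

func newFrame(name string) *frame {
	return &frame{name: name, children: map[string]*frame{}}
}

// add merges one sampled stack (root first) into the tree.
func (f *frame) add(stack []string) {
	f.samples++
	if len(stack) == 0 {
		return
	}
	child, ok := f.children[stack[0]]
	if !ok {
		child = newFrame(stack[0])
		f.children[stack[0]] = child
		f.order = append(f.order, stack[0])
	}
	child.add(stack[1:])
}

// render writes one line per frame; the bar width is the number of samples.
func (f *frame) render(depth int, sb *strings.Builder) {
	fmt.Fprintf(sb, "%s%s %s\n", strings.Repeat("  ", depth),
		strings.Repeat("#", f.samples), f.name)
	for _, name := range f.order {
		f.children[name].render(depth+1, sb)
	}
}

func main() {
	// Made-up samples: each entry is one stack captured by the profiler.
	samples := [][]string{
		{"ParseFloat", "atof64", "readFloat"},
		{"ParseFloat", "atof64", "readFloat"},
		{"ParseFloat", "atof64", "readFloat"},
		{"ParseFloat", "atof64", "eiselLemire64"},
		{"ParseFloat", "special"},
	}
	root := newFrame("root")
	for _, s := range samples {
		root.add(s)
	}
	var sb strings.Builder
	root.render(0, &sb)
	fmt.Print(sb.String())
}
```

<p>The root is printed first, so the output reads top-down; a function that appears in many samples gets a wide bar, whether the time is spent in its own body or in the functions it calls.</p>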
<p>Given that you have generated the file <code>cpu.prof</code>, you can visualize it by first starting a web server:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>go <span style="color: #008000; font-weight: bold;">tool</span> pprof <span style="color: #666;">-</span>http<span style="color: #666;">=</span>:<span style="color: #666;">8123</span> cpu<span style="color: #666;">.</span>prof
</code></pre>
</div>
<p>It assumes that port 8123 is free on your system: you can change the port 8123 if needed. Open your browser at the URL <code>http://localhost:8123/ui/flamegraph</code>.</p>
<p>You should see a graph:</p>
<p><a href="http://lemire.me/blog/wp-content/uploads/2025/10/flamegraph.png"><img loading="lazy" decoding="async" class="alignnone size-large wp-image-22282" src="http://lemire.me/blog/wp-content/uploads/2025/10/flamegraph-1024x198.png" alt="" width="660" height="128" srcset="https://lemire.me/blog/wp-content/uploads/2025/10/flamegraph-1024x198.png 1024w, https://lemire.me/blog/wp-content/uploads/2025/10/flamegraph-300x58.png 300w, https://lemire.me/blog/wp-content/uploads/2025/10/flamegraph-768x149.png 768w, https://lemire.me/blog/wp-content/uploads/2025/10/flamegraph-1536x298.png 1536w, https://lemire.me/blog/wp-content/uploads/2025/10/flamegraph.png 1744w" sizes="auto, (max-width: 660px) 100vw, 660px" /></a></p>
<p>Our first step is to narrow our focus to the function we care about. For more clarity, double click on <code>strconv.ParseFloat</code>, or right click on it and select focus. The bulk of the time (over 80%) in <code>ParseFloat</code> is spent in the <code>readFloat</code> function. This function converts the string into a mantissa and an exponent. There are other functions being called such as <code>strconv.special</code>, <code>strconv.atof64exact</code>, and <code>strconv.eiselLemire64</code>. However, they account for little time in this specific benchmark.</p>
<p>Thus, if we wanted to improve the performance of <code>strconv.ParseFloat</code> function, we might want to focus on optimizing the <code>readFloat</code> function. Furthermore, at least given our specific test, optimizing the <code>strconv.special</code>, <code>strconv.atof64exact</code>, and <code>strconv.eiselLemire64</code> functions is unlikely to be productive.</p>
<p>In our specific example, the profiling gives us a clear answer. However, it is not always so. Profiling can be misleading. For one thing, profiling is typically statistical. If your program is short-lived or has uneven load, it might provide incorrect data. Furthermore, inefficient work spread out over many functions might not show up in a flame graph. Thus you should always interpret profiling results with care.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/10/26/flame-graphs-in-go/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Thinking Clearly</title>
		<link>https://lemire.me/blog/2025/10/26/thinking-clearly/</link>
					<comments>https://lemire.me/blog/2025/10/26/thinking-clearly/#respond</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 26 Oct 2025 14:59:29 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22278</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/0e2e1703-f9c7-4d54-8124-0863262d108d-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />If you have ever met me in person, you know that when you share an idea with me, I simplify it to its core and reflect it back to you, focusing on its essential parts. I dissect each statement for precision. “What do you mean by this word?” I have two decades of experience working &#8230; <a href="https://lemire.me/blog/2025/10/26/thinking-clearly/" class="more-link">Continue reading <span class="screen-reader-text">Thinking Clearly</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/0e2e1703-f9c7-4d54-8124-0863262d108d-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>If you have ever met me in person, you know that when you share an idea with me, I simplify it to its core and reflect it back to you, focusing on its essential parts. I dissect each statement for precision. “What do you mean by this word?”</p>
<p>I have two decades of experience working with academics who overcomplicate everything. Humans are easily confused. A project proposal with ten moving parts and five objectives is overwhelming. Most people cannot think it through critically, which can lead to disaster.</p>
<p>By instinct, I simplify problems as my first step, reducing them to their “minimum viable product,” as they say in Silicon Valley.</p>
<p>Some people avoid simplicity to sound smarter. They won&#8217;t admit it, maybe not even to themselves, but that&#8217;s what they are thinking: “Oh no! I’m not doing this simple thing; my work is much more sophisticated.”</p>
<p>That’s a terrible idea. Even simple projects become challenging if you are ambitious. There is no need to complexify them. For example, seven years ago, we aimed to create a JSON parser faster than anything on the market, a simple idea. A senior colleague in computer science saw me at a campus coffee shop while I was working. He asked what I was busy doing&#8230; When I told him, “We’re writing a fast JSON parser,” he laughed. He would later admit that he thought I was joking: how could I work on something so mundane? I wasn’t kidding. The result was simdjson, a JSON parser four times faster than anything else at the time. By keeping our project conceptually simple, we made success easier.</p>
<p>Complexity is a burden, not a badge of pride.</p>
<p>Clear thinking demands precision.</p>
<p>People use emotionally charged words like “safe.” “My car is safe.” “My software is safe.” What does that mean? Define it precisely.</p>
<p>A graduate student of mine recently proposed reducing the cognitive load of agentic AI on developers. “Cognitive load” sounds great (thousands of papers discuss it), but what does it mean? How do you measure it?</p>
<p>Take “AI” as another example. Nobody knows what AI is. Is a Google search AI? Is image search AI? Is an expert system AI? Or do you mean large language models? Clarify what you mean.</p>
<p>Too often, we accidentally hide behind overly abstract language. Not only does this harm how we think, but it also harms how we are perceived. People who avoid jargon are viewed as more honest, trustworthy and benevolent (Fick et al. 2025).</p>
<p>Your motivation should be clear from the start. Here is an example. I am often asked: “Should I go to graduate school?” What’s the motivation? Often, it’s to get a great job. Years ago, I had this conversation with a research assistant:</p>
<ul>
<li>Should I get a Master’s degree? Everyone with good grades does it.</li>
<li>What is your objective?</li>
<li>I want a good, well-paying job.</li>
</ul>
<p>Given his exceptional technical skills, I told him he could already land that job soon. He did, and now he’s in a leadership role at one of Montreal’s top companies. He is outearning me by a wide margin, no doubt.</p>
<p>People often lose sight of their motivation and follow trends. Reconnect with your motivation and adjust your means accordingly.</p>
<p>To sum up, my advice for clear thinking:</p>
<ul>
<li>Simplify projects to their essentials early and often.</li>
<li>Use precise language. Avoid jargon.</li>
<li>Focus on your motivation first, then choose the appropriate means.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/10/26/thinking-clearly/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Speeding up C++ functions with a thread_local cache</title>
		<link>https://lemire.me/blog/2025/10/19/speeding-up-c-functions-with-a-thread_local-cache/</link>
					<comments>https://lemire.me/blog/2025/10/19/speeding-up-c-functions-with-a-thread_local-cache/#comments</comments>
		
		<dc:creator><![CDATA[Daniel Lemire]]></dc:creator>
		<pubDate>Sun, 19 Oct 2025 21:28:24 +0000</pubDate>
				<category><![CDATA[]]></category>
		<guid isPermaLink="false">https://lemire.me/blog/?p=22238</guid>

					<description><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/9af1530f-1aca-4fbc-8796-ff74a1a8b544-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" />In large code bases, we are often stuck with unpleasant designs that are harming our performance. We might be looking for a non-intrusive method to improve the performance. For example, you may not want to change the function signatures. Let us consider a concrete example. Maybe someone designed the programming interface so that you have &#8230; <a href="https://lemire.me/blog/2025/10/19/speeding-up-c-functions-with-a-thread_local-cache/" class="more-link">Continue reading <span class="screen-reader-text">Speeding up C++ functions with a thread_local cache</span></a>]]></description>
										<content:encoded><![CDATA[<img width="150" height="150" src="https://lemire.me/blog/wp-content/uploads/2025/10/9af1530f-1aca-4fbc-8796-ff74a1a8b544-150x150.jpg" class="webfeedsFeaturedVisual wp-post-image" alt="" style="display: block; margin-bottom: 5px; clear:both;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" /><p>In large code bases, we are often stuck with unpleasant designs that are harming our performance. We might be looking for a non-intrusive method to improve the performance. For example, you may not want to change the function signatures.</p>
<p>Let us consider a concrete example. Maybe someone designed the programming interface so that you have to access the values from a map using an index. They may have code like so:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #008000; font-weight: bold;">auto</span> at_index(map_like <span style="color: #008000; font-weight: bold;">auto</span><span style="color: #666;">&amp;</span> index_map, <span style="color: #b00040;">size_t</span> idx) {
  <span style="color: #b00040;">size_t</span> count <span style="color: #666;">=</span> <span style="color: #666;">0</span>;
  <span style="color: #008000; font-weight: bold;">for</span> (<span style="color: #008000; font-weight: bold;">const</span> <span style="color: #008000; font-weight: bold;">auto</span> <span style="color: #666;">&amp;</span>[key, value] <span style="color: #666;">:</span> index_map) {
    <span style="color: #008000; font-weight: bold;">if</span>(count <span style="color: #666;">==</span> idx)
      <span style="color: #008000; font-weight: bold;">return</span> value;
    count<span style="color: #666;">++</span>;
  }
  <span style="color: #008000; font-weight: bold;">throw</span> std<span style="color: #666;">::</span>out_of_range(<span style="color: #ba2121;">"Index out of range"</span>);
}
</code></pre>
</div>
<p>This code steps through the entries of the map, one by one, until it reaches position idx. Typically, it implies some kind of linked list traversal. If you are stuck with this interface, going through the values might imply repeated calls to the at_index function:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code>for (size_t i = 0; i &lt; input_size; ++i) {
    at_index(index_map, i);
}
</code></pre>
</div>
<p>If you took any kind of computer science course, you will immediately see the problem: my code has quadratic complexity. If you double the map size, you may quadruple the running time. It is likely fine if you have 2 or 4 elements in the map, but definitely not fine if you have 400 elements.</p>
<p>The proper solution is to avoid such a design. If you can have access directly to the map, you can just iterate through it directly:</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #008000; font-weight: bold;">for</span> (<span style="color: #008000; font-weight: bold;">auto</span><span style="color: #666;">&amp;</span> [key, value] <span style="color: #666;">:</span> index_map) {
   sum <span style="color: #666;">+=</span> value;
}
</code></pre>
</div>
<p>But what if you are stuck? Assume that the map is never modified in practice. Then you can use a static or thread_local cache. The key insight is to keep in cache your location in the map, and start from there on the next query. If the user is typically querying in sequence, then your cache should speed up the function tremendously.</p>
<div class="highlight" style="background: #f8f8f8;">
<pre style="line-height: 125%;"><code><span style="color: #008000; font-weight: bold;">auto</span> 
at_index_thread_local_cache(map_like <span style="color: #008000; font-weight: bold;">auto</span><span style="color: #666;">&amp;</span> index_map,
    <span style="color: #b00040;">size_t</span> idx) {
  <span style="color: #008000; font-weight: bold;">using</span> iterator <span style="color: #666;">=</span> <span style="color: #008000; font-weight: bold;">decltype</span>(index_map.begin());
  <span style="color: #008000; font-weight: bold;">struct</span> <span style="color: #00f; font-weight: bold;">Cache</span> {
    iterator last_iterator;
    <span style="color: #b00040;">size_t</span> last_index <span style="color: #666;">=</span> <span style="color: #666;">-1</span>;
    <span style="color: #008000; font-weight: bold;">decltype</span>(<span style="color: #666;">&amp;</span>index_map) map_ptr <span style="color: #666;">=</span> <span style="color: #008000; font-weight: bold;">nullptr</span>;
  };
  <span style="color: #008000; font-weight: bold;">thread_local</span> Cache cache;
  <span style="color: #008000; font-weight: bold;">if</span> (cache.map_ptr <span style="color: #666;">==</span> <span style="color: #666;">&amp;</span>index_map
      <span style="color: #666;">&amp;&amp;</span> idx <span style="color: #666;">==</span> cache.last_index <span style="color: #666;">+</span> <span style="color: #666;">1</span>
      <span style="color: #666;">&amp;&amp;</span> cache.last_iterator <span style="color: #666;">!=</span> index_map.end()) {
    cache.last_iterator<span style="color: #666;">++</span>;
    cache.last_index <span style="color: #666;">=</span> idx;
    <span style="color: #008000; font-weight: bold;">if</span> (cache.last_iterator <span style="color: #666;">!=</span> index_map.end()) {
      <span style="color: #008000; font-weight: bold;">return</span> cache.last_iterator<span style="color: #666;">-&gt;</span>second;
    } <span style="color: #008000; font-weight: bold;">else</span> {
      <span style="color: #008000; font-weight: bold;">throw</span> std<span style="color: #666;">::</span>out_of_range(<span style="color: #ba2121;">"Index out of range"</span>);
    }
  } <span style="color: #008000; font-weight: bold;">else</span> {
    // Invalidate the cache first: a failed scan must not leave a usable position behind.
    cache.map_ptr <span style="color: #666;">=</span> <span style="color: #008000; font-weight: bold;">nullptr</span>;
    <span style="color: #b00040;">size_t</span> count <span style="color: #666;">=</span> <span style="color: #666;">0</span>;
    <span style="color: #008000; font-weight: bold;">for</span> (<span style="color: #008000; font-weight: bold;">auto</span> it <span style="color: #666;">=</span> index_map.begin();
        it <span style="color: #666;">!=</span> index_map.end(); <span style="color: #666;">++</span>it) {
      <span style="color: #008000; font-weight: bold;">if</span> (count <span style="color: #666;">==</span> idx) {
        cache.last_iterator <span style="color: #666;">=</span> it;
        cache.last_index <span style="color: #666;">=</span> idx;
        cache.map_ptr <span style="color: #666;">=</span> <span style="color: #666;">&amp;</span>index_map;
        <span style="color: #008000; font-weight: bold;">return</span> it<span style="color: #666;">-&gt;</span>second;
      }
      count<span style="color: #666;">++</span>;
    }
    <span style="color: #008000; font-weight: bold;">throw</span> std<span style="color: #666;">::</span>out_of_range(<span style="color: #ba2121;">"Index out of range"</span>);
  }
}
</code></pre>
</div>
<p>In C++, a thread_local variable is such that there is just one instance of the variable (shared by all function calls) within the same thread. If you wish to have just one instance of the variable for the entire program, you can use static instead, but thread_local is the best choice in our case. You might be worried about the performance implication of a thread_local variable, but it is generally quite cheap: we only add a few instructions when accessing it or modifying it.</p>
<p>Our cache variable remembers the last accessed iterator and index per thread. If the next index is requested, we just increment the iterator and return. If the access is non-sequential or the first call, it falls back to a linear scan from the beginning, rebuilding the cache along the way.</p>
<p>The code is more complicated, and if you are not accessing the keys in sequence, it might be slower. However, the performance gains can be enormous. <a href="https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2025/10/19/">I wrote a benchmark to test it out with maps containing 400 elements</a>.</p>
<table>
<thead>
<tr>
<th>Method</th>
<th>ns/key</th>
<th>instructions/key</th>
</tr>
</thead>
<tbody>
<tr>
<td>original</td>
<td>300</td>
<td>2000</td>
</tr>
<tr>
<td>cache</td>
<td>2</td>
<td>20</td>
</tr>
</tbody>
</table>
<p>In my case, the cache multiplied the performance by 150. Not bad.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://lemire.me/blog/2025/10/19/speeding-up-c-functions-with-a-thread_local-cache/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
	</channel>
</rss>
