<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	
  <title>The Old New Thing - Dreamcatcher Edition</title>
	<atom:link href="https://devblogs.microsoft.com/oldnewthing/feed" rel="self" type="application/rss+xml" />
	<link>https://devblogs.microsoft.com/oldnewthing</link>
	<description>Practical development throughout the evolution of Windows.</description>
	<lastBuildDate>Tue, 16 Jun 2026 03:29:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2021/03/Microsoft-Favicon.png</url>
	
  <title>The Old New Thing - Dreamcatcher Edition</title>
	<link>https://devblogs.microsoft.com/oldnewthing</link>
	<width>32</width>
	<height>32</height>
</image> 
	
  <item>
    <title>The time the x86 emulator team found code so bad that they fixed it during emulation</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260615-00/?p=112419</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260615-00/?p=112419#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Mon, 15 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[History]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112419</guid>
    <content:encoded><![CDATA[<p>During an exchange of war stories, a colleague of mine told one from back in the days when Windows included a processor emulator for x86-32 on systems that natively ran some other processor. (This has happened many times. And no, I don&#8217;t know which processor this particular story applied to.)</p>
<p>This particular emulator employed binary translation, generating native code to perform the equivalent operations of the original x86-32 code. This offered a significant performance improvement over emulation via interpreter. You can imagine that x86-32 is just a bytecode, and the emulator is a JIT compiler.</p>
<p>Anyway, my colleague found that there was one program that needed to allocate around 64KB of memory on the stack and initialize it. The standard way of doing this is to <a title="How do compilers ensure that large stack allocations do not skip over the guard page?" href="https://devblogs.microsoft.com/oldnewthing/20260311-00/?p=112134"> perform a stack probe to ensure that 64KB of memory is available</a>, then subtracting 65536 from the stack pointer, and then initializing the memory in a small, tight loop.</p>
<p>But using a loop to initialize the memory was too mundane for whatever compiler was used to compile this code. Instead of generating a loop to initialize each byte of the buffer, the compiler &#8220;optimized&#8221; the code by unrolling the loop into 65,536 individual &#8220;write byte to memory&#8221; instructions, each 4 bytes long.</p>
<p>All in all, it took this program 256 kilobytes of code to initialize 64 kilobytes of data.</p>
<p>This offended the team so much that they added special code to the translator to detect this horrible function and replace it with the equivalent tight loop.</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260615-00/?p=112419">The time the x86 emulator team found code so bad that they fixed it during emulation</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260615-00/?p=112419/feed</wfw:commentRss>
    <slash:comments>10</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>How can I schedule work on a thread pool with low latency?</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260612-00/?p=112417</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260612-00/?p=112417#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Fri, 12 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112417</guid>
    <content:encoded><![CDATA[<p>A customer had a callback that was used to report data being produced by a hardware device. The rule for the callback is that it has to return quickly so that the code wouldn&#8217;t miss the next batch of data because the device itself has a very small buffer: If they spend too much time in the callback, the buffer will overflow and data will be lost.</p>
<p>To avoid clogging the receiving thread, the customer queued a work item to the thread pool to process the data that was just received. However, they found that sometimes, the work item doesn&#8217;t run immediately but rather has a 100ms latency. But their program needs to process the data within 20ms. Is there a way to set a deadline on a thread pool work item, so that the system will make sure that it runs before a certain period of time elapses?</p>
<p>As I&#8217;ve noted before, <a title="Trying to make the thread pool more responsive to a large queue of long-running work items" href="https://devblogs.microsoft.com/oldnewthing/20170623-00/?p=96455"> the thread pool is designed for throughput, not latency</a>. There is no option to set a deadline on a work item.</p>
<p>One reason why the thread pool is being slow to dispatch the work items is that there are other unrelated work items in the thread pool, and those other tasks are competing with your data processing task for the thread pool&#8217;s attention. On top of that, some of those other tasks might be long-running, which takes a thread pool thread out of commission for an extended period.</p>
<p>You can take these conflicting work items out of the picture by creating your own custom thread pool: Call <code>Create­Thread­Pool</code> and queue your work to that thread pool (by setting that thread pool in the work item&#8217;s environment). Now you won&#8217;t have any competing work items getting in front of you in the thread pool work queue because those competing work items are going to the process default thread pool and not to your private thread pool.</p>
<p>Note however that even though your work items are no longer fighting with other work items for the attention of your private thread pool, those other work items are still running on the process default thread pool, so they are still competing against your work items for CPU. But at least your work item got dispatched.</p>
<p>I&#8217;m guessing that the order in which the batches are processed is important, so you should set your private thread pool&#8217;s maximum thread count to 1 so that you don&#8217;t start processing one batch of data until you finish processing the previous batch. This effectively serializes the work items, but that&#8217;s what you want if you intend to process the batches in order.</p>
<p>In the case where you have a single-minded thread pool, you can prepare everything ahead of time so that all you have to do in the callback itself is call <code>SubmitThreadpoolWork</code> on a pre-created reusable work item.</p>
<pre>// One-time preparation
pool = CreateThreadpool();
if (!pool) ⟦ error ⟧

TP_CALLBACK_ENVIRON env;
InitializeThreadpoolEnvironment(&amp;env);
SetThreadpoolCallbackPool(&amp;env, pool);
work = CreateThreadpoolWork(ProcessData, nullptr, &amp;env);
if (!work) ⟦ error ⟧

void Callback()
{
    ⟦ add data to data queue ⟧
    SubmitThreadpoolWork(work); // request another callback
}
</pre>
<p>If you step back and look at this, you might realize that all we did was create a worker thread, but one where we delegated all the bookkeeping to the thread pool. Also, this particular customer was writing code in C#, and the BCL doesn&#8217;t have built-in support for custom thread pools.</p>
<p>So if all we have is a worker thread, maybe we can just make a worker thread. Here&#8217;s a really simple one.</p>
<pre>Queue&lt;Data&gt; queue = new Queue&lt;Data&gt;();

Data WaitForWork()
{
    while (true) {
        lock (queue) {
            if (queue.Count &gt; 0) {
                return queue.Dequeue();
            }
            Monitor.Wait(queue);
        }
    }
}

void WorkerThread()
{
    Data data;
    while ((data = WaitForWork()) != null) {
        ⟦ process the data #&amp;x27e7;
    }
}

void QueueWork(Data data)
{
    lock (queue) {
        queue.Enqueue(data);
        Monitor.Pulse(queue);
    }
}

void EndWork()
{
    QueueWork(null);
}
</pre>
<p>The worker thread waits for elements to show up in the queue, and once one appears, it dequeues it and does whatever processing you want. If the queued value is <code>null</code>, that means that the worker thread is no longer needed, and it exits.</p>
<p>You can do something similar in C++ with a <code>std::<wbr />queue</code> and a condition variable.</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260612-00/?p=112417">How can I schedule work on a thread pool with low latency?</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260612-00/?p=112417/feed</wfw:commentRss>
    <slash:comments>6</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>Understanding the rationale behind a rule when trying to circumvent it</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260611-00/?p=112415</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260611-00/?p=112415#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Thu, 11 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112415</guid>
    <content:encoded><![CDATA[<p>In the documentation for <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/windows-kernel-mode-process-and-thread-manager#best-practices-for-implementing-process-and-thread-related-callback-functions"> best practices for implementing process and thread-related callback functions</a>, it calls out</p>
<blockquote class="q">
<ul>
<li>Keep routines short and simple.</li>
<li>Don&#8217;t make calls into a user mode service to validate the process, thread, or image.</li>
<li>Don&#8217;t make registry calls.</li>
<li>Don&#8217;t make blocking and/or Interprocess Communication (IPC) function calls.</li>
<li>Don&#8217;t synchronize with other threads because it can lead to reentrancy deadlocks.</li>
</ul>
</blockquote>
<p>So far so good. It seems that these callback functions need to operate quickly and cannot block. These are callbacks that are invoked when a process starts or exits, when a thread starts or exits, when a DLL or EXE is loaded or unloaded, and various other low-level events.</p>
<p>The various prohibitions above suggest that these callouts are called during the process creation/termination sequence, so if you take a long time to deal with them, you are slowing down the entire system. And the rather extreme requirements, like &#8220;Don&#8217;t make registry calls,&#8221; suggest that they might even be called while the system holds internal locks.</p>
<p>The list of best practices continues:</p>
<blockquote class="q">
<ul>
<li>Use System Worker Threads to queue work especially work involving:
<ul>
<li>Slow APIs or APIs that call into other process.</li>
<li>Any blocking behavior that could interrupt threads in core services.</li>
</ul>
</li>
</ul>
</blockquote>
<p>Okay, so this is providing a suggestion on how you can offload expensive work to code running outside the callback. This once again highlights that the callback itself needs to be fast with minimal blocking.</p>
<p>My colleagues in enterprise support often run into cases where the reason for a system hang is a driver violating the rule that these callbacks must return quickly. For example, a common anti-pattern is a driver whose callback starts by following the guidance above to queue work to a System Worker Thread, but then they block until the work item completes.</p>
<p>This is a case of following the rules without understanding why the rules are there.</p>
<p>The rule is that the callback needs to be fast and return quickly. The driver followed the letter of the law by delegating the work to a System Worker Thread, and there&#8217;s no rule that says &#8220;Don&#8217;t wait for work items&#8221;, so they must have figured that this gave them a loophole for executing synchronous long-running work.</p>
<p>But the rules &#8220;Don&#8217;t make blocking and/or Interprocess Communication (IPC) function calls&#8221; and &#8220;Don&#8217;t synchronize with other threads because it can lead to reentrancy deadlocks&#8221; make it clear that you shouldn&#8217;t be blocking in your callback for extended periods of time. The &#8220;Don&#8217;t&#8221;s are just calling out some common ways that your callback can block.</p>
<p>And it looks like the documentation was updated in 2020 to call out this specific case:</p>
<blockquote>
<ul>
<li>If you use System Worker Threads, don&#8217;t wait on the work to complete. Doing so defeats the purpose of queuing the work to be completed asynchronously.</li>
</ul>
</blockquote>
<p>One could argue that this rule is already covered by the &#8220;Don&#8217;t synchronize with other threads&#8221; rule, but I guess the driver vendor interpreted it as &#8220;But I&#8217;m not synchronizing with another thread. I&#8217;m synchronizing on an event!&#8221; But of course, the event is set by another thread, so you are effectively synchronizing with another thread.</p>
<p>My colleague in enterprise support describes this as the &#8220;It wasn&#8217;t me, it was my brother&#8221; excuse. You are told by your parents not to turn on the television set, so you tell your brother to do it. Technically, you didn&#8217;t turn the television set on, but in effect, you did because your brother is acting under your instructions. (This is why contracts often contain wording like &#8220;may not disclose or cause to be disclosed,&#8221; so that you can&#8217;t say &#8220;No, I totally didn&#8217;t disclose it. I gave the information to Bob, and it was Bob who disclosed it!&#8221;)</p>
<p>The documentation should open with something like this:</p>
<blockquote class="q"><p>The callback function must perform its work quickly without blocking. If you need to do complex work or synchronize with other threads or processes, do the work asynchronously, such as by using System Worker Threads.</p></blockquote>
<p>And then it can give a list of examples of things that count as blocking.</p>
<blockquote class="q"><p>Some examples of blocking that is not allowed from the callback function:</p></blockquote>
<p>And then it can follow up with additional constraints.</p>
<blockquote class="q"><p>Furthermore, the callback function may not perform any of the following operations:</p></blockquote>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260611-00/?p=112415">Understanding the rationale behind a rule when trying to circumvent it</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260611-00/?p=112415/feed</wfw:commentRss>
    <slash:comments>5</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>What&#8217;s the opposite of Clip&#173;Cursor that lets me exclude the cursor from a region?</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260610-00/?p=112412</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260610-00/?p=112412#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Wed, 10 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112412</guid>
    <content:encoded><![CDATA[<p>A customer wanted to prevent the user from dragging an object into a specific region of their window. Their current implementation detects that the mouse is an in illegal location and uses <code>Set­Cursor­Pos</code> to move it to a nearby legal location. However, this creates flicker because the cursor actually does enter the illegal region and then jumps out.</p>
<p>Let&#8217;s illustrate this with <a title="The scratch program" href="https://devblogs.microsoft.com/oldnewthing/20030723-00/?p=43073"> our scratch program</a>.</p>
<pre>POINT g_pt;
const RECT g_rcExclude = { 100, 100, 200, 200 };

RECT ItemRect(POINT pt)
{
    return RECT{ pt.x - 10, pt.y - 10, pt.x + 10, pt.y + 10 };
}
</pre>
<p>The <code>g_pt</code> variable holds the location of our object, and the <code>g_rcExclude</code> is the rectangle in which the object is forbidden. The <code>ItemRect</code> function produces a bounding rectangle for our object so we can draw something there.</p>
<pre>void
PaintContent(HWND hwnd, PAINTSTRUCT* pps)
{
    FillRect(pps-&gt;hdc, &amp;g_rcExclude, (HBRUSH)(COLOR_WINDOWTEXT + 1));
    RECT rcItem = ItemRect(g_pt);
    FillRect(pps-&gt;hdc, &amp;rcItem, (HBRUSH)(COLOR_APPWORKSPACE + 1));
}
</pre>
<p>Painting our content is a straightforward matter of drawing the forbidden rectangle in the text color and drawing the object in the app workspace color.</p>
<pre>void OnMouseMove(HWND hwnd, int x, int y, UINT keyFlags)
{
    POINT ptNew = { x, y };

    if (PtInRect(&amp;g_rcExclude, ptNew)) {
        // Clamp to nearest legal position
        int leftMargin = ptNew.x - g_rcExclude.left;
        int topMargin = ptNew.y - g_rcExclude.top;
        int rightMargin = g_rcExclude.right - ptNew.x;
        int bottomMargin = g_rcExclude.bottom - ptNew.y;

        int dx, dy;
        int x, y;
        if (leftMargin &lt; rightMargin) {
            x = g_rcExclude.left;
            dx = leftMargin;
        } else {
            x = g_rcExclude.right;
            dx = rightMargin;
        }
        if (topMargin &lt; bottomMargin) {
            y = g_rcExclude.top;
            dy = topMargin;
        } else {
            y = g_rcExclude.bottom;
            dy = bottomMargin;
        }
        if (dx &lt; dy) {
            ptNew.x = x;
        } else {
            ptNew.y = y;
        }
        POINT ptScreen = ptNew;
        ClientToScreen(hwnd, &amp;ptScreen);
        SetCursorPos(ptScreen.x, ptScreen.y);
    }

    if (g_pt.x != ptNew.x || g_pt.y != ptNew.y) {
        RECT rcItem = ItemRect(g_pt);
        InvalidateRect(hwnd, &amp;rcItem, TRUE);
        g_pt = ptNew;
        rcItem = ItemRect(g_pt);
        InvalidateRect(hwnd, &amp;rcItem, TRUE);
    }
}

// Add to WndProc

        HANDLE_MSG(hwnd, WM_MOUSEMOVE, OnMouseMove);
</pre>
<p>When the mouse moves, we take the mouse position and see if it is in the forbidden rectangle. If so, we update the coordinates to the nearest legal position and move the mouse there with <code>Set­Cursor­Pos</code>.</p>
<p>Whether or not we had to update the coordinates, if the result produces a new location, then invalidate the object&#8217;s old location (so it will be erased at the next paint cycle), update the object position, and then invalidate the object&#8217;s new position (so it will be drawn at the next paint cycle).</p>
<p>When you run this program, you can try to move the mouse into the forbidden rectangle, but the program will shove the mouse back out. However, it flickers a lot bcause the mouse briefly enters the forbidden rectangle before it is expelled from it.</p>
<p>The customer saw that there is a <code>Clip­Cursor</code> function to constrain the mouse to remain <i>inside</i> a rectangle, but is there an inverse version that forces the mouse to remain <i>outside</i> a rectangle?</p>
<p>There is no such function, but that&#8217;s okay.</p>
<p>What you can do when the mouse is in an illegal position is just pretend that it&#8217;s in a legal position. Let the user move the mouse into a illegal position, but show the feedback at the nearest legal position, and if they drop the object, let it drop at the nearest legal position.</p>
<p>In the above program, that means we remove the call to <code>Set­Cursor­Pos</code>.</p>
<pre>void OnMouseMove(HWND hwnd, int x, int y, UINT keyFlags)
{
    POINT ptNew = { x, y };

    if (PtInRect(&amp;g_rcExclude, ptNew)) {
        // Clamp to nearest legal position
        int leftMargin = ptNew.x - g_rcExclude.left;
        int topMargin = ptNew.y - g_rcExclude.top;
        int rightMargin = g_rcExclude.right - ptNew.x;
        int bottomMargin = g_rcExclude.bottom - ptNew.y;

        int dx, dy;
        int x, y;
        if (leftMargin &lt; rightMargin) {
            x = g_rcExclude.left;
            dx = leftMargin;
        } else {
            x = g_rcExclude.right;
            dx = rightMargin;
        }
        if (topMargin &lt; bottomMargin) {
            y = g_rcExclude.top;
            dy = topMargin;
        } else {
            y = g_rcExclude.bottom;
            dy = bottomMargin;
        }
        if (dx &lt; dy) {
            ptNew.x = x;
        } else {
            ptNew.y = y;
        }
        <span style="border: dashed 1px currentcolor; border-bottom: none;">// <span style="text-decoration: line-through;">POINT ptScreen = ptNew;</span>              </span>
        <span style="border: 1px currentcolor; border-style: none dashed;">// <span style="text-decoration: line-through;">ClientToScreen(hwnd, &amp;ptScreen);</span>     </span>
        <span style="border: dashed 1px currentcolor; border-top: none;">// <span style="text-decoration: line-through;">SetCursorPos(ptScreen.x, ptScreen.y);</span></span>
    }

    if (g_pt.x != ptNew.x || g_pt.y != ptNew.y) {
        RECT rcItem = ItemRect(g_pt);
        InvalidateRect(hwnd, &amp;rcItem, TRUE);
        g_pt = ptNew;
        rcItem = ItemRect(g_pt);
        InvalidateRect(hwnd, &amp;rcItem, TRUE);
    }
}
</pre>
<p>This time, we don&#8217;t try to punish you for moving the mouse into the forbidden rectangle. But the object won&#8217;t follow the mouse into a forbidden region.</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260610-00/?p=112412">What&#8217;s the opposite of &lt;CODE&gt;Clip&shy;Cursor&lt;/CODE&gt; that lets me &lt;I&gt;exclude&lt;/I&gt; the cursor from a region?</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260610-00/?p=112412/feed</wfw:commentRss>
    <slash:comments>5</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>The Microsoft Company Party where everybody played name tag swap</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260609-00/?p=112409</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260609-00/?p=112409#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Tue, 09 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[History]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112409</guid>
    <content:encoded><![CDATA[<p>I learned from a long-retired Microsoft employee about a Company Party that took place around 1984 or so. The company was small enough that a single party could fit the entire company, but not so small that everybody knew everybody else, so each guest was issued a name tag.</p>
<p>During the evening, an unofficial game arose in which people started exchanging their name tags with others whom they met. It also served as a fun little conversation starter: If you swapped name tags with someone and ended up with the tag for somebody you didn&#8217;t know, it wasn&#8217;t hard to find a mutual acquaintance who could track them down and introduce you.</p>
<p>At one point, the employee who was retelling the story was in a group talking with Bill Gates, who was among the few attendees still wearing their original name tags. Bill spotted that one of the other people in the group had a &#8220;<a href="https://en.wikipedia.org/wiki/Gary_Kildall">Gary Kildall</a>&#8221; name tag. I don&#8217;t know whether Gary Kildall was actually invited to the party, or that somebody just created a Gary Kildall name tag as a joke. But Bill saw the &#8220;Gary Kildall&#8221; name tag and eagerly swapped his name tag for it.</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260609-00/?p=112409">The Microsoft Company Party where everybody played name tag swap</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260609-00/?p=112409/feed</wfw:commentRss>
    <slash:comments>2</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>Rotation revisited: Shuffling more than three blocks, and other small notes</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260608-00/?p=112407</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260608-00/?p=112407#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Mon, 08 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112407</guid>
    <content:encoded><![CDATA[<p>A few small notes on rotation before you get sick of it. (Too late!)</p>
<p><a title="Swapping two blocks of memory that reside inside a larger block, in constant memory, refinement" href="https://devblogs.microsoft.com/oldnewthing/20260106-00/?p=111968"> Reducing the number of rotations in the discontiguous swap problem</a> from three to two also shows how the solution can be generalized to shuffling an arbitrary number of variable-sized blocks: Given <var>k</var> blocks, of total size <var>n</var>, you can shuffle them arbitrarily in at most <var>kn</var> swaps in constant space: Take the block that goes first and rotate it to the front, which takes <var>n</var> swaps. Then recurse on what&#8217;s left.</p>
<p>You can reduce the number of swaps by comparing the sizes of the block that goes first and the block that goes last and choose to swap the larger block to the corresponding extreme.</p>
<p>I guess you could use this for sorting, but it&#8217;s probably enough of a hassle that you&#8217;ll just take the penalty of allocating a second block of memory rather than trying to be clever and doing it in-place.</p>
<p>In online discussion of this article, I saw a number of people say, &#8220;You can do this with the XOR trick,&#8221; but I&#8217;m not sure what XOR trick they are referring to. If they are talking about using XOR to swap two integer variables without introducing a third variable, that&#8217;s a cute trick I don&#8217;t see how it helps with moving variable-sized blocks around. It also doesn&#8217;t help with swapping non-integers, since it&#8217;s not clear how your XOR two strings or two Widgets.</p>
<p>Another note is that my unit of accounting was the &#8220;swap&#8221;, but really I should be counting &#8220;assignments&#8221; because the cycle decomposition algorithm doesn&#8217;t use swaps. For the purpose of accounting, I&#8217;ve been counting a single assignment as half a swap, though depending on how expensive the move constructor is, a single assignment/construction might only cost a third of a swap.</p>
<p>Finally, a clarification on my description of the solution as &#8220;constant space without allocation&#8221;: Clearly any algorithm requires <i>some</i> space: space for the parameters, return address, any registers used by the code, and any local variables and temporaries. As long as the number and size of these things is bounded by a constant, this is considered a &#8220;constant space&#8221; algorithm. Note that the size of an element is not known to the generalized algorithm, but once you implement the algorithm for a concrete element type, the size becomes a constant.</p>
<p>My description of this as &#8220;without allocation&#8221; is a shorthand for &#8220;without requiring dynamic memory allocation (because the amount of memory needed is known at compile time).&#8221;</p>
<p>I have a soft spot for algorithms that run in constant space (where the constant is reasonably small) because they remove the need to worry about how to recover if there is a memory allocation failure.</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260608-00/?p=112407">Rotation revisited: Shuffling more than three blocks, and other small notes</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260608-00/?p=112407/feed</wfw:commentRss>
    <slash:comments>2</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>The back cover of C++: The Programming Language also raises questions not answered by the front cover</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260605-01/?p=112391</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260605-01/?p=112391#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Fri, 05 Jun 2026 14:00:01 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Other]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112391</guid>
    <content:encoded><![CDATA[<p>A little while ago, we considered how <a title="The cover of C++: The Programming Language raises questions not answered by the cover" href="https://devblogs.microsoft.com/oldnewthing/20260401-00/?p=112180"> the cover of the book <i>C++: The Programming Language</i> raises questions not answered by the cover</a>, since the cover illustration for a book putatively about the C++ programming language shows code written in JavaScript.¹ But there&#8217;s also a question raised by the back cover.</p>
<p>According to the blurb for the book,</p>
<blockquote class="q"><p>The topics included in it are of utmost significance and are bound to provide incredible insights to students. Some of the diverse topics covered in this text address the varied branches that fall under this category. Those in search of information to further their knowledge will be greatly assisted by this textbook.</p></blockquote>
<p>This sounds like a book report written by a student who didn&#8217;t read the book! Those sentences could be used to describe pretty much any textbook.</p>
<p>Indeed, I found nearly identical sentences in the blurb for <i>Casting Handbook</i> (Hannah Wells, editor).</p>
<blockquote class="q"><p>The topics included in this book on casting are of utmost significance and bound to provide incredible insights to readers. Some of the diverse topics covered in this book address the varied branches that fall under this category. It will serve as a valuable source of reference for graduate and post graduate students.</p></blockquote>
<p>And in <i>Food Industry: Processes and Technologies</i> (Kaden Hunt, editor):</p>
<blockquote class="q"><p>This book is compiled in such a manner, that it will provide in-depth knowledge about the theory and practice of the workings of food industry. Some of the diverse topics covered in this text address the varied branches that fall under this category. This textbook, with its detailed analyses and data, will prove immensely beneficial to professionals and students involved in this area at various levels.</p></blockquote>
<p>And in <i>Nutrition and Metabolism: Processes and Technologies</i> (Kaden Hunt, editor):</p>
<blockquote class="q"><p>This book provides comprehensive insights into the field of nutrition and metabolism. It provides deep insights about this field. Some of the diverse topics covered in this text address the varied branches that fall under this category. Such selected concepts that redefine this subject have been presented in it. This book aims to shed light on some of the unexplored aspects of this field. It is meant for students who are looking for an elaborate reference text on nutrition and metabolism.</p></blockquote>
<p>One more example: <i>Material Science and Engineering</i> (Emilio McMahon, editor)</p>
<blockquote class="q"><p>The book aims to shed light on some of the unexplored aspects of materials science and engineering. It describes in detail the various concepts and theories of this field. The topics included in it are of utmost significance and bound to provide incredible insights to students. Some of the diverse topics covered in this book address the varied branches that fall under this category. This textbook is an essential guide for both graduates and post-graduates in this discipline.</p></blockquote>
<p>The common thread is that all of these books are published by Larson and Keller. I guess they can&#8217;t be bothered to spend time crafting a blurb that suits the book, so they just use the same blurb template for all of their books.</p>
<p>¹ <a href="https://devblogs.microsoft.com/oldnewthing/20260401-00/?p=112180&amp;commentid=143992#comment-143992"> Rory Jaffe found</a> that the book cover image <a title="Program code on a monitor: Stock Photo - Alamy" href="https://www.alamy.com/program-code-on-a-monitor-image65859013.html"> it is an Alamy stock photo</a> from 2013 with the title &#8220;Program code on a monitor.&#8221;</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260605-01/?p=112391">The back cover of &lt;I&gt;C++: The Programming Language&lt;/I&gt; also raises questions not answered by the front cover</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260605-01/?p=112391/feed</wfw:commentRss>
    <slash:comments>9</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>Rotation revisited: Avoiding having to calculate the gcd when doing cycle decomposition</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260605-00/?p=112389</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260605-00/?p=112389#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Fri, 05 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112389</guid>
    <content:encoded><![CDATA[<p>Last time, we looked at <a title="Rotation revisited: Cycle decomposition in clang's libcxx" href="https://devblogs.microsoft.com/oldnewthing/20260604-00/?p=112384"> how clang&#8217;s libcxx implementation of <code>std::<wbr />rotate</code> uses cycle decomposition</a> to minimize the number of swaps. Doing so requires calculating the greatest common divisor, but I noted that the OpenJDK implementation of the java standard library uses a trick to <a href="https://github.com/openjdk/jdk/blob/f1e0e0c25ec62a543b9cbfabd630fc4ef17a8b5c/src/java.base/share/classes/java/util/Collections.java#L822"> avoid doing the gcd calculation</a>.</p>
<p>The trick is realizing that the total number of elements is equal to the sum of the lengths of each of its cycles, and each of the initial elements belongs to a different cycle. Therefore, we can just keep rotating elements until the number of elements rotated is equal to the total. We don&#8217;t have to precalculate the number of cycles; we just let the counter tell us when we&#8217;re done.</p>
<pre>auto a = std::distance(first, mid); // number of "A" elements
auto n = std::distance(first, last); // total elements
auto count = 0;
auto k = 0;

while (count &lt; n) {
    // Rotate the elements in the cycle starting at k
    auto save = std::move(first[k]);
    auto i, next = k;
    while (i = next, next = (i + a) % n, next != k) {
        first[i] = std::move(first[next]);
        ++count;
    }
    first[i] = std::move(save);
    ++count;
}
</pre>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260605-00/?p=112389">Rotation revisited: Avoiding having to calculate the gcd when doing cycle decomposition</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260605-00/?p=112389/feed</wfw:commentRss>
    <slash:comments>2</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>Rotation revisited: Cycle decomposition in clang&#8217;s libcxx</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260604-00/?p=112384</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260604-00/?p=112384#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Thu, 04 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112384</guid>
    <content:encoded><![CDATA[<p>We got distracted by the rotation algorithm in gcc&#8217;s libstdc++, <a title="Rotation revisited: Another unidirectional algorithm" href="https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376"> but let&#8217;s get back to the cycle decomposition algorithm in </a><a href="https://github.com/llvm/llvm-project/blob/682c8e22e61f50ce2d9a0c42475a3aa6d578a1ad/libcxx/include/__algorithm/rotate.h"> clang&#8217;s libcxx</a><a title="Rotation revisited: Another unidirectional algorithm" href="https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376">. </a></p>
<p>The implementation in clang&#8217;s libcxx performs the minimum number of swaps, roughly <var>n</var>/2, where <var>n</var> is the total number of elements. It does so by viewing the rotation as a permutation and walking through each of the cycles.</p>
<p>For notational convenience, let <var>a</var> be |A| and <var>n</var> be |A| + |B| (the total number of elements). The number of cycles is <a href="https://en.wikipedia.org/wiki/Greatest_common_divisor">gcd</a>(<var>a</var>, <var>b</var>), and the <var>k</var>&#8216;th cycle consists of the elements starting at <code>first</code> + <var>k</var>, and then stepping to the next element by moving forward another <var>a</var> elements, with wraparound, until you return back to the starting point.</p>
<p>For example, if you have |A| = 4 and |B| = 6, then the cycle that starts at A1 takes 4 steps forward to continues to B1; takes another 4 steps forward to B5; then takes 2 steps forward, wraps around, and then two more steps forward, landing on A3; then takes 4 steps forward to B3; and then takes 4 steps forward and wraps around to A1, which is the starting point.</p>
<table style="border-collapse: collapse; text-align: center;" title="See text" border="0" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor;">A2</td>
<td style="width: 2em; border: solid 1px currentcolor;">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">A4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
<td style="width: 2em; border: solid 1px currentcolor;">B6</td>
</tr>
<tr>
<td>↳</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>↴</td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">A1</td>
<td style="width: 2em; border: solid 1px currentcolor;">A2</td>
<td style="width: 2em; border: solid 1px currentcolor;">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">A4</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
<td style="width: 2em; border: solid 1px currentcolor;">B6</td>
</tr>
<tr>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td>↳</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>↴</td>
<td><!-- --></td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">A1</td>
<td style="width: 2em; border: solid 1px currentcolor;">A2</td>
<td style="width: 2em; border: solid 1px currentcolor;">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">A4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">B5</td>
<td style="width: 2em; border: solid 1px currentcolor;">B6</td>
</tr>
<tr>
<td>→</td>
<td>→</td>
<td>↴</td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td>↳</td>
<td>→</td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">A1</td>
<td style="width: 2em; border: solid 1px currentcolor;">A2</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">A4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
<td style="width: 2em; border: solid 1px currentcolor;">B6</td>
</tr>
<tr>
<td><!-- --></td>
<td><!-- --></td>
<td>↳</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>↴</td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">A1</td>
<td style="width: 2em; border: solid 1px currentcolor;">A2</td>
<td style="width: 2em; border: solid 1px currentcolor;">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">A4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
<td style="width: 2em; border: solid 1px currentcolor;">B6</td>
</tr>
<tr>
<td>↴</td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td><!-- --></td>
<td>↳</td>
<td>→</td>
<td>→</td>
<td>→</td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor;">A2</td>
<td style="width: 2em; border: solid 1px currentcolor;">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">A4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
<td style="width: 2em; border: solid 1px currentcolor;">B6</td>
</tr>
</tbody>
</table>
<p>There&#8217;s another cycle that starts at A2 and continues to B2, B6, A4, B4, then back to A2.</p>
<p>Now, we&#8217;ve been counting swaps, but a single-element rotation is not done as a sequence of swaps, but rather by picking up the first element, sliding all the other elements over, and then putting the original first element at the end. I&#8217;ve been informally calling an assignment &#8220;half of a swap&#8221;, though a swap is really a constructor, two assignments, and a destructor. But let&#8217;s stick with the &#8220;half a swap&#8221; accounting fiction.</p>
<p>The rotation algorithm goes like this:</p>
<pre>auto a = std::distance(first, mid); // number of "A" elements
auto n = std::distance(first, last); // total elements
auto g = gcd(a, n); // number of cycles

for (auto k = 0; k &lt; g; ++k) {
    // Rotate the elements in the cycle starting at k
    auto save = std::move(first[k]);
    auto i, next = k;
    while (i = next, next = (i + a) % n, next != k) {
        first[i] = std::move(first[next]);
    }
    first[i] = std::move(save);
}
</pre>
<p>For example, if rotating A1, A2, B1, B2, B3, B4, there are two cycles: A1, B1, B3; and A2, B2, B4. The elements within each cycle rotate one position.</p>
<table style="border-collapse: collapse; text-align: center;" title="See text" border="0" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td>⮣</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>⮧</td>
</tr>
<tr>
<td>⮤</td>
<td style="border: solid 1px currentcolor; border-bottom: none;"><!-- --></td>
<td>←</td>
<td style="border: solid 1px currentcolor; border-bottom: none;"><!-- --></td>
<td>←</td>
<td style="border: solid 1px currentcolor; border-bottom: none;"><!-- --></td>
<td>⮠</td>
</tr>
<tr style="height: 1ex;">
<td colspan="7"><!-- --></td>
</tr>
<tr>
<td style="width: 2em;"> </td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
</tr>
<tr style="height: 1ex;">
<td colspan="7"><!-- --></td>
</tr>
<tr>
<td>⮦</td>
<td style="border: solid 1px currentcolor; border-top: none;"><!-- --></td>
<td>←</td>
<td style="border: solid 1px currentcolor; border-top: none;"><!-- --></td>
<td>←</td>
<td style="border: solid 1px currentcolor; border-top: none;"><!-- --></td>
<td>⮢</td>
</tr>
<tr>
<td>⮡</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>→</td>
<td>⮥</td>
</tr>
</tbody>
</table>
<p>And when you&#8217;re done with all the cycles, you&#8217;ve rotated the entire A and B blocks.</p>
<table style="border-collapse: collapse; text-align: center;" title="See text" border="0" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td>
</tr>
</tbody>
</table>
<p>This performs <var>n</var>/2 swaps, which is the fewest swaps of all the algorithms we&#8217;ve looked at so far. However, it has terrible locality because the elements in the cycle are all spread out.</p>
<p>Calculating the greated common divisor of two numbers can be done in <var>O</var>(log <var>n</var>) steps via Euclid&#8217;s algorithm.</p>
<pre>int gcd(int a, int b)
{
    do {
        auto r = a % b;
        a = b;
        b = r;
    } while (r);
    return a;
}
</pre>
<p><a href="https://devblogs.microsoft.com/oldnewthing/20260101-00/?p=111955&amp;commentid=143690#comment-143690"> Commenter Brent thought that the cycle decomposition algorithm was obvious</a>. Of course, the trick is the step they called &#8220;Repeat&#8221;. How many times do you repeat?</p>
<p>The clang libcxx algorithm calculates the number of repeats by taking the gcd. But there&#8217;s a trick so we don&#8217;t have to calculated it at all. We&#8217;ll look at that trick next time.</p>
<p><b>Bonus chatter</b>: I think it&#8217;s interesting that of the three major implementations of the C++ standard library, each one uses a different rotation algorithm when given random-access iterators!</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260604-00/?p=112384">Rotation revisited: Cycle decomposition in clang&#8217;s libcxx</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260604-00/?p=112384/feed</wfw:commentRss>
    <slash:comments>1</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
		
  <item>
    <title>Rotation revisited: A shocking discovery about gcc&#8217;s unidirectional rotation algorithm</title>
    <link>https://devblogs.microsoft.com/oldnewthing/20260603-00/?p=112378</link>
    <comments>https://devblogs.microsoft.com/oldnewthing/20260603-00/?p=112378#comments</comments>
    <dc:creator><![CDATA[Raymond Chen]]></dc:creator>
    <pubDate>Wed, 03 Jun 2026 14:00:00 +0000</pubDate>
    <category><![CDATA[Old New Thing]]></category>
    <category><![CDATA[Code]]></category>
    <guid isPermaLink="false">https://devblogs.microsoft.com/oldnewthing/?p=112378</guid>
    <content:encoded><![CDATA[<p>Last time, we looked at <a title="Rotation revisited: Another unidirectional algorithm" href="https://devblogs.microsoft.com/oldnewthing/20260602-00/?p=112376"> the rotation algorithm used by gcc libstdc++ for random-access iterators</a>, and I concluded by noting that we&#8217;re going to make a shocking discovery.</p>
<p>As with all shocking discoveries, this one will <span style="text-decoration: line-through;">shock</span> <span style="border: solid 1px currentcolor;">disappoint</span> you.</p>
<p>The discovery is that the gcc libstdc++ algorithm is the same as <a title="How can you swap two non-adjacent blocks of memory using only forward iterators?" href="https://devblogs.microsoft.com/oldnewthing/20260105-00/?p=111962"> the forward-iterator algorithm</a>!</p>
<p>Let&#8217;s run both algorithms on a problem where the two blocks are A1, A2, A3, B1, B2, B3, B4, B5. I&#8217;ll put the old forward iterator algorithm on top and the new gcc libstdc++ algorithm below.</p>
<table style="border-collapse: collapse; text-align: center;" title="A1, A2, A3, B1, B2, B3, B4, B5" border="0" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td colspan="2" align="left">first</td>
<td>&nbsp;</td>
<td colspan="2" align="left">mid</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">last</td>
</tr>
<tr>
<td align="left">↓</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↓</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↓</td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
</tr>
<tr>
<td align="left">↑</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↑</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↑</td>
</tr>
<tr>
<td colspan="2" align="left">first</td>
<td>&nbsp;</td>
<td colspan="2" align="left">mid</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">last</td>
</tr>
</tbody>
</table>
<p>We swap at <code>first</code> and <code>mid</code>, then advance both pointers. The two algorithms agree until <code>first</code> reaches the end of the original A block.</p>
<table style="border-collapse: collapse; text-align: center;" title="A1, A2, A3, B1, B2, B3, B4, B5" border="0" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td colspan="2" align="left">first</td>
<td>&nbsp;</td>
<td>mid</td>
<td>&nbsp;</td>
<td>last</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↓</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↓</td>
<td>&nbsp;</td>
<td align="left">↓</td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↑</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↑</td>
<td>&nbsp;</td>
<td align="left">↑</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td colspan="2" align="left">first</td>
<td>&nbsp;</td>
<td>mid</td>
<td>&nbsp;</td>
<td>last</td>
</tr>
</tbody>
</table>
<p>The old algorithm recurses in order to exchange A1, A2, A3 with B4, B4. This happens by exchanging A1 with B4 and A2 with B5.</p>
<p>The new algorithm just keeps swapping <code>first</code> with <code>mid</code>, which also exchanges A1 with B4 and A2 with B5.</p>
<table style="border-collapse: collapse; text-align: center;" title="A1, A2, A3, B1, B2, B3, B4, B5" border="0" cellspacing="0" cellpadding="1">
<tbody>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td colspan="2" align="left" valign="bottom">first</td>
<td>&nbsp;</td>
<td>mid<br />
last</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↓</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↓</td>
</tr>
<tr>
<td style="width: 2em; border: solid 1px currentcolor;">B1</td>
<td style="width: 2em; border: solid 1px currentcolor;">B2</td>
<td style="width: 2em; border: solid 1px currentcolor;">B3</td>
<td style="width: 2em; border: solid 1px currentcolor;">B4</td>
<td style="width: 2em; border: solid 1px currentcolor;">B5</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A3</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A1</td>
<td style="width: 2em; border: solid 1px currentcolor; background: lch(from currentcolor l c h / .1);">A2</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↑</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td align="left">↑</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td colspan="2" align="left" valign="baseline">first</td>
<td>&nbsp;</td>
<td>last<br />
mid</td>
</tr>
</tbody>
</table>
<p>The old algorithm now recurses to swap the A3 block with the A1+A2 block. And that&#8217;s what the new algorithm does, too.</p>
<p>So it&#8217;s the same algorithm, just with a different point of view. It&#8217;s another case of <a title="The geeky thrill of discovering that two things are really the same thing, just with different labels" href="https://devblogs.microsoft.com/oldnewthing/20140414-01/?p=1253"> the geeky thrill of discovering that two things are really the same thing, just with different labels</a>.</p>
<p>Now, the two algorithms are not identical. The new algorithm is symmetric and performs its swaps from right to left if the larger block is on the right. The old algorithm always operates from left to right.</p>
<p>But the similarity is striking.</p>
<p>Next time, we&#8217;ll look at how clang performs rotation by decomposing into cycles.</p>
<p>The post <a href="https://devblogs.microsoft.com/oldnewthing/20260603-00/?p=112378">Rotation revisited: A shocking discovery about gcc&#8217;s unidirectional rotation algorithm</a> appeared first on <a href="https://devblogs.microsoft.com/oldnewthing">The Old New Thing</a>.</p>
]]></content:encoded>
    <wfw:commentRss>https://devblogs.microsoft.com/oldnewthing/20260603-00/?p=112378/feed</wfw:commentRss>
    <slash:comments>1</slash:comments>
    <image type="image/png" url="https://devblogs.microsoft.com/oldnewthing/wp-content/uploads/sites/38/2025/10/banner-oldnewthing-blue.webp"/>
  </item>
	</channel>
</rss>
