John D. Cook

In-shuffles and out-shuffles

John — Thu, 01 Jan 2026 20:48:02 +0000

The previous post talked about doing perfect shuffles: divide a deck in half, and alternately let one card from each half fall.

It matters which half lets a card fall first. If the top half’s bottom card falls first, this is called an in-shuffle. If the bottom half’s bottom card falls first, it’s called an out-shuffle.

With an out-shuffle, the top and bottom cards don’t move. Presumably it’s called an out-shuffle because the outside cards remain in place.

An out-shuffle amounts to an in-shuffle of the inner cards, i.e. the rest of the deck not including the top and bottom card.

The previous post had a Python function for doing an in-shuffle. Here we generalize the function to do either an in-shuffle or an out-shuffle. We also get rid of the list comprehension, making the code longer but easier to understand.

def shuffle2(deck, inside = True):
    n = len(deck)
    top = deck[: n//2]
    bottom = deck[n//2 :]
    if inside:
        first, second = bottom, top
    else:
        first, second = top, bottom
    newdeck = []
    for p in zip(first, second):
        newdeck.extend(p)
    return newdeck

Let’s use this code to demonstrate that an out-shuffle amounts to an in-shuffle of the inner cards.

deck = list(range(10))
d1 = shuffle2(deck, False) 
d2 = [deck[0]] + shuffle2(deck[1:9], True) + [deck[9]]
print(d1)
print(d2)

Both print statements produce [0, 5, 1, 6, 2, 7, 3, 8, 4, 9].

I said in the previous post that k perfect in-shuffles will restore the order of a deck of n cards if

2^k = 1 (mod n + 1).

It follows that k perfect out-shuffles will restore the order of a deck of n cards if

2^k = 1 (mod n − 1)

since an out-shuffle of n cards is essentially an in-shuffle of the n − 2 cards in the middle.

So, for example, it only takes 8 out-shuffles to return a deck of 52 cards to its original order. In the previous post we said it takes 52 in-shuffles, so it takes a lot fewer out-shuffles than in-shuffles.

It’s plausible to conjecture that it takes fewer out-shuffles than in-shuffles to return a deck to its initial order, since the former leaves the two outside cards in place. But that’s not always true. It’s true for a deck of 52 cards, but not for a deck of 14, for example. For a deck of 14 cards, it takes 4 in-shuffles or 12 out-shuffles to restore the deck.
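The two congruences above can be checked directly. The following is a minimal sketch, assuming an even deck size; the function name `shuffles_to_restore` is made up for illustration. It finds the smallest k with 2^k = 1 modulo n + 1 (in-shuffles) or n − 1 (out-shuffles).

```python
def shuffles_to_restore(n, kind="in"):
    """Smallest k with 2^k = 1 (mod n + 1) for in-shuffles, (mod n - 1) for out-shuffles.

    Assumes n is even, so the modulus is odd and the loop terminates.
    """
    modulus = n + 1 if kind == "in" else n - 1
    k, power = 1, 2 % modulus
    while power != 1 % modulus:
        power = (2 * power) % modulus
        k += 1
    return k

for n in (14, 52):
    print(n, shuffles_to_restore(n, "in"), shuffles_to_restore(n, "out"))
# 14 4 12
# 52 52 8
```

The printed values match the counts quoted above for decks of 14 and 52 cards.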

The post In-shuffles and out-shuffles first appeared on John D. Cook.

Perfect and imperfect shuffles

John — Thu, 01 Jan 2026 13:31:18 +0000

Take a deck of cards and cut it in half, placing the top half of the deck in one hand and the bottom half in the other. Now bend the stack of cards in each hand and let cards alternately fall from each hand. This is called a riffle shuffle.

Random shuffles

Persi Diaconis proved that it takes seven shuffles to fully randomize a deck of 52 cards. He studied videos of people shuffling cards in order to construct a realistic model of the shuffling process.

Shuffling randomizes a deck of cards due to imperfections in the process. You may not cut the deck exactly in half, and you don’t exactly interleave the two halves of the deck. Maybe one card falls from your left hand, then two from your right, etc.

Diaconis modeled the process with a probability distribution on how many cards are likely to fall each time. And because his model was realistic, after seven shuffles a deck really is well randomized.
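Here is a rough sketch of one such model. It assumes the Gilbert-Shannon-Reeds model, which is the one usually associated with the seven-shuffle result; the post does not name the model, so treat that choice, and the function name `riffle_shuffle`, as assumptions. The cut point is binomial, and the next card comes from a packet with probability proportional to the packet's remaining size.

```python
import random

def riffle_shuffle(deck, rng=random):
    """One imperfect riffle shuffle under the (assumed) Gilbert-Shannon-Reeds model."""
    n = len(deck)
    # Cut the deck: the size of the first packet is Binomial(n, 1/2).
    cut = sum(rng.random() < 0.5 for _ in range(n))
    left, right = deck[:cut], deck[cut:]
    result = []
    i = j = 0
    while i < len(left) or j < len(right):
        remaining_left = len(left) - i
        remaining_right = len(right) - j
        # Take the next card from a packet with probability
        # proportional to the number of cards it still holds.
        if rng.random() * (remaining_left + remaining_right) < remaining_left:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result

deck = list(range(52))
for _ in range(7):
    deck = riffle_shuffle(deck)
print(deck)
```

The output differs from run to run, which is the point: the randomness comes entirely from the imperfect cut and interleave.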

Perfect shuffles

Now suppose we take the imperfection out of shuffling. We do cut the deck of cards exactly in half each time, and we let exactly one card fall from each half each time. And to be specific, let’s say the first card will always fall from the top half of the deck. That is, we do an in-shuffle. (See the next post for a discussion of in-shuffles and out-shuffles.) A perfect shuffle does not randomize a deck because it’s a deterministic permutation.

To illustrate a perfect in-shuffle, suppose you start with a deck of these six cards.

Then you divide the deck into two halves.

Then after the shuffle you have the following.
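The same shuffle can be sketched with numbered cards. The labels 1 through 6 are stand-ins chosen for illustration, not the cards shown in the post, and the interleaving follows the convention of the shuffle function given later in this post.

```python
deck = [1, 2, 3, 4, 5, 6]          # 1 is the top card
n = len(deck)
top, bottom = deck[:n//2], deck[n//2:]

# In-shuffle: the new top card comes from the bottom half,
# so the original top and bottom cards both move inward.
shuffled = [card for pair in zip(bottom, top) for card in pair]

print(top, bottom)  # [1, 2, 3] [4, 5, 6]
print(shuffled)     # [4, 1, 5, 2, 6, 3]
```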

Incidentally, I created the images above using a font that included glyphs for the Unicode characters for playing cards. More on that here. The font produced black-and-white images, so I edited the output in GIMP to turn things red that should be red.

Coming full circle

If you do enough perfect shuffles, the deck returns to its original order. This could be the basis for a magic trick, if the magician has the skill to repeatedly perform a perfect shuffle.

Performing k perfect in-shuffles will restore the order of a deck of n cards if

2^k = 1 (mod n + 1).

So, for example, after 52 in-shuffles, a deck of 52 cards returns to its original order. We can see this from a quick calculation at the Python REPL:

>>> 2**52 % 53
1

With slightly more work we can show that fewer than 52 shuffles won’t do.

>>> for k in range(1, 53):
...     if 2**k % 53 == 1: print(k)
...
52

The minimum number of shuffles is not always the same as the size of the deck. For example, it takes 4 shuffles to restore the order of a deck of 14 cards.

>>> 2**4 % 15
1

Shuffle code

Here’s a function to perform a perfect in-shuffle.

def shuffle(deck):
    n = len(deck)
    return [item for pair in zip(deck[n//2 :], deck[:n//2]) for item in pair]

With this you can confirm the results above. For example,

n = 14
k = 4
deck = list(range(n))
for _ in range(k):
    deck = shuffle(deck)
print(deck)

This prints 0, 1, 2, …, 13 as expected.


Knight’s tour with fewest obtuse angles

John — Wed, 31 Dec 2025 13:48:24 +0000

Donald Knuth gives a public lecture each year around Christmas. This year was his 29th Christmas lecture, Adventures with Knight’s Tours.

I reproduced one of the images from his lecture.

This is the knight’s tour with the minimum number of obtuse angles, marked with red dots. The solution is unique, up to rotations and reflections.

Knuth said he thought this was one of the most beautiful knight’s tours. He discusses this tour about 44 minutes into the video here.



The center of the earth is not straight down

John — Mon, 29 Dec 2025 11:00:48 +0000

If the earth were a perfect sphere, “down” would be the direction to the center of the earth, wherever you stand. But because our planet is a bit flattened at the poles, a line perpendicular to the surface and a line to the center of the earth are not the same. They’re nearly the same because the earth is nearly a sphere, but not exactly, unless you’re at the equator or at one of the poles. Sometimes the difference matters and sometimes it does not.

From a given point on the earth’s surface, draw two lines: one straight down (i.e. perpendicular to the surface) and one straight to the center of the earth. The angle φ that the former makes with the equatorial plane is geographic latitude. The angle θ that the latter makes with the equatorial plane is geocentric latitude.

For illustration we will draw an ellipse that is far more eccentric than a polar cross-section of the earth.

At first it may not be clear why geographic latitude is defined the way it is; geocentric latitude is conceptually simpler. But geographic latitude is easier to measure: a plumb bob will show you which direction is straight down.

There may be some slight variation between the direction of a plumb bob and a perpendicular to the earth’s surface due to variations in surface gravity. However, the deviations due to gravity are a couple orders of magnitude smaller than the differences between geographic and geocentric latitude.

Conversion formulas

The conversion between the two latitudes is as follows.

θ = atan2((1 − e²) sin φ, cos φ)

φ = atan2(sin θ, (1 − e²) cos θ)

Here e is eccentricity. The equations above work for any ellipsoid, but for earth in particular e² = 0.00669438.

The function atan2(y, x) returns an angle in the same quadrant as the point (x, y) whose tangent is y/x. [1]

As a quick sanity check on the equations, note that when eccentricity e is zero, i.e. in the case of a circle, φ = θ. Also, if φ = 0 then θ = φ for all eccentricity values.
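The conversion is short enough to try numerically. This is a sketch under the assumption that the conversion takes the standard form tan θ = (1 − e²) tan φ, written with atan2 as described above; the function names are made up for illustration.

```python
import math

E2_EARTH = 0.00669438  # e² for the earth, as quoted above

def geocentric_from_geographic(phi, e2=E2_EARTH):
    """Geocentric latitude θ from geographic latitude φ, both in radians."""
    return math.atan2((1 - e2) * math.sin(phi), math.cos(phi))

def geographic_from_geocentric(theta, e2=E2_EARTH):
    """Geographic latitude φ from geocentric latitude θ, both in radians."""
    return math.atan2(math.sin(theta), (1 - e2) * math.cos(theta))

phi = math.radians(45)
theta = geocentric_from_geographic(phi)
print(math.degrees(phi - theta))  # about 0.19 degrees
```

The sanity checks in the text carry over: with `e2=0` the two functions return their input, and an input of 0 returns 0 for any eccentricity.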

Next we give a proof of the equations above.

Proof

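We can parameterize an ellipse with semi-major axis a and semi-minor axis b by

x(t) = a cos t, y(t) = b sin t.

The slope at a point (x(t), y(t)) is the ratio

y′(t) / x′(t) = −(b cos t) / (a sin t)

and so the slope of a line perpendicular to the tangent, i.e. tan φ, is

tan φ = (a sin t) / (b cos t) = (a/b) tan t.

Now

tan θ = y(t) / x(t) = (b/a) tan t

and so

tan θ = (b²/a²) tan φ = (1 − e²) tan φ

where e² = 1 − b²/a² is the square of the eccentricity of the ellipse. Therefore

θ = atan((1 − e²) tan φ)

and the equations at the top of the post follow.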

Difference

For the earth’s shape, e² = 0.006694 per WGS84. For small eccentricities, the difference between geographic and geocentric latitude is approximately symmetric around 45°.

But for larger values of eccentricity the asymmetry becomes more pronounced.

[1] There are a couple complications with programming language implementations of atan2. Some call the function arctan2 and some reverse the order of the arguments. More on that here.


Interesting categories are big

John — Mon, 29 Dec 2025 01:23:19 +0000

One of the things I found off-putting about category theory when I was first exposed to it was its reliance on the notion of “collections” that are not sets. That seemed to place the entire theory on dubious foundations with paradoxes looming around every corner.

It turns out you can mostly ignore such issues in application. You can, for example, talk about the forgetful functor that maps a group to the set of its elements, ignoring the group structure, without having to think deeply about the collection of all sets, which Russell’s paradox tells us cannot itself be a set.

And yet issues of cardinality are not entirely avoidable. There is a theorem [1] that says in effect that category theory would be uninteresting without collections too large to be sets.

Every category C with a set of arrows is isomorphic to one in which the objects are sets and the arrows are functions.

If the collection of arrows (morphisms) between objects in C is so small as to be a set, then C is isomorphic to a sub-category of the category of sets. As Awodey explains, “the only special properties such categories can possess are ones that are categorically irrelevant, such as features of the objects that do not affect the arrows in any way.”

Most categories of interest have too many objects to be a set, and even more morphisms than objects.

[1] Category Theory by Steve Awodey. Theorem 1.6.


Klein bottle

John — Sat, 27 Dec 2025 11:14:04 +0000

One of my daughters gave me a Klein bottle for Christmas.

Imagine starting with a cylinder and joining the two ends together. This makes a torus (doughnut). But if you twist the ends before joining them, much like you twist the ends of a rectangular strip to make a Möbius strip, you get a Klein bottle. This isn’t possible to do in 3D without making the cylinder pass through itself, so you’re supposed to imagine that the part where the bottle intersects itself isn’t there.

But is a Klein bottle real? My Christmas present is a real physical object, so it’s real in that sense. Is a Klein bottle real as a mathematical object? Can it be defined without any appeal to imagining things that aren’t true? Yes it can.

Formal definition

Start with a unit square, the set of points (x, y) with 0 ≤ x, y ≤ 1. If you identify the top and bottom of the square, the points with y coordinate equal to 0 or 1, you get a cylinder. You can imagine curling the square in 3D and taping the top and bottom together.

Similarly, if you start with the unit square and identify the vertical sides together with a twist, you get a Möbius strip. You won’t be able to physically do this with a square, but you could with a rectangle. Or you could imagine the square to be made out of rubber, and you stretch it before you twist it and join the edges together.

If you start with the unit square and do both things described above—join the top and bottom as-is and join the sides with a twist—you get a Klein bottle. You can’t quite physically do both at the same time in 3D; you’d have to cut a little hole in the square to let part of the square pass through, as in the glass bottle at the top of the post.

Although you can’t construct a physical Klein bottle without a bit of cheating, there’s nothing wrong with the mathematical definition. There are some details that have been left out, but there’s nothing illegal about the construction.

More formality

To fill in the missing details, we have to say just what we mean by identifying points. When we identify the top edge and bottom edge of the square to make a cylinder, we mean that we imagine that for every x, (x, 0) and (x, 1) are the same point. Similarly, when we identify the sides with a twist, we imagine that for all y, (0, y) and (1, 1 − y) are the same point.

But this is unsatisfying. What does all this imagining mean? How is this any better than imagining that the hole in the glass bottle isn’t there? We can define what it means to “identify” or “glue” edges together in a way that’s perfectly rigorous.

We can say that as a set of points, the Klein bottle is

K = [0, 1) × [0, 1),

removing the top and right edge. But what makes this set of points a Klein bottle is the topology we put on it, the way we define which points are close together.

We define an ε neighborhood of a point (x, 0) to be the union of two half disks: the intersection with K of an open disk of radius ε centered at (x, 0) and the intersection with K of an open disk of radius ε centered at (x, 1). This is a way to make rigorous the idea of gluing (x, 0) and (x, 1) together.

Along the same lines, we define an ε neighborhood of a point (0, y) to be the union of the intersections with K of an open disk of radius ε centered at (0, y) and an open disk of radius ε centered at (1, 1 − y).

The discussion with coordinates is more complicated than the talk about imagining this and that, but it’s more rigorous. You can’t have simplicity and rigor at the same time, so you alternate back and forth. You think in terms of the simple visualization, but when you’re concerned that you may be saying something untrue, you go down to the detail of coordinates and prove things carefully.

Topology can seem all hand-wavy because that’s how topologists communicate. They speak in terms of twisting this and gluing that. But they have in the back of their mind that all these manipulations can be justified. The formalism may be left implicit, even in a scholarly publication, when it’s assumed that the reader could fill in the details. But when things are more subtle, the formalism is written out.

Escaping 3D

In the construction above, we define the Klein bottle as a set of points in the 2D plane with a new topology. That works, but there’s another approach. I said above that you can’t join the edges to make a Klein bottle in three dimensions. I added this disclaimer because you can join the edges without cheating if you work in higher dimensions.

If you’d like a parameterization of the Klein bottle, say because you want to calculate something, you can do that, but you’ll need to work in four dimensions. There’s more room to move around in higher dimensions, letting you do things you can’t do in three dimensions.
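As an illustration, here is one textbook parameterization of a Klein bottle in four dimensions. It is not taken from the post; the formula, the function name `klein_4d`, and the radii R and r are assumptions chosen for the sketch. With u = 2πx and v = 2πy it respects the two identifications from the formal definition: (x, 0) with (x, 1), and (0, y) with (1, 1 − y).

```python
import math

def klein_4d(u, v, R=2.0, r=1.0):
    """Map angles (u, v) to a point in R^4; assumes R > r > 0."""
    x = (R + r * math.cos(v)) * math.cos(u)
    y = (R + r * math.cos(v)) * math.sin(u)
    z = r * math.sin(v) * math.cos(u / 2)
    w = r * math.sin(v) * math.sin(u / 2)
    return (x, y, z, w)

def close(p, q, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(p, q))

u, v = 0.7, 1.9
# Going once around in v returns to the same point: join without a twist.
print(close(klein_4d(u, v), klein_4d(u, v + 2 * math.pi)))   # True
# Going once around in u flips v to -v: join with a twist.
print(close(klein_4d(u, v), klein_4d(u + 2 * math.pi, -v)))  # True
```

The half angle u/2 in the last two coordinates is what produces the twist, and the fourth coordinate gives the surface room to avoid passing through itself.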


Automation and Validation

John — Wed, 24 Dec 2025 14:06:41 +0000

It’s been said whatever you can validate, you can automate. An AI that produces correct work 90% of the time could be very valuable, provided you have a way to identify the 10% of the cases where it is wrong. Often verifying a solution takes far less computation than finding a solution. Examples here.

Validating AI output can be tricky since the results are plausible by construction, though not always correct.

Consistency checks

One way to validate output is to apply consistency checks. Such checks are necessary, but not sufficient, and often easy to implement. A simple consistency check might be that inputs to a transaction equal outputs. A more sophisticated consistency check might be conservation of energy or something analogous to it.

Certificates

Some problems have certificates, ways of verifying that a calculation is correct that can be evaluated with far less effort than finding the solution that they verify. I’ve written about certificates in the context of optimization, solving equations, and finding prime numbers.
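A toy example of the idea, not drawn from the posts mentioned above: finding the factors of a number can be expensive, but checking a claimed factorization takes one multiplication. The function name `verify_factorization` is hypothetical, and the check confirms only the product, not that each factor is prime.

```python
from math import prod

def verify_factorization(n, factors):
    """Check a claimed factorization: nontrivial factors whose product is n."""
    return all(f > 1 for f in factors) and prod(factors) == n

print(verify_factorization(8051, [83, 97]))  # True
print(verify_factorization(8051, [83, 99]))  # False
```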

Formal methods

Correctness is more important in some contexts than others. If a recommendation engine makes a bad recommendation once in a while, the cost is a lower probability of conversion in a few instances. If an aircraft collision avoidance system makes an occasional error, the consequences could be catastrophic.

When the cost of errors is extremely high, formal verification may be worthwhile. Formal correctness proofs using something like Lean or Rocq are extremely tedious and expensive to create, and hence often not economical. But if an AI can generate a result and a formal proof of correctness, hurrah!

Who watches the watchmen?

But if an AI result can be wrong, why couldn’t a formal proof generated to defend the result also be wrong? As the Roman poet Juvenal asked, Quis custodiet ipsos custodes? Who will watch the watchmen?

An AI could indeed generate an incorrect proof, but if it does, the proof assistant will reject it. So the answer to who will watch Claude, Gemini, and ChatGPT is Lean, Rocq, and Isabelle.

Who watches the watchers of the watchmen?

Isn’t it possible that a theorem prover like Rocq could have a bug? Of course it’s possible; there is no absolute certainty under the sun. But hundreds of PhD-years of work have gone into Rocq (formerly Coq) and so bugs in the kernel of that system are very unlikely. The rest of the system is bootstrapped, verified by the kernel.

Even so, an error in the theorem prover does not mean an error in the original result. For an incorrect result to slip through, the AI-generated proof would have to be wrong in a way that happens to exploit an unknown error in the theorem prover. It is far more likely that you’re trying to prove the wrong thing than that the theorem prover let you down.

I mentioned collision avoidance software above. I looked into collision avoidance software when I did some work for Amazon’s drone program. The software that was formally verified was also unrealistic in its assumptions. The software was guaranteed to work correctly, if two objects are flying at precisely constant velocity at precisely the same altitude etc. If everything were operating according to geometrically perfect assumptions, there would be no need for collision avoidance software.


When was Newton born?

John — Tue, 23 Dec 2025 17:03:58 +0000

Newton’s birthday was on Christmas when he was born, but now his birthday is not.

When Newton was born, England was still using the Julian calendar, and would continue to use the Julian calendar until 25 years after his death.

On the day of Newton’s birth, his parents would have said the date was December 25, 1642. We would now describe the date as January 4, 1643.

You’ll sometimes see Newton’s birthday written as December 25, 1642 O.S. The “O.S.” stands for “Old Style,” i.e. Julian calendar. Of course the Newton family would not have written O.S. because there was no old style until the new style (i.e. Gregorian calendar) was adopted, just as nobody living in the years before Christ would have written a date as B.C.

In a nutshell, the Julian year was too long, which made it drift out of sync with the astronomical calendar. The Julian year was 365 ¹/₄ days, whereas the Gregorian calendar has 365 ⁹⁷/₄₀₀ days, which more closely matches the time it takes Earth to orbit the sun. The Gregorian calendar drops three Leap Days every 400 years (in century years not divisible by 400), which keeps the calendar in sync. When countries adopted the Gregorian calendar, they also had to remove the excess Leap Days that had already accumulated. That’s why Newton’s birthday got moved up 10 days.

You can read more on the Julian and Gregorian calendars here.

The winter solstice in the northern hemisphere was two days ago: December 21, 2025. And in 1642, using the Gregorian calendar, the solstice was also on December 21. But in England, in 1642, people would have said the solstice occurred on December 11, because the civil calendar was 10 days behind the astronomical calendar.
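The date conversions above can be reproduced with a few lines of Python. This is a sketch, assuming the standard integer formula for the Julian Day Number of a Julian calendar date; the function name `julian_to_gregorian` is made up here. Python's `datetime.date` uses the proleptic Gregorian calendar, so it handles the second half of the conversion.

```python
from datetime import date

def julian_to_gregorian(year, month, day):
    """Convert a Julian calendar date to the (proleptic) Gregorian date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Julian Day Number of the Julian calendar date
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # date.fromordinal counts days from 0001-01-01 (Gregorian), which is JDN 1721426
    return date.fromordinal(jdn - 1721425)

print(julian_to_gregorian(1642, 12, 25))  # 1643-01-04
print(julian_to_gregorian(1642, 12, 11))  # 1642-12-21
```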


Mason, Dixon, and Latitude

John — Tue, 23 Dec 2025 15:57:34 +0000

A few weeks ago I mentioned that I was reading Stephen Ambrose’s account of the Lewis & Clark expedition and wrote a post about their astronomical measurements. James Campbell left a comment recommending Edwin Danson’s book [1] on the history of the Mason-Dixon line. I ordered the book, and now that work has slowed down for Christmas I have had the time to open it.

In addition to determining their eponymous line, surveyors Charles Mason and Jeremiah Dixon were also the first to measure a degree of latitude in North America, in 1767.

What exactly did they measure? We’ll get to that, but first we need some background.

The shape of the Earth

To first approximation a degree of latitude is simply 1/360th of the Earth’s circumference, but Mason and Dixon were more accurate than that. Isaac Newton (1643–1727) deduced that our planet was not a perfect sphere but rather an oblate spheroid. The best measurement in Mason and Dixon’s time was that the Earth’s semi-major axis was 6,397,300 meters with inverse flattening 1/f = 216.8.

It’s a bit of an anachronism to describe the distance in meters since the meter was not defined until 1791. The meter was originally defined as one ten-millionth of the distance from the equator to the North Pole along a great circle through Paris.

What exactly is a degree of latitude?

If the Earth were a perfect sphere, a degree of latitude would be 1/360th of its circumference. Using the original definition of the meter, the circumference would be 40,000,000 meters, so a degree would be exactly 10,000,000/90 meters, about 111,111 meters. But because the Earth is not a perfect sphere, each degree of latitude has a slightly different length. To put it another way, the length of a degree of latitude varies by latitude.

Another complication due to the flattening of the Earth is that there are multiple ways to define latitude. The two most common are geocentric and geodetic. The geocentric latitude of a point P on the Earth’s surface is the angle between the equatorial plane and a line between the center of the earth and P. The geodetic latitude (a.k.a. geographic latitude) of P is the angle between the equatorial plane and a line perpendicular to the Earth’s surface at P. More on the difference between geocentric and geodetic latitude here.
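To see how much the length of a degree varies, here is a rough sketch. It approximates the length of a degree centered at geodetic latitude φ by the meridional radius of curvature times π/180. The WGS84 semi-major axis and e² used as defaults are assumptions brought in for illustration, not values from Mason and Dixon's time, and the function names are made up.

```python
import math

def e2_from_inverse_flattening(inv_f):
    """Squared eccentricity from inverse flattening 1/f, using e² = 2f − f²."""
    f = 1 / inv_f
    return 2 * f - f * f

def degree_of_latitude(phi_deg, a=6378137.0, e2=0.00669438):
    """Approximate length in meters of one degree of latitude at geodetic latitude phi_deg."""
    phi = math.radians(phi_deg)
    # Meridional radius of curvature of the ellipsoid at latitude phi
    M = a * (1 - e2) / (1 - e2 * math.sin(phi) ** 2) ** 1.5
    return M * math.pi / 180

for lat in (0, 45, 90):
    print(lat, degree_of_latitude(lat))

# Same calculation with the 18th century figures quoted above
old_e2 = e2_from_inverse_flattening(216.8)
print(degree_of_latitude(39.5, a=6397300.0, e2=old_e2))
```

A degree is shortest at the equator and longest at the poles, because the flattened ellipsoid curves less sharply near the poles.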

What did Mason and Dixon measure?

Since the length of a degree of latitude varies, we need to say at what latitude they measured the length of a degree. In short, they measured the length of a degree near what we now know as the Mason-Dixon line, the border between Pennsylvania and Maryland.

To be more precise, they measured from Stargazer’s Stone, a stone placed by Mason and Dixon on John Harland’s farm near Embreeville, Pennsylvania, to a point about a degree and a half due south near what is now Delmar, a town on the Delaware / Maryland border.

I’ve had some difficulty determining how accurate Mason and Dixon were. Some sources I’ve found are obviously wrong. I haven’t verified this, but it seems Mason and Dixon overestimated the length of a degree of latitude at their location by only 465.55 ft or about 0.13%, a remarkable feat given the technology of their day.

[1] Edwin Danson. Drawing The Line: How Mason and Dixon Surveyed the Most Famous Border in America. John Wiley & Sons. 2001.


Bowie integrator and the nonlinear pendulum

John — Tue, 23 Dec 2025 10:28:18 +0000

I recently learned of Bowie’s numerical method for solving ordinary differential equations of the form

y″ = f(y)

via Alex Scarazzini’s master’s thesis [1].

The only reference I’ve been able to find for the method, other than [1], is the NASA Orbital Flight Handbook from 1963. The handbook describes the method as “a method employed by C. Bowie and incorporated in many Martin programs” and says nothing more about its origin.

Martin Company

What does it mean by “Martin programs”? The first line of the foreword of the manual says

This handbook has been produced by the Space Systems Division of the Martin Company under Contract NAS8-S03l with the George C. Marshall Space Flight Center of the National Aeronautics and Space Administration.

The Martin Company was the Glenn L. Martin Company, which became Martin Marietta after merging with American-Marietta Corporation in 1961. The handbook was written after the merger but used the older name. Martin Marietta would go on to become Lockheed Martin in 1995.

Bowie’s method was used “in many Martin programs” and yet is practically unknown in academic circles. Scarazzini’s thesis shows the method works well for his problem.

Nonlinear pendulum

My first thought when I saw the form of differential equations Bowie’s method solves was the nonlinear pendulum equation

y″ = − sin(y)

where the initial displacement y(0) is too large for the approximation sin θ ≈ θ to be sufficiently accurate. I wrote some Python code to try out Bowie’s method on this equation.

import numpy as np
from scipy.special import ellipk

N = 100
y  = np.zeros(N)
yp = np.zeros(N) # y'

y[0] = 1
yp[0] = 0

# Period of the nonlinear pendulum; SciPy's ellipk takes the squared modulus
T = 4*ellipk(np.sin(y[0]/2)**2)
h = T/N

f   = lambda x: -np.sin(x)
fp  = lambda x: -np.cos(x) # f'
fpp = lambda x:  np.sin(x) # f''

for n in range(0, N-1):
    # Fourth derivative of y, from differentiating y'' = f(y) twice:
    # y'''' = f''(y) y'^2 + f'(y) f(y)
    y4 = fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n])
    y[n+1] = y[n] + h*yp[n] + 0.5*h**2*f(y[n]) + \
              (h**3/6)*fp(y[n])*yp[n] + \
              (h**4/24)*y4
    yp[n+1] = yp[n] + h*f(y[n]) + 0.5*h**2*fp(y[n])*yp[n] + \
              (h**3/6)*y4

Here’s a graph of the numerical solution.

The solution looks like a cosine, but it isn’t exactly. As I explain here,

The solution to the nonlinear pendulum equation is also periodic, though the solution is a combination of Jacobi functions rather than a combination of trig functions. The difference between the two solutions is small when θ₀ is small, but becomes more significant as θ₀ increases.

The difference in the periods is more evident than the difference in shape for the two waves. The period of the nonlinear solution is longer than that of the linearized solution.

That’s why the period T in the code is not

2π ≈ 6.28

but rather

4 K(sin²(θ₀/2)) ≈ 6.70.

You’ll also see the period of the nonlinear pendulum given as 4 K(sin(θ₀/2)). As pointed out in the article linked above,

There are two conventions for defining the complete elliptic integral of the first kind. SciPy uses a convention for K that requires us to square the argument.

[1] Alex Scarazzini. 3D Visualization of a Schwarzschild Black Hole Environment. University of Bern. August 2025.


Trying to fit exponential data

John — Mon, 22 Dec 2025 15:00:06 +0000

The first difficulty in trying to fit an exponential curve to data is that the data may not follow an exponential curve. Nothing grows exponentially forever. Eventually growth slows down. The simplest way growth can slow down is to follow a logistic curve, but fitting a logistic curve has its own problems, as detailed in the previous post.

Suppose you are convinced that whatever you’re wanting to model follows an exponential curve, at least over the time scale that you’re interested in. This is easier to fit than a logistic curve. If you take the logarithm of the data, you now have a linear regression problem. Linear regression is numerically well-behaved and has been thoroughly explored.

There is a catch, however. When you extrapolate a linear regression, your uncertainty region flares out as you go from observed data to linear predictions based on the observed data. Your uncertainty grows linearly. But remember that we’re not working with the data per se; we’re working with the logarithm of the data. So on the original scale, the uncertainty flares out exponentially.
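Here is a minimal sketch of the log-transform approach on synthetic data. The growth rate, scale, noise level, and variable names are all made up for illustration; the fit is an ordinary least squares line through the logarithm of the data using `numpy.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data y = A exp(r t) with multiplicative noise (assumed values)
true_A, true_r = 2.0, 0.7
t = np.linspace(0, 5, 30)
y = true_A * np.exp(true_r * t) * np.exp(rng.normal(0, 0.05, t.size))

# Linear regression on the log scale: log y = log A + r t
slope, intercept = np.polyfit(t, np.log(y), 1)
r_hat, A_hat = slope, np.exp(intercept)
print(r_hat, A_hat)

# An error of +/- delta in the fitted line on the log scale
# becomes a factor of exp(+/- delta) on the original scale.
delta = 0.1
print(np.exp(-delta), np.exp(delta))
```

The last two lines show why the flare is exponential: an additive band on the log scale turns into a multiplicative band on the original scale.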


Trying to fit a logistic curve

John — Sat, 20 Dec 2025 21:58:48 +0000

A logistic curve, sometimes called an S curve, looks different in different regions. Like the proverbial blind men feeling different parts of an elephant, people looking at different segments of the curve could come to very different impressions of the full picture.

It’s naive to look at the left end and assume the curve will grow exponentially forever, even if the data are statistically indistinguishable from exponential growth.

A slightly less naive approach is to look at the left end, assume logistic growth, and try to infer the parameters of the logistic curve. In the image above, you may be able to forecast the asymptotic value if you have data up to time t = 2, but it would be hopeless to do so with only data up to time t = −2. (This post was motivated by seeing someone trying to extrapolate a logistic curve from just its left tail.)

Suppose you know with absolute certainty that your data have the form

f(t) = a / (1 + exp(−b(t − c))) + ε

where ε is some small amount of measurement error. The world is not obligated to follow a simple mathematical model, or any mathematical model for that matter, but for this post we will assume that for some inexplicable reason you know the future follows a logistic curve; the only question is what the parameters are.

Furthermore, we only care about fitting the a parameter. That is, we only want to predict the asymptotic value of the curve. This is easier than trying to fit the b or c parameters.

Simulation experiment

I generated 16 random t values between −5 and −2, plugged them into the logistic function with parameters a = 1, b = 1, and c = 0, then added Gaussian noise with standard deviation 0.05.

My intention was to do this 1000 times and report the range of fitted values for a. However, the software I was using (scipy.optimize.curve_fit) failed to converge. Instead it returned the following error message.

RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.

When you see a message like that, your first response is probably to tweak the code so that it converges. Sometimes that’s the right thing to do, but often such numerical difficulties are trying to tell you that you’re solving the wrong problem.

When I generated points between −5 and 0, the curve_fit algorithm still failed to converge.

When I generated points between −5 and 2, the fitting algorithm converged. The range of a values was from 0.8254 to 1.6965.

When I generated points between −5 and 3, the range of a values was from 0.9039 to 1.1815.

Increasing the number of generated points did not change whether the curve fitting method converged, though it did result in a smaller range of fitted parameter values when it did converge.

I said we’re only interested in fitting the a parameter. I looked at the ranges of the other parameters as well, and as expected, they had a wider range of values.

So in summary, fitting a logistic curve with data only on the left side of the curve, to the left of the inflection point in the middle, may completely fail or give you results with wide error estimates. And it’s better to have a few points spread out through the domain of the function than to have a large number of points only on one end.
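The experiment can be sketched as follows. This is not the code behind the numbers above: it assumes the logistic function has the form a / (1 + exp(−b(t − c))), uses fewer trials, and will give different ranges depending on the seed and SciPy version. The function name `fitted_a_range` is hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, a, b, c):
    # Assumed parameterization: a is the asymptote, b the rate, c the midpoint
    return a / (1 + np.exp(-b * (t - c)))

def fitted_a_range(t_min, t_max, n_points=16, trials=100, sigma=0.05, seed=0):
    """Fit noisy logistic data sampled on [t_min, t_max]; collect fitted a values."""
    rng = np.random.default_rng(seed)
    fitted, failures = [], 0
    for _ in range(trials):
        t = rng.uniform(t_min, t_max, n_points)
        y = logistic(t, 1.0, 1.0, 0.0) + rng.normal(0.0, sigma, n_points)
        try:
            params, _ = curve_fit(logistic, t, y)
        except RuntimeError:
            # curve_fit raises RuntimeError when it fails to converge
            failures += 1
            continue
        fitted.append(params[0])
    return fitted, failures

fitted, failures = fitted_a_range(-5, 3)
if fitted:
    print(min(fitted), max(fitted), failures)
```

Changing the interval to (−5, −2) is a quick way to see how badly the left tail alone constrains the asymptote.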


Regular expressions that cross lines

John — Fri, 19 Dec 2025 14:55:01 +0000

One of the fiddly parts of regular expressions is how to handle line breaks. Should regular expression searches be applied one line at a time, or should an entire file be treated as a single line?

This morning I was trying to track down a LaTeX file that said “discussed in the Section” rather than simply “discussed in Section.” I wanted to search on “the Section” to see whether I had a similar error in other files.

Line breaks don’t matter to LaTeX [1], so “the” could be at the end of one line and “Section” at the beginning of another. I found what I was after by using

    grep -Pzo "the\s+Section" foo.tex

Here -P tells grep to use Perl regular expressions. That’s not necessary here, but I imprinted on Perl regular expressions long ago, and I use PCRE (Perl compatible regular expressions) whenever possible so I don’t have to remember the annoying little syntax differences between various regex implementations.

The -z option says to treat the entire file as one long string. This eliminates the line break issue.

The -o option says to output only what the regular expression matches. Otherwise grep will return the matching line. Ordinarily that wouldn’t be so bad, but because of the -z option, the matching line is the entire file.

The \s+ characters between the and Section represent one or more whitespace characters, such as a space or a newline.
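The same pattern works in any regex engine once the whole file is one string. Here is a minimal sketch in Python, with a made-up sample string standing in for the contents of the file.

```python
import re

# Made-up sample text: the first occurrence is split across a line break.
text = "as discussed in the\nSection on shuffles, and in the   Section after it"

# \s+ matches one or more whitespace characters, including newlines.
pattern = re.compile(r"the\s+Section")
matches = pattern.findall(text)
print(matches)
```

Because the text is searched as a single string, no special flag is needed for the match to cross the line break.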

The -P flag is a GNU feature, so it works on Linux. But macOS ships with BSD-derived versions of its utilities, and its version of grep does not support the -P option. On my MacBook I have ggrep mapped to the GNU version of grep.

Another option is to use ripgrep rather than grep. It uses Perl-like regular expressions, and so there is no need for anything like the -P flag. The analog of -z in ripgrep is -U, so the counterpart of the command above would be

    rg -Uo "the\s+Section" foo.tex

Usually regular expression searches are so fast that execution time doesn’t matter. But when it does matter, ripgrep can be an order of magnitude faster than grep.

[1] LaTeX decides how to break lines in the output independent of line breaks in the input. This allows you to arrange the source file logically rather than aesthetically.

The post Regular expressions that cross lines first appeared on John D. Cook.

Multiples with no large digits

John — Tue, 16 Dec 2025 15:49:52 +0000

Here’s a curious theorem I stumbled across recently [1]. Take an integer N which is not a multiple of 10. Then there is some multiple of N which only contains the digits 1, 2, 3, 4, and 5.

For example, my business phone number 8324228646 has a couple 8s and a couple 6s. But

6312 × 8324228646 = 52542531213552

which contains only digits 1 through 5.

For a general base b, let p be the smallest prime factor of b. Then for every integer N that is not a multiple of b, there is some multiple of N whose base b representation contains only the digits 1, 2, 3, …, b/p.

This means that for every number N that is not a multiple of 16, there is some k such that the hex representation of kN contains only the digits 1 through 8. For example, if we take the magic number at the beginning of every Java class file, 0xCAFEBABE, we find

1341 × CAFEBABE_hex = 42758583546_hex.
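Here is a short sketch for checking examples like the two above and for searching for multipliers by brute force. The function names and the search limit are my own choices for illustration. The theorem guarantees that a multiplier exists but says nothing about how large it is.

```python
def has_only_small_digits(m, base=10, max_digit=5):
    """True if every digit of m in the given base lies in 1..max_digit."""
    if m <= 0:
        return False
    while m:
        m, d = divmod(m, base)
        if not 1 <= d <= max_digit:
            return False
    return True

def smallest_multiplier(n, base=10, max_digit=5, limit=10**6):
    """Brute-force search for the least k <= limit with k*n using only small digits."""
    for k in range(1, limit + 1):
        if has_only_small_digits(k * n, base, max_digit):
            return k
    return None  # nothing found within the search limit

# The two examples from the text.
print(has_only_small_digits(6312 * 8324228646))             # base 10, digits 1-5
print(has_only_small_digits(1341 * 0xCAFEBABE, 16, 8))      # base 16, digits 1-8
print(hex(1341 * 0xCAFEBABE))
```

The multipliers 6312 and 1341 are the ones quoted above. The code only confirms that they work, not that they are the smallest possible.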

In the examples above, we’re looking for multiples containing only half the possible digits. If the smallest prime dividing the base is larger than 2 then we can find multiples with digits in a smaller range. For example, in base 35 we can find a multiple containing only the digits 1 through 7.

[1] Gregory Galperin and Michael Reid. Multiples Without Large Digits. The American Mathematical Monthly, Vol. 126, No. 10 (December 2019), pp. 950-951.

The post Multiples with no large digits first appeared on John D. Cook.

Golden iteration

John — Fri, 12 Dec 2025 15:31:14 +0000

The expression

√(1 + √(1 + √(1 + ⋯)))

converges to the golden ratio φ. Another way to say this is that the sequence defined by x₀ = 1 and

x_n = √(1 + x_{n−1})

for n > 0 converges to φ. This post will be about how it converges.

I wrote a little script to look at the error in approximating φ by x_n and noticed that the error is about three times smaller at each step. Here’s why that observation was correct.

The ratio of the error at one step to the error at the previous step is

(√(1 + x) − φ) / (x − φ).

If x = φ + ε the expression above becomes

(√(1 + φ + ε) − φ) / ε = 1/(2φ) − ε/(8φ³) + O(ε²)

when you expand as a Taylor series in ε centered at 0, using φ² = 1 + φ. This says the error is multiplied by a factor of about

1/(2φ) ≈ 0.309

at each step. The next term in the Taylor series is approximately −0.03ε. Starting from x₀ = 1 the error ε is negative, so the ratio is slightly larger than 0.309 and convergence is slightly slower at first, but essentially the error is multiplied by 0.309 at each iteration.
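Here is a sketch of the kind of script that shows this, not necessarily the one I used. It assumes the iteration is x_n = √(1 + x_{n−1}) starting from x₀ = 1, which is the iteration whose error ratio tends to 1/(2φ).

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2

def error_ratios(steps=10, x0=1.0):
    """Ratios (x_{n+1} - phi) / (x_n - phi) for the iteration x -> sqrt(1 + x)."""
    x = x0
    ratios = []
    for _ in range(steps):
        x_next = sqrt(1 + x)
        ratios.append((x_next - phi) / (x - phi))
        x = x_next
    return ratios

for r in error_ratios():
    print(f"{r:.4f}")
print(f"1/(2*phi) = {1 / (2 * phi):.4f}")
```

The first ratio is about 0.33 and the later ones settle down to 0.309.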

The post Golden iteration first appeared on John D. Cook.

Just change the key

John — Thu, 11 Dec 2025 18:37:41 +0000

When I was a kid, I suppose sometime in my early teens, I was interested in music theory, but I couldn’t play piano. One time I asked a lady who played piano at our church to play a piece of sheet music for me so I could hear how it sounded. The music was in the key of A, but she played it in A♭. She didn’t say she was going to change the key, but I could tell from looking at her hands that she had.

I was shocked by the audacity of changing the music to be what you wanted it to be rather than playing what was on the page. I was in band, and there you certainly don’t decide unilaterally that you’re going to play in a different key!

In retrospect what the pianist was doing makes sense. Hymns are very often in the key of A♭. One reason is it’s a comfortable key for SATB singing. Another is that if many hymns are in the same key, that makes it easy to go from one directly into another. If a traditional hymn is not in A♭, it’s probably in a key with flats, like B♭ or D♭. (Contemporary church music is often in keys with sharps because guitarists like open strings, which leads to keys like A or E.)

The pianist wasn’t a great musician, but she was good enough. Picking her key was a coping mechanism that worked well. Unless someone in the congregation has perfect pitch, you can change a song from the key of D to the key of D♭ and nobody will know.

There’s something to be said for clever coping mechanisms, especially if they’re declared, “You asked for A. Is it OK if I give you B?” It’s better than saying “Sorry, I can’t help you.”

The post Just change the key first appeared on John D. Cook.

Rolling n-sided dice to get at least n

John — Wed, 10 Dec 2025 15:08:20 +0000

Say you have a common 6-sided die and need to roll it until the sum of your rolls is at least 6. How many times would you need to roll?

If you had a 20-sided die and you need to roll for a sum of at least 20, would that take more rolls or fewer rolls on average?

According to [1], the expected number of rolls of an n-sided die for the sum of the rolls to be n or more equals

(1 + 1/n)ⁿ⁻¹.

So for a 6-sided die, the expected number of rolls is (7/6)⁵ = 2.1614.

For a 20-sided die, the expected number of rolls is (21/20)¹⁹ = 2.5270.

The expected number of rolls is an increasing function of n, and it converges to e.
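The closed form can also be checked exactly, without simulation. The sketch below is my own check, not something from [1]. It conditions on the first roll to get a recurrence for f(k), the expected number of rolls needed to reach a total of at least k, for k up to n.

```python
from fractions import Fraction
from math import e

def expected_rolls(n):
    """Exact expected number of rolls of an n-sided die to reach a sum >= n."""
    # f[k] = expected number of rolls to reach a total of at least k, for k <= n.
    f = [Fraction(0)] * (n + 1)
    for k in range(1, n + 1):
        # A first roll j < k leaves k - j to go; a roll j >= k finishes at once.
        f[k] = 1 + sum(f[1:k], Fraction(0)) / n
    return f[n]

for n in (6, 20, 100):
    exact = expected_rolls(n)
    assert exact == Fraction(n + 1, n) ** (n - 1)  # agrees with the closed form
    print(n, float(exact))
print("e =", e)
```

Using Fraction keeps the arithmetic exact, so the comparison with the closed form is an equality rather than a floating point approximation.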

Here’s a little simulation script for the result above.

from numpy.random import randint

def game(n):
    s = 0
    i = 0
    while s < n:
        s += randint(1, n+1)
        i += 1
    return i

N = 1_000_000
s = 0
n = 20
for _ in range(N):
    s += game(n)
print(s / N)

This produced 2.5273.

[1] Enrique Treviño. Expected Number of Dice Rolls for the Sum to Reach n. American Mathematical Monthly, Vol 127, No. 3 (March 2020), p. 257.

The post Rolling n-sided dice to get at least n first appeared on John D. Cook.

Weak derivatives

John — Wed, 10 Dec 2025 14:33:53 +0000

There are numerous memes floating around with the words “Being weak is nothing to be ashamed of; staying weak is.” Or some variation. I thought about this meme in the context of weak derivatives.

The last couple posts have talked about distributions, also called generalized functions. The delta function, for example, is not actually a function but a generalized function, a linear functional on a space of test functions.

Distribution theory lets you take derivatives of functions that don’t have a derivative in the classical sense. View the function as a regular distribution, take its derivative as a distribution, and if this derivative is a regular distribution, that function is called a weak derivative of the original function.
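Here is a standard textbook example, not something taken from the earlier posts. The function f(x) = |x| has no classical derivative at 0. A function g is a weak derivative of f if integrating f against φ′ gives minus the integral of g against φ for every test function φ. Integrating by parts on each half line shows that the sign function does the job.

```latex
\int_{-\infty}^{\infty} |x|\,\varphi'(x)\,dx
  = \int_{0}^{\infty} x\,\varphi'(x)\,dx - \int_{-\infty}^{0} x\,\varphi'(x)\,dx
  = -\int_{0}^{\infty} \varphi(x)\,dx + \int_{-\infty}^{0} \varphi(x)\,dx
  = -\int_{-\infty}^{\infty} \operatorname{sgn}(x)\,\varphi(x)\,dx
```

The boundary terms vanish because x φ(x) is zero at 0 and test functions decay at infinity.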

You can use distribution theory to complete a space of functions analogous to how the real numbers complete the rational numbers.

To show that an equation has a rational solution, you might first show that it has a real solution, then show that the real solution is in fact rational. To state the strategy more abstractly, to find a solution in a small space, you first look for solutions in a larger space where solutions are easier to find. Then you see whether the solution you found lies in the smaller space.

This is the modern strategy for studying differential equations. You first show that a differential equation has a solution in a weak sense, then if possible prove a regularity result that shows the solution is a classical solution. There’s no shame in finding a weak solution. But from a classical perspective, there’s shame in stopping there.

The post Weak derivatives first appeared on John D. Cook.

Fourier transform of a Fourier series

John — Mon, 08 Dec 2025 15:00:37 +0000

The previous post showed how we can take the Fourier transform of functions that don’t have a Fourier transform in the classical sense.

The classical definition of the Fourier transform of a function f requires the integral of |f| over the real line to be finite. This implies f(x) must approach zero as x goes to ∞ and −∞. A constant function won’t do, and yet we got around that in the previous post. Distribution theory even lets you take the Fourier transform of functions that grow as their arguments go off to infinity, as long as they don’t grow too fast, i.e. like a polynomial but not like an exponential.

In this post we want to take the Fourier transform of functions like sine and cosine. If you read that sentence as saying Fourier series, you have the right instinct for classical analysis: you take the Fourier series of periodic functions, not the Fourier transform. But with distribution theory you can take the Fourier transform, unifying Fourier series and Fourier transforms.

For this post I’ll be defining the classical Fourier transform using the convention

and generalizing this definition to distributions as in the previous post.

With this convention, the Fourier transform of 1 is δ, and the Fourier transform of δ is 2π.

One can show that the Fourier transform of a cosine is a sum of delta functions, and the Fourier transform of a sine is a difference of delta functions.

It follows that the Fourier transform of a Fourier series is a sum of delta functions shifted by integers. In fact, if you convert the Fourier series to complex form, the coefficients of the deltas are exactly the Fourier series coefficients.

The post Fourier transform of a Fourier series first appeared on John D. Cook.

Fourier transform of a flat line

John — Mon, 08 Dec 2025 12:30:58 +0000

Suppose you have a constant function f(x) = c. What is the Fourier transform of f?

We will show why the direct approach doesn’t work, give two hand-wavy approaches, and a rigorous definition.

Direct approach

Unfortunately there are multiple conventions for defining the Fourier transform.

For this post, we will define the Fourier transform of a function f to be

f̂(ω) = (1/√(2π)) ∫ exp(−iωx) f(x) dx

where the integral is over the real line.

If f(x) = c then the integral diverges unless c = 0.

Heuristic approach

The more concentrated a function is in the time domain, the more it spreads out in the frequency domain. And the more spread out a function is in the time domain, the more concentrated it is in the frequency domain. If you think this sounds like the Heisenberg uncertainty principle, you’re right: there is a connection.

A constant function is as spread out as possible, so it seems that its Fourier transform should be as concentrated as possible, i.e. a delta function. The delta function isn’t literally a function, but it can be made rigorous. More on that below.

Gaussian density approach

The Fourier transform of the Gaussian function exp(−x²/2) is the same function, i.e. the Gaussian function is a fixed point of the Fourier transform. More generally, the Fourier transform of the density function for a normal random variable with standard deviation σ is proportional to the density function for a normal random variable with standard deviation 1/σ.
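The fixed point claim is easy to check numerically. The sketch below assumes the unitary convention with a factor of 1/√(2π) in front of the integral, and approximates the integral with a midpoint rule on [−20, 20]. The function names and the truncation interval are arbitrary choices of mine.

```python
import cmath
import math

def fourier_transform(f, omega, lo=-20.0, hi=20.0, n=4000):
    """Midpoint-rule approximation of (1/sqrt(2 pi)) * integral of exp(-i omega x) f(x) dx."""
    dx = (hi - lo) / n
    total = 0j
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += f(x) * cmath.exp(-1j * omega * x)
    return total * dx / math.sqrt(2 * math.pi)

def gaussian(x):
    return math.exp(-x * x / 2)

for omega in (0.0, 0.5, 1.0, 2.0):
    approx = fourier_transform(gaussian, omega)
    print(omega, approx.real, gaussian(omega))
```

The two printed columns should agree to many decimal places. With a different convention the transform of exp(−x²/2) would come out as a rescaled Gaussian rather than the same function.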

As σ gets larger, the density becomes flatter. So we could think of our function f(x) = c as some multiple of a Gaussian density in the limit as σ goes to infinity. The Fourier transform is then some multiple of a Gaussian density with σ = 0, i.e. a point mass or delta function.

Rigorous approach

If f and φ are two well-behaved functions then

∫ f̂(x) φ(x) dx = ∫ f(x) φ̂(x) dx.

In other words, we can move the “hat” representing the Fourier transform from one function to the other. The equation above is a theorem when f and φ are nice functions. We can use it to motivate a definition when the function f is not so nice but the function φ is very nice. Specifically, we will assume φ is an infinitely differentiable function that goes to zero at infinity faster than any polynomial.

Given a Lebesgue integrable function f, we can think of f as a linear operator via the map

φ ↦ ∫ f(x) φ(x) dx.

More generally, we can define a distribution to be any continuous [1] linear operator from the space of test functions to the complex numbers. A distribution that can be defined by an integral as above is called a regular distribution. When we say we’re taking the Fourier transform of the constant function f(x) = c, we’re actually taking the Fourier transform of the regular distribution associated with f. [2]

Not all distributions are regular. The delta “function” δ(x) is a distribution that acts on test functions by evaluating them at 0.

We define the Fourier transform of (the regular distribution associated with) a function f to be the distribution whose action on a test function φ equals the integral of the product of f and the Fourier transform of φ. When a function is Lebesgue integrable, this definition matches the classical definition.

With this definition, we can calculate that the Fourier transform of a constant function c equals

√(2π) c δ.

Note that with a different convention for defining the Fourier transform, you might get 2π c δ or just c δ.

An advantage of the convention that we’re using is that the Fourier transform of the Fourier transform of f(x) is f(−x) and not some multiple of f(−x). This implies that the Fourier transform of √(2π) δ is 1 and so the Fourier transform of δ is 1/√(2π).

[1] To define continuity we need to put a topology on the space of test functions. That’s too much for this post.

[2] The constant function doesn’t have a finite integral, but its product with a test function does because test functions decay rapidly. In fact, even the product of a polynomial with a test function is integrable.

The post Fourier transform of a flat line first appeared on John D. Cook.

Obscuring P2P nodes with Dandelion

John — Mon, 08 Dec 2025 12:11:44 +0000

The weakest link in the privacy of cryptocurrency transactions is often outside the blockchain. There are technologies such as stealth addresses and subaddresses to try to thwart attempts to link transactions to individuals. They do a good job of anonymizing transaction data, but the weak link may be metadata, as is often the case.

Cryptocurrency nodes circulate transaction data using a peer-to-peer network. An entity running multiple nodes can compare when data arrived at each of its nodes and triangulate to infer which node first sent a set of transactions. The Dandelion protocol and its refinement Dandelion++ aim to mitigate this risk. Dandelion++ is currently used in Monero and a few other coins; other cryptocurrencies have considered or are considering using it.

The idea behind the Dandelion protocol is to have a “stalk” period and a “diffusion” period. Imagine data working up the stalk of a dandelion plant before diffusing like seeds in the wind. The usual P2P process is analogous to simply blowing on the seed head [1].

During the stalk period, information travels from one node to one node. Then after some number of hops, the diffusion process begins; the final node in the stalk period diffuses the information to all its peers. An observer with substantial but not complete visibility of the network may be able to determine which node initiated the diffusion, but maybe not the node at the other end of the stem.
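The two phases can be illustrated with a toy simulation. This is only a sketch of the idea, not the protocol as specified in the paper. The stalk length is fixed here rather than chosen at random, diffusion is modeled as a plain breadth-first flood, and the ring network and the function name dandelion_spread are made up for the example.

```python
import random
from collections import deque

def dandelion_spread(peers, source, stem_hops, rng):
    """Toy model: relay along a stalk one peer at a time, then flood from its end."""
    # Stalk phase: each node hands the transaction to a single randomly chosen peer.
    stem = [source]
    for _ in range(stem_hops):
        stem.append(rng.choice(peers[stem[-1]]))
    # Diffusion phase: the last stalk node broadcasts and every node relays to all peers.
    diffuser = stem[-1]
    reached = {diffuser}
    queue = deque([diffuser])
    while queue:
        node = queue.popleft()
        for peer in peers[node]:
            if peer not in reached:
                reached.add(peer)
                queue.append(peer)
    return stem, reached

# Hypothetical 10-node ring network: each node is connected to its two neighbors.
peers = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
stem, reached = dandelion_spread(peers, source=0, stem_hops=4, rng=random.Random(1))
print("stalk path:", stem)
print("nodes reached by diffusion:", len(reached))
```

An observer who sees only the diffusion phase would trace the transaction back to the last node in the stalk path, not to node 0.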

A natural question is how this differs from something like Tor. In a nutshell, Tor offers identity protection before you enter a P2P network, and Dandelion offers identity protection inside the P2P network.

For more details, see the original paper on Dandelion [2].

[1] The original paper on Dandelion uses a dandelion seed as the metaphor for the protocol. “The name ‘dandelion spreading’ reflects the spreading pattern’s resemblance to a dandelion seed head” and refers to the diagram below. However, other sources refer to the stalk and head of the dandelion plant, not just a single seed. Both mental images work since the plant has a slightly fractal structure with a single seed looking something like the plant.

[2] Shaileshh Bojja Venkatakrishnan, Giulia Fanti, Pramod Viswanath. Dandelion: Redesigning the Bitcoin Network for Anonymity. Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 1, Issue 1 Article No.: 22, Pages 1–34. Available here: https://doi.org/10.1145/3084459.

The post Obscuring P2P nodes with Dandelion first appeared on John D. Cook.

What is a Pedersen commitment?

John — Sat, 06 Dec 2025 16:36:37 +0000

I’m taking a break from my series on celestial navigation. The previous posts give the basics, but I haven’t thought of a way to go further that I’m satisfied with. So now for something completely different: Pedersen commitments.

Pedersen commitments are a building block of zero knowledge proofs (ZKP), and they give an opportunity to look at a couple other interesting topics: nothing-up-my-sleeve constructions and homomorphic encryption.

A Pedersen commitment to a value v takes a random number x and two generators of an elliptic curve, G and H, and returns

C = vG + xH.

The significance of C is that it appears to be a random number to the recipient, but the sender who calculated it can later show that it was computed from v and x. C is called a commitment to the value v because the sender cannot later say that C was computed from a different v and a different x.

Mathematical details

The addition in

C = vG + xH

is carried out on an elliptic curve, such as Ed25519 in the case of Monero. Multiplication is defined by repeated addition, though it’s not computed that way [1]. G and H are not just points on the elliptic curve but points in a large, prime-order subgroup of the elliptic curve.

Because the value x is random, the possible values of C are uniformly distributed on the curve, and so someone observing C learns nothing about v. For that reason x is called a blinding factor.

The difficulty of the discrete logarithm problem ensures that it is impractical to come up with different values v′ and x′ such that

v G + x H = v′ G + x′ H.

This depends on two assumptions.

The first assumption is that the discrete logarithm problem is hard to solve given current algorithms and hardware. The prevailing opinion is that it is unlikely anyone will come up with an efficient algorithm for solving the discrete logarithm problem on current hardware. However, Shor’s algorithm could solve the discrete logarithm problem efficiently if and when a practical, large-scale quantum computer is created.

The second assumption is that the generator H was chosen at random and not calculated to be a backdoor.

How to make and use a backdoor

Because G and H are members of the same prime-order (i.e. cyclic) group, there exists some integer h such that

H = hG

If the generator H was randomly selected, nobody knows h and nobody can calculate it. But if H was calculated by first selecting h and multiplying hG then there is a backdoor.

Now

C = vG + xH = vG + x(hG) = (v + xh)G.

If you know h, you can pick a new v′ and solve for x′ such that

v + xh = v′ + x′ h.

That would mean that in the context of a cryptocurrency that uses Pedersen commitments, such as Monero or the Liquid Network on top of Bitcoin, you could initially commit to spending v and later claim that you committed to spending v′.

Note that solving for x‘ requires modular arithmetic, not solving the discrete logarithm problem.
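Here is a toy illustration of the backdoor. To keep it short, it swaps the elliptic curve for a small prime-order subgroup of the integers mod p, written multiplicatively, so vG + xH becomes G^v H^x mod p. The numbers are far too small to be secure, and all the values are made up for the example.

```python
# Toy group: the subgroup of order q inside the integers mod p, where p = 2q + 1.
p, q = 1019, 509

G = 4                # 4 = 2^2 is a square mod p, so it generates the order-q subgroup
h = 123              # the secret backdoor exponent (nobody should know this)
H = pow(G, h, p)     # multiplicative analog of H = hG

def commit(v, x):
    """Multiplicative analog of C = vG + xH."""
    return (pow(G, v, p) * pow(H, x, p)) % p

def forge_blinding(v, x, v_new):
    """Solve v + x*h = v_new + x_new*h (mod q) for x_new, using knowledge of h."""
    return (x + (v - v_new) * pow(h, -1, q)) % q

v, x = 42, 77
v_new = 300
x_new = forge_blinding(v, x, v_new)

print(commit(v, x), commit(v_new, x_new))
assert commit(v, x) == commit(v_new, x_new)
```

The only work needed to forge the opening is a modular inverse, computed here with pow(h, -1, q), which requires Python 3.8 or later.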

How to prove no backdoor

The way to prove that the generator H was chosen in good faith is to be transparent about how it was created. In practice this means using some sort of cryptographic hash function. For example, Bulletproofs hashed “bulletproof_g” and “bulletproof_h” to create its values of G and H. Bulletproofs require multiple values of G and H and so consecutive integers were concatenated to the strings before hashing.

Reversing a cryptographic hash like SHA256 is impractical, even assuming you have a quantum computer, and so it is extremely unlikely that there is a backdoor when the generators were created by hashing a natural string.

It’s said that Pedersen commitments do not require a trusted setup. That’s true in spirit, but more precisely they require a transparent setup that is easy to trust.

Homomorphic encryption

The function

C: (v, x) ↦ vG + xH

is a group homomorphism from pairs of integers to the subgroup generated by G and H. This means that

C(v, x) + C(v′, x′) = C(v + v′, x + x′)

or in other words, you can combine multiple commitments into a single commitment. The sum of a commitment to (v, x) and a commitment to (v′, x′) is a commitment to (v + v′, x + x′).
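The homomorphism property can be checked in a toy setting. As a stand-in for the elliptic curve, the sketch below uses a small prime-order subgroup of the integers mod p, written multiplicatively, so adding commitments becomes multiplying them mod p. The parameters are illustrative only. In particular G = 4 and H = 9 were picked for convenience, not by a nothing-up-my-sleeve construction.

```python
# Toy group: the subgroup of order q inside the integers mod p, where p = 2q + 1.
p, q = 1019, 509
G, H = 4, 9   # both are squares mod p, so both lie in the order-q subgroup

def commit(v, x):
    """Multiplicative analog of C(v, x) = vG + xH."""
    return (pow(G, v, p) * pow(H, x, p)) % p

c1 = commit(10, 123)
c2 = commit(32, 456)

# "Adding" commitments is multiplication in this notation.
combined = (c1 * c2) % p
print(combined, commit(10 + 32, 123 + 456))
assert combined == commit(10 + 32, 123 + 456)
```

The values and blinding factors add as exponents, which is all the homomorphism says.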

[1] In practice the number x is enormous, say on the order of the number of points on the elliptic curve, and so software does not add H to itself x times. Instead it uses a process analogous to fast exponentiation. In fact, if you write the group operation multiplicatively rather than additively, it is exactly fast exponentiation.

The post What is a Pedersen commitment? first appeared on John D. Cook.

Solving spherical triangles

John — Thu, 04 Dec 2025 18:51:52 +0000

This post is a side quest in the series on navigating by the stars. It expands on a footnote in the previous post.

There are six pieces of information associated with a spherical triangle: three sides and three angles. I said in the previous post that given three out of these six quantities you could solve for the other three. Then I dropped a footnote saying sometimes the missing quantities are uniquely determined but sometimes there are two solutions and you need more data to uniquely determine a solution.

Todhunter’s textbook on spherical trig gives a thorough account of how to solve spherical triangles under all possible cases. The first edition of the book came out in 1859. A group of volunteers typeset the book in TeX. Project Gutenberg hosts the PDF version of the book and the TeX source.

I don’t want to duplicate Todhunter’s work here. Instead, I want to summarize when solutions are or are not unique, and make comparisons with plane triangles along the way.

SSS and AAA

The easiest cases to describe are all sides or all angles. Given three sides of a spherical triangle (SSS), you can solve for the angles, as with a plane triangle. Also, given three angles (AAA) you can solve for the remaining sides of a spherical triangle, unlike a plane triangle.
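Both cases can be solved with the spherical law of cosines and its dual. The sketch below is a minimal version that assumes a unit sphere, angles in radians, and a valid non-degenerate triangle. It is not Todhunter's procedure, and it makes no attempt at numerical robustness for very small triangles.

```python
from math import acos, cos, sin, pi

def angles_from_sides(a, b, c):
    """SSS: spherical law of cosines, cos a = cos b cos c + sin b sin c cos A."""
    A = acos((cos(a) - cos(b) * cos(c)) / (sin(b) * sin(c)))
    B = acos((cos(b) - cos(a) * cos(c)) / (sin(a) * sin(c)))
    C = acos((cos(c) - cos(a) * cos(b)) / (sin(a) * sin(b)))
    return A, B, C

def sides_from_angles(A, B, C):
    """AAA: dual law of cosines, cos A = -cos B cos C + sin B sin C cos a."""
    a = acos((cos(A) + cos(B) * cos(C)) / (sin(B) * sin(C)))
    b = acos((cos(B) + cos(A) * cos(C)) / (sin(A) * sin(C)))
    c = acos((cos(C) + cos(A) * cos(B)) / (sin(A) * sin(B)))
    return a, b, c

# An octant of the sphere: three sides of 90 degrees give three right angles.
print(angles_from_sides(pi / 2, pi / 2, pi / 2))

# Round trip: sides -> angles -> sides.
A, B, C = angles_from_sides(1.0, 1.2, 0.8)
print(sides_from_angles(A, B, C))
```

The two functions are mirror images of each other, with sides and angles swapped and a sign change.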

SAS and SSA

When you’re given two sides and an angle, there is a unique solution if the angle is between the two sides (SAS), but there may be two solutions if the angle is opposite one of the sides (SSA). This is the same for spherical and plane triangles.

There could be even more than two solutions in the spherical case. Consider a triangle with one vertex at the North Pole and two vertices on the equator. Two sides are specified, running from the pole to the equator, and the angles at the equator are specified—both are right angles—but the side of the triangle on the equator could be any length.

ASA and AAS

When you’re given two angles and a side, there is a unique solution if the side is common to the two angles (ASA).

If the side is opposite one of the angles (AAS), there may be two solutions to a spherical triangle, but only one solution to a plane triangle. This is because two angles uniquely determine the third angle in a plane triangle, but not in a spherical triangle.

The example above of a triangle with one vertex at the pole and two on the equator also shows that an AAS problem could have a continuum of solutions.

Summary

Note that spherical triangles have a symmetry that plane triangles don’t: the results for spherical triangles remain unchanged if you swap S’s and A’s (SSS with AAA, SAS with ASA, SSA with AAS). This is an example of duality in spherical geometry.

The post Solving spherical triangles first appeared on John D. Cook.

The Navigational Triangle

John — Thu, 04 Dec 2025 16:44:23 +0000

The previous post introduced the idea of finding your location by sighting a star. There is some point on Earth that is directly underneath the star at any point in time, and that location is called the star’s GP (geographic position). That is one vertex of the navigational triangle. The other two vertices are your position and the North Pole.

Unless you’re at Santa’s workshop and observing a star nearly directly overhead, the navigational triangle is a big triangle, so big that you need to use spherical geometry rather than plane geometry. We will assume the Earth is a sphere [1].

Let a be the side running from your position to the GP. In the terminology of the previous post a is the radius of the line of position (LOP).

Let b be the side running from the GP to the North Pole. This is the GP’s co-latitude, the complement of latitude.

Let c be the side running from your location to the North Pole. This is your co-latitude.

Let A, B, and C be the angles opposite a, b, and c respectively. The angle A is known as the local hour angle (LHA) because it is proportional to the time difference between noon at your location and noon at the GP.

Given three items from the set {a, b, c, A, B, C} you can solve for the other three [2]. Note that one possibility is knowing the three angles. This is where spherical geometry differs from plane geometry: you can’t have spherical triangles that are similar but not congruent because the triangle excess determines the area.

If you know the current time, you can look up the GP coordinates in a table. The complement of the GP’s latitude is the side b.

Also from the current time you can determine your longitude, and from that you can find the LHA (angle A).

As described in the previous post, the altitude of the star, along with its GP, determines the LOP. From the LOP you can determine the arc between you and the GP, i.e. side a. We haven’t said how you could determine a, only that you could.

If you know two sides (in our case a and b) and the angle opposite one of the sides (in our case A) you can solve for the rest.

Adding detail

This post is more detailed than the previous, but still talks about what can be calculated but not how. We’re adding detail as the series progresses.

To motivate future posts, note that just because something can in theory be computed from an equation, that doesn’t mean it’s best to use that equation. Maybe the equation is sensitive to measurement error, or is numerically unstable, or is hard to calculate by hand.

Since we’re talking about navigating by the stars rather than GPS, we’re implicitly assuming that you’re using pencil and paper because for some reason you can’t use GPS.

[1] To first approximation, the Earth is a sphere. To second approximation, it’s an oblate spheroid. If you want to get into even more detail, it’s not exactly an oblate spheroid. How much difference does all this make? See this post.

[2] In some cases there are two solutions for one of the missing elements and you’ll need to use additional information, such as your approximate location, to rule out one of the possibilities. More on when solutions are unique here.

The post The Navigational Triangle first appeared on John D. Cook.

Line of position (LOP)

John — Thu, 04 Dec 2025 12:42:34 +0000

The previous post touched on how Lewis and Clark recorded celestial observations so that the data could be turned into coordinates after they returned from their expedition. I intend to write a series of posts about celestial navigation, and this post will discuss one fundamental topic: line of position (LOP).

Pick a star that you can observe [1]. At any particular time, there is exactly one point on the Earth’s surface directly under the star, the point where a line between the center of the Earth and the star crosses the Earth’s surface. This point is called the geographical position (GP) of the star.

This GP can be predicted and tabulated. If you happen to be standing at the GP, and know what time it is, these tables will tell your position. Most likely you’re not going to be standing directly under the star, and so it will appear to you as having some deviation from vertical. The star would appear at the same angle from vertical for a ring of observers. This ring is called the line of position (LOP).

The LOP is a “small circle” in a technical sense. A great circle is the intersection of the Earth’s surface with a plane through the Earth’s center, like a line of longitude. A small circle is the intersection of the surface with a plane that does not pass through the center, like a line of latitude.

The LOP is a small circle only in contrast to a great circle. In fact, it’s typically quite large, so large that it matters that it’s not in the plane of the GP. You have to think of it as a slice through a globe, not a circle on a flat map, and therein lies some mathematical complication, a topic for future posts. The center of the LOP is the GP, and the radius of the LOP is an arc. This radius is measured along the Earth’s surface, not as the length of a tunnel.

One observation of a star reduces your set of possible locations to a circle. If you can observe two stars, or the same star at two different times, you know that you’re at the intersection of the two circles. These two circles will intersect in two points, but if you know roughly where you are, you can rule out one of these points and know you’re at the other one.

[1] At the time of the Lewis and Clark expedition, these were the stars of interest for navigation in the northern hemisphere: Antares, Altair, Regulus, Spica, Pollux, Aldebaran, Formalhaut, Alphe, Arietes, and Alpo Pegas. Source: Undaunted Courage, Chapter 9.

The post Line of position (LOP) first appeared on John D. Cook.

Lewis & Clark geolocation

John — Mon, 01 Dec 2025 14:31:47 +0000

I read Undaunted Courage, Stephen Ambrose’s account of the Lewis and Clark expedition, several years ago [1], and now I’m listening to it as an audio book. The first time I read the book I glossed over the accounts of the expedition’s celestial observations. Now I’m more curious about the details.

The most common way to determine one’s location from sextant measurements is Hilaire’s method [2], developed in 1875. But the Lewis and Clark expedition took place between 1804 and 1806. So how did the expedition calculate geolocation from their astronomical measurements? In short, they didn’t. They collected data for others to turn into coordinates later. Ambrose explains

With the sextant, every few minutes he would measure the angular distance between the moon and the target star. The figures obtained could be compared with tables showing how those distances appeared at the same clock time in Greenwich. Those tables were too heavy to carry on the expedition, and the work was too time-consuming. Since Lewis’s job was to make the observations and bring them home, he did not try to do the calculations; he and Clark just gathered the figures.

The question remains how someone back in civilization would have calculated coordinates from the observations when the expedition returned. This article by Robert N. Bergantino addresses this question in detail.

Calculating latitude from measurements of the sun was relatively simple. Longitude was more difficult to obtain, especially without an accurate way to measure time. The expedition had a chronometer, the most expensive piece of equipment it carried. The chronometer was accurate enough to determine the relative time between observations, but not accurate enough to determine Greenwich time. A more accurate chronometer would have been too expensive and too fragile to carry on the voyage.
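To get a feel for why time accuracy matters so much for longitude, here is a small sketch. It relies only on the fact that the Earth turns 360° in 24 hours; the function name is made up for illustration.

```python
DEGREES_PER_HOUR = 360 / 24  # the Earth turns 15 degrees of longitude per hour

def longitude_error_deg(clock_error_minutes):
    """Longitude error caused by a given error in Greenwich time."""
    return DEGREES_PER_HOUR * clock_error_minutes / 60

for minutes in (1, 4, 60):
    print(minutes, "min ->", longitude_error_deg(minutes), "degrees")
```

A clock that is off by four minutes puts you a full degree of longitude away from where you are.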

For more on calculating longitude, see Dava Sobel’s book Longitude.

[1] At least 17 years ago. I don’t keep a log of what I read, but I mentioned Undaunted Courage in a blog post from 2008.

[2] More formally known as Marcq Saint-Hilaire’s intercept method.

The post Lewis & Clark geolocation first appeared on John D. Cook.

Zero knowledge proof of compositeness

John — Sat, 29 Nov 2025 16:53:35 +0000

A zero knowledge proof (ZKP) answers a question without revealing anything more than the answer. For example, a digital signature proves your possession of a private key without revealing that key.

Here’s another example, one that’s more concrete than a digital signature. Suppose you have a deck of 52 cards, 13 of each of spades, hearts, diamonds, and clubs. If I draw a spade from the deck, I can prove that I drew a spade without showing which card I drew. If I show you that all the hearts, diamonds, and clubs are still in the deck, then you know that the missing card must be a spade.

Composite numbers

You can think of Fermat’s primality test as a zero knowledge proof. For example, I can convince you that the following number is composite without telling you what its factors are.

n = 244948974278317817239218684105179099697841253232749877148554952030873515325678914498692765804485233435199358326742674280590888061039570247306980857239550402418179621896817000856571932268313970451989041

Fermat’s little theorem says that if n is a prime and b is not a multiple of n, then

bⁿ⁻¹ = 1 (mod n).

A number b such that bⁿ⁻¹ ≠ 1 (mod n) is a proof that n is not prime, i.e. n is composite. So, for example, b = 2 is a proof that n above is composite. This can be verified very quickly using Python:

    >>> pow(2, n-1, n)
    10282 ... 4299

I tried the smallest possible base [1] and it worked. In general you may have to try a few bases. And for a few rare numbers (Carmichael numbers) you won’t be able to find a base that is relatively prime to n. But if you do find a base b such that bⁿ⁻¹ is not congruent to 1 mod n, you know with certainty that n is composite.
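Here is a minimal sketch of searching for such a base, run on small numbers so the results are easy to check by hand. The function name and the default list of bases are arbitrary choices for illustration.

```python
def fermat_witness(n, bases=(2, 3, 5, 7)):
    """Return the first base b with b^(n-1) != 1 (mod n), or None."""
    for b in bases:
        if b % n == 0:
            continue  # the theorem requires b not be a multiple of n
        if pow(b, n - 1, n) != 1:
            return b
    return None

print(fermat_witness(91))                    # 91 = 7 * 13
print(fermat_witness(97))                    # 97 is prime
print(fermat_witness(561, bases=(2, 5, 7)))  # 561 = 3 * 11 * 17 is a Carmichael number
```

The first call returns 2, so 2 proves that 91 is composite. The other two calls return None: no base can prove a prime composite, and the Carmichael number 561 passes the test for every base relatively prime to it. A base that shares a factor with 561, such as 3, would still expose it.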

Prime numbers

The converse of Fermat’s little theorem is false. It can be used to prove a number is not prime, but it cannot prove that a number is prime. But it can be used to show that a number is probably prime. (There’s some subtlety as to what it means for a number to probably be prime. See here.)

Fermat’s little theorem can give you a zero knowledge proof that a number is composite. Can it give you a zero knowledge proof that a number is prime? There are a couple oddities in this question.

First, what would it mean to have a zero knowledge proof that a number is prime? What knowledge are you keeping secret? When you prove that a number is composite, the prime factors are secret (or even unknown), but what’s the secret when you say a number is prime? Strictly speaking a ZKP doesn’t have to keep anything secret, but in practice it always does.

Second, what about the probability of error? Zero knowledge proofs do not have to be infallible. A ZKP can have some negligible probability of error, and usually does.

It’s not part of the definition, but practical ZKPs must be easier to verify than the direct approach to what they prove. So you could have something like a primality certificate that takes far less computation to verify than the computation needed to determine from scratch that a number is prime.

Proving other things

You could think of non-constructive proofs as ZKPs. For example, you could think of the intermediate value theorem as a ZKP: it proves that a function has a zero in an interval without giving you any information about where that zero may be located.

What makes ZKPs interesting in application is that they can prove things of more general interest than mathematical statements [2]. For example, cryptocurrencies can provide ZKPs that accounting constraints hold without revealing the inputs or outputs of the transaction. You could prove that nobody tried to spend a negative amount and that the sum of the inputs equals the sum of the outputs.

[1] You could try b = 1, but then bⁿ⁻¹ is always 1. This example shows that the existence of a base for which bⁿ⁻¹ = 1 (mod n) doesn’t prove anything.

[2] You might object that accounting rules are mathematical statements, and of course they are. But they’re of little interest to mathematicians and of great interest to the parties in a transaction.

The post Zero knowledge proof of compositeness first appeared on John D. Cook.

Monero subaddresses

John — Fri, 28 Nov 2025 21:23:32 +0000

Monero has a way of generating new addresses analogous to the way HD wallets generate new addresses for Bitcoin. In both cases, the recipient’s software can generate new addresses to receive payments that others cannot link back to the recipient.

Monero users have two public/private key pairs: one for viewing and one for spending. Let K^s and k^s be the public and private spending keys, and let K^v and k^v be the public and private viewing keys. Then the user’s ith subaddress is given by

K^s_i = K^s + H(k^v, i) G
K^v_i = k^v K^s_i

Here G is a generator for the elliptic curve Ed25519 and H is a hash function. The hash function output and k^v are integers; the public keys, denoted by capital Ks with subscripts and superscripts, are points on Ed25519. The corresponding private keys are

k^s_i = k^s + H(k^v, i)
k^v_i = k^v k^s_i

As with hierarchical wallets, the user scans the blockchain to see which of his addresses have received funds.

A user may choose to give a different subaddress for each transaction for added security, or to group transactions for accounting purposes.

Note that in addition to subaddresses, Monero uses stealth addresses. An important difference between subaddresses and stealth addresses is that recipients generate subaddresses, and senders generate stealth addresses. Someone could send you money to the same subaddress twice, failing to create a new stealth address. This is not possible if you give the sender a different subaddress each time.

Janus attacks

The possibility of a Janus attack is a fly in the subaddress ointment. If an attacker suspects that two subaddresses belong to the same wallet, he can confirm this suspicion by sending a transaction to one subaddress and making it look like it came from the other. If the recipient confirms receipt of the funds, this tells the attacker that his suspicion was correct. A cautious user might not confirm receipt of funds, but a cryptocurrency exchange would. You can read more about the details of a Janus attack here.

The post Monero subaddresses first appeared on John D. Cook.

A triangle whose interior angles sum to zero

John — Fri, 28 Nov 2025 17:53:05 +0000

Spherical geometry

In spherical geometry, the interior angles of a triangle add up to more than π. And in fact you can determine the area of a spherical triangle by how much the angle sum exceeds π. On a sphere of radius 1, the area equals the triangle excess

Area = E = interior angle sum − π.

Small triangles have interior angle sum near π. But you could, for example, have a triangle with three right angles: put a vertex on the north pole and two vertices on the equator 90° longitude apart.
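As a quick check of the excess formula, here is a small sketch that computes the area of the triangle with three right angles. The function name is made up for illustration, and the radius parameter uses the standard scaling of area by the square of the radius.

```python
import math

def spherical_triangle_area(angles, radius=1.0):
    """Area of a spherical triangle from its interior angles in radians."""
    excess = sum(angles) - math.pi
    return excess * radius**2

octant = spherical_triangle_area([math.pi / 2] * 3)
print(octant, 4 * math.pi / 8)
```

The two printed values agree: the triangle covers one eighth of the unit sphere, whose total area is 4π.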

Hyperbolic geometry

In hyperbolic geometry, the sum of the interior angles of a triangle is always less than π. In a space with curvature −1, the area equals the triangle defect, the difference between π and the angle sum.

Area = D = π − interior angle sum.

Again small triangles have an interior angle sum near π. Both spherical and hyperbolic geometry are locally Euclidean.

The interior angle sum can be any value less than π, and so as the angle sum goes to 0, the triangle defect, and hence the area, goes to π. Since the minimum angle sum is 0, the maximum area of a triangle is π.
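The following sketch evaluates the defect for equiangular triangles with shrinking angles, assuming curvature −1. The function name is hypothetical.

```python
import math

def hyperbolic_triangle_area(angles):
    """Area of a hyperbolic triangle (curvature -1) from its angles in radians."""
    defect = math.pi - sum(angles)
    if defect <= 0:
        raise ValueError("angle sum must be less than pi")
    return defect

# As the angles shrink to zero, the area climbs toward pi.
for a in (1.0, 0.1, 0.001, 0.0):
    print(a, hyperbolic_triangle_area([a, a, a]))
```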

The figure below has interior angle sum 0 and area π in hyperbolic geometry.

Strictly speaking this is an improper triangle because the three hyperbolic lines (i.e. half circles) don’t intersect within the hyperbolic plane per se but at ideal points on the real axis. But you could come as close to this triangle as you like, staying within the hyperbolic plane.

Note that the radii of the (Euclidean) half circles don’t affect the area. Any three semicircles that intersect on the real line as above make a triangle with the same area. Note also that the triangle has infinite perimeter but finite area.

The post A triangle whose interior angles sum to zero first appeared on John D. Cook.

A circle in the hyperbolic plane

John — Fri, 28 Nov 2025 14:54:46 +0000

Let ℍ be the upper half plane, the set of complex numbers with positive imaginary part. When we measure distances the way we’ve discussed in the last couple posts, the geometry of ℍ is hyperbolic.

What is a circle of radius r in ℍ? The same as a circle in any geometry: it’s the set of points a fixed distance r from a center. But when you draw a circle using one metric, it may look very different when viewed from the perspective of another metric.

Suppose we put on glasses that gave us a hyperbolic perspective on ℍ, draw a circle of radius r centered at i, then take off the hyperbolic glasses and put on Euclidean glasses. What would our drawing look like?

In the previous post we gave several equivalent expressions for the hyperbolic metric. We’ll use the first one here.

d(z₁, z₂) = 2 arcsinh( |z₁ − z₂| / (2 √(ℑz₁ ℑz₂)) )

Here the Fraktur letter ℑ stands for imaginary part. So the set of points on a circle of radius r centered at i is

{ x + iy : d(x + iy, i) = r }.

Divide the expression for d(x + iy, i) by 2, apply sinh, and square. This gives us

sinh²(r/2) = (x² + (y − 1)²) / (4y)

which is an equation for a Euclidean circle. If we multiply both sides by 4y and complete the square, we find that the center of the circle is (0, cosh(r)) and the radius is sinh(r).
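Here is a numerical sanity check of that conclusion. It assumes the standard half plane distance formula d(z₁, z₂) = 2 arcsinh(|z₁ − z₂| / (2 √(ℑz₁ ℑz₂))); the function names are made up for illustration.

```python
import cmath
import math

def hyperbolic_distance(z1, z2):
    """Distance between two points of the upper half plane."""
    return 2 * math.asinh(abs(z1 - z2) / (2 * math.sqrt(z1.imag * z2.imag)))

def euclidean_circle_points(r, count=12):
    """Points on the Euclidean circle with center (0, cosh r) and radius sinh r."""
    center = complex(0, math.cosh(r))
    return [center + math.sinh(r) * cmath.exp(2j * math.pi * k / count)
            for k in range(count)]

r = 1.5
for z in euclidean_circle_points(r):
    # Each value should equal r up to rounding error.
    print(round(hyperbolic_distance(z, 1j), 12))
```

Every point on the Euclidean circle is the same hyperbolic distance r from i, even though i is not the Euclidean center.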

Summary so far

So to recap, if we put on our hyperbolic glasses and draw a circle, then switch out these glasses for Euclidean glasses, the figure we drew again looks like a circle.

To put it another way, a hyperbolic viewer and a Euclidean viewer would agree that a circle has been drawn. However, the two viewers would disagree where the center of the circle is located, and they would disagree on the radius.

Both would agree that the center is on the imaginary axis, but the hyperbolic viewer would say the imaginary part of the center is 1 and the Euclidean viewer would say it’s cosh(r). The hyperbolic observer would say the circle has radius r, but the Euclidean observer would say it has radius sinh(r).

Small circles

For small r, the hyperbolic and Euclidean viewpoints nearly agree because

cosh(r) = 1 + O(r²)

and

sinh(r) = r + O(r³)

Big circles

Note that if you asked a Euclidean observer to draw a circle of radius 100, centered at (0, 1), he would say that the circle will extend outside of the half plane. A hyperbolic observer would disagree. From his perspective, the real axis is infinitely far away and so he can draw a circle of any radius centered at any point and stay within the half plane.

Moving circles

Now what if we looked at circles centered somewhere else? The hyperbolic metric is invariant under Möbius transformations that map ℍ to itself, and so in particular it is invariant under

z ↦ x₀ + y₀ z

for real x₀ and y₀ > 0.

This takes a circle with hyperbolic center i to a circle centered at x₀ + i y₀ without changing the hyperbolic radius. The Euclidean center moves from i cosh(r) to x₀ + i y₀ cosh(r) and the Euclidean radius changes from sinh(r) to y₀ sinh(r).
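The invariance can be checked numerically. The sketch below again assumes the standard half plane distance formula, and the values of r, x₀, and y₀ are arbitrary choices for illustration.

```python
import math

def hyperbolic_distance(z1, z2):
    """Distance between two points of the upper half plane."""
    return 2 * math.asinh(abs(z1 - z2) / (2 * math.sqrt(z1.imag * z2.imag)))

def shift(z, x0, y0):
    """The map z -> x0 + y0 z, with x0 real and y0 > 0."""
    return x0 + y0 * z

r, x0, y0 = 0.8, 3.0, 2.5
z = complex(0, math.exp(r))  # a point at hyperbolic distance r from i

before = hyperbolic_distance(z, 1j)
after = hyperbolic_distance(shift(z, x0, y0), shift(1j, x0, y0))
print(before, after)  # both should equal r up to rounding error

# Top of the Euclidean circle with center (x0, y0 cosh r) and radius y0 sinh r.
top = complex(x0, y0 * (math.cosh(r) + math.sinh(r)))
print(hyperbolic_distance(top, complex(x0, y0)))
```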

The post A circle in the hyperbolic plane first appeared on John D. Cook.
