John D. Cook

Error in Ramanujan’s approximation for ellipse perimeter

John — Sun, 22 Sep 2024 18:33:34 +0000

Ramanujan discovered an incredibly accurate approximation for the perimeter of an ellipse. This post will illustrate how accurate the approximation is and push its limits.

As with all computations involving ellipses, the error of Ramanujan’s approximation increases as eccentricity increases. But the error increases slowly, asymptotically approaching an upper bound that is remarkably small.

Let a and b be the semi-major and semi-minor axes of an ellipse. Then Ramanujan’s approximation for the perimeter of the ellipse is

p ≈ π(a + b) (1 + 3λ² / (10 + √(4 − 3λ²)))

where λ = (a − b)/(a + b).

Example

To illustrate how accurate the approximation is, let’s apply it to a very large, highly eccentric ellipse: the orbit of Sedna, a dwarf planet discovered in 2003. Sedna has the most elliptical orbit of any of the dwarf planets, with eccentricity 0.8496 compared to 0.2488 for Pluto. Sedna is also about 12 times farther from the sun than Pluto.

The semi-major axis of Sedna’s orbit is 76 billion kilometers. Eccentricity e corresponds to aspect ratio √(1 − e²) (see this post), and so the semi-minor axis is about 40 billion kilometers. Let’s assume the orbit of Sedna is perfectly elliptical, which it is not, and that the semi-axes stated above are exact, which they are not. Then the length of Sedna’s orbit is on the order of 366 billion kilometers, and Ramanujan’s approximation has an error of about 53 kilometers.

Error

Here’s a plot of the relative error when b = 1 and a varies.

The error appears to be approaching an asymptote, and in fact it is. The error is bounded by 4/π − 14/11 = 0.00051227…, as proved here.
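Here’s a minimal sketch for experimenting with the error yourself. The function names are mine, and computing the exact perimeter with the arithmetic-geometric mean is my choice of method, not necessarily how the plot above was made. The relative error is measured against the exact perimeter.

```python
from math import pi, sqrt

def ramanujan_perimeter(a, b):
    """Ramanujan's approximation, with lam = (a - b)/(a + b)."""
    lam = (a - b) / (a + b)
    return pi * (a + b) * (1 + 3 * lam**2 / (10 + sqrt(4 - 3 * lam**2)))

def exact_perimeter(a, b):
    """Perimeter 4 a E(e^2), computed with the arithmetic-geometric mean.
    Assumes a >= b > 0."""
    x, y = 1.0, b / a              # AGM iterates, scaled so the first is 1
    total = 0.5 * (1.0 - y * y)    # running sum of 2^(n-1) c_n^2, with c_0^2 = e^2
    weight = 1.0
    for _ in range(40):            # far more iterations than ever needed
        if abs(x - y) <= 1e-15 * x:
            break
        c = (x - y) / 2
        x, y = (x + y) / 2, sqrt(x * y)
        total += weight * c * c
        weight *= 2
    return 4 * a * (pi / (2 * x)) * (1 - total)

def relative_error(a, b=1.0):
    exact = exact_perimeter(a, b)
    return (exact - ramanujan_perimeter(a, b)) / exact

for a in [1, 2, 10, 100, 1000]:
    print(a, relative_error(a))
```

For a circle the two functions agree, and the error stays small even when a is a thousand times b.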

The post Error in Ramanujan’s approximation for ellipse perimeter first appeared on John D. Cook.

The Cauchy distribution’s counter-intuitive behavior

John — Thu, 19 Sep 2024 12:00:46 +0000

Someone with no exposure to probability or statistics likely has an intuitive sense that averaging random variables reduces variance, though they wouldn’t state it in those terms. They might, for example, agree that the average of several test grades gives a better assessment of a student than a single test grade. But data from a Cauchy distribution doesn’t behave this way.

Averages and scaling

If you have four independent random variables, each normally distributed with the same scale parameter σ, then their average is also normally distributed but with scale parameter σ/2.

If you have four independent random variables, each Cauchy distributed with the same scale parameter σ, then their average is also Cauchy distributed, but with exactly the same scale parameter σ.

So the normal distribution matches common intuition, but the Cauchy distribution does not.

In the case of random variables with a normal distribution, the scale parameter σ is also the standard deviation. In the case of random variables with a Cauchy distribution, the scale parameter σ is not the standard deviation because Cauchy random variables don’t have a variance, so they don’t have a standard deviation.
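A small simulation makes the contrast concrete. This is only a sketch: the seed, the sample size, and the use of the interquartile range as a measure of spread are my choices. The interquartile range is used because it exists for the Cauchy distribution even though the standard deviation does not.

```python
import random
from math import pi, tan
from statistics import quantiles

random.seed(42)  # arbitrary seed, only for reproducibility

def cauchy():
    """Standard Cauchy sample via the inverse CDF."""
    return tan(pi * (random.random() - 0.5))

def iqr(samples):
    """Interquartile range of a list of samples."""
    q1, _, q3 = quantiles(samples, n=4)
    return q3 - q1

N = 20_000
normal_single = [random.gauss(0, 1) for _ in range(N)]
normal_avg4 = [sum(random.gauss(0, 1) for _ in range(4)) / 4 for _ in range(N)]
cauchy_single = [cauchy() for _ in range(N)]
cauchy_avg4 = [sum(cauchy() for _ in range(4)) / 4 for _ in range(N)]

print("normal:", iqr(normal_single), iqr(normal_avg4))
print("cauchy:", iqr(cauchy_single), iqr(cauchy_avg4))
```

Averaging four normal samples should cut the spread roughly in half, while averaging four Cauchy samples should leave it essentially unchanged.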

Modeling

Some people object that nothing really follows a Cauchy distribution because the Cauchy distribution has no mean or variance. But nothing really follows a normal distribution either. All probability distributions are idealizations. The question of any probability distribution is whether it adequately captures the aspect of reality it is being used to model.

Mean

Suppose some phenomenon appears to behave like it has a Cauchy distribution, with no mean. Alternately, suppose the phenomenon has a mean, but this mean is so variable that it is impossible to estimate. There’s no practical difference between the two.

Variance

And in the alternate case, suppose there is a finite variance, but the variance is so large that it is impossible to estimate. If you take the average of four observations, the result is still so variable that the variance is impossible to estimate. You’ve cut the theoretical variance by a factor of four, but that makes no difference. Again this is practically indistinguishable from a Cauchy distribution.

Truncating

Now suppose you want to tame the Cauchy distribution by throwing out samples with absolute value greater than M. Now you have a truncated Cauchy distribution, and it has finite mean and variance.

But how do you choose M? If you don’t have an objective reason to choose a particular value of M, you would hope that your choice doesn’t matter too much. And that would be the case for a thin-tailed probability distribution like the normal, but it’s not true of the Cauchy distribution.

The variance of the truncated distribution will be approximately proportional to M (about 2M/π for a standard Cauchy), so by choosing M you choose the variance. So if you double your cutoff for outliers that are to be discarded, you approximately double the variance of what’s left. Your choice of M matters a great deal.
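For a standard Cauchy distribution (location 0, scale 1, which is an assumption of this sketch) the variance after truncating to [−M, M] has a closed form. The mean is zero by symmetry, and integrating x²/(1 + x²) over [−M, M] and dividing by the retained probability mass gives M/arctan(M) − 1. The function name below is mine.

```python
from math import atan

def truncated_cauchy_variance(M):
    """Variance of a standard Cauchy distribution restricted to [-M, M]."""
    return M / atan(M) - 1

for M in [10, 100, 1000, 2000]:
    print(M, truncated_cauchy_variance(M))
```

For large M the arctangent is close to π/2, so the variance grows linearly in the cutoff.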

The post The Cauchy distribution’s counter-intuitive behavior first appeared on John D. Cook.

Arithmetic, Geometry, Harmony, and Gold

John — Tue, 17 Sep 2024 13:27:02 +0000

I recently ran across a theorem connecting the arithmetic mean, geometric mean, harmonic mean, and the golden ratio. Each of these comes up fairly often, and there are elegant connections between them, but I don’t recall seeing all four together in one theorem before.

Here’s the theorem [1]:

The arithmetic, geometric, and harmonic means of two positive real numbers are the lengths of the sides of a right triangle if, and only if, the ratio of the arithmetic to the harmonic mean is the Golden Ratio.

The proof given in [1] is a straightforward calculation, only slightly longer than the statement of the theorem.

The conclusion of the theorem stops short of saying how to construct the triangle, though this is a simple exercise, which we carry out here.

Given two positive numbers, a and b, the three means are defined as follows.

AM = (a + b)/2
GM = √(ab)
HM = 2ab/(a + b)

Denote the Golden Ratio by

φ = (1 + √5)/2.

Then the equation AM/HM = φ is equivalent to the quadratic equation

a² + (2 − 4φ)ab + b² = 0.

The means are all homogeneous functions of a and b, i.e. if we multiply a and b by a constant, we multiply the three means by the same constant. Therefore we can set one of the parameters to 1 without loss of generality. Setting b = 1 gives

a² + (2 − 4φ)a + 1 = 0

and so there are two solutions:

a = 2φ − 3

and

a = 2φ + 1.

However, there is in a sense only one solution: the two solutions are reciprocals of each other, reversing the roles of a and b. So while there are two solutions to the quadratic equation, there is only one triangle, up to similarity.
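Here’s a quick numerical check of the construction, a sketch with names of my own choosing. Since AM ≥ GM ≥ HM, the arithmetic mean has to be the hypotenuse, so the code tests AM² = GM² + HM².

```python
from math import sqrt, isclose

phi = (1 + sqrt(5)) / 2

def means(a, b):
    """Arithmetic, geometric, and harmonic means of a and b."""
    return (a + b) / 2, sqrt(a * b), 2 * a * b / (a + b)

for a in (2 * phi - 3, 2 * phi + 1):
    am, gm, hm = means(a, 1.0)
    # ratio AM/HM and the Pythagorean defect, which should be about zero
    print(a, am / hm, am**2 - (gm**2 + hm**2))
```

Both roots give AM/HM equal to φ and a Pythagorean defect at the level of rounding error, and their product is 1, which confirms that they are reciprocals.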

[1] Angelo Di Domenico. The Golden Ratio: The Right Triangle: And the Arithmetic, Geometric, and Harmonic Means. The Mathematical Gazette Vol. 89, No. 515 (July, 2005), p. 261

The post Arithmetic, Geometry, Harmony, and Gold first appeared on John D. Cook.

Ceva, cevians, and Routh’s theorem

John — Sat, 14 Sep 2024 18:30:59 +0000

I keep running into Edward John Routh (1831–1907). He is best known for the Routh-Hurwitz stability criterion but he pops up occasionally elsewhere. The previous post discussed Routh’s mnemonic for moments of inertia and his “stretch” theorem. This post will discuss his triangle theorem.

Before stating Routh’s theorem, we need to say what a cevian is. Giovanni Ceva (1647–1734) was an Italian geometer, best known for Ceva’s theorem, and for a construction in that theorem now known as a cevian.

A cevian is a line from the vertex of a triangle to the opposite side. Draw three cevians by connecting each vertex of a triangle to a point on its opposite side. If the cevians intersect at a point, Ceva’s theorem says something about how the lines divide the sides. If the cevians form a triangle, Routh’s theorem finds the area of that triangle.

Routh’s theorem is a generalization of Ceva’s theorem because if the cevians intersect at a common point, the area of the triangle formed is zero, and then Routh’s area equation implies Ceva’s theorem.

Let A, B, and C be the vertices of a triangle and let D, E, and F be the points where their cevians intersect the opposite sides.

Let x, y, and z be the ratios into which each side is divided by the cevians. Specifically let x = FB/AF, y = DC/BD, and z = EA/CE.

Then Routh’s theorem says the relative area of the green triangle formed by the cevians is

(xyz − 1)² / ((xy + y + 1)(yz + z + 1)(zx + x + 1)).

If the cevians intersect at a point, the area of the triangle is 0, which implies xyz = 1, which is Ceva’s theorem.
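The theorem is easy to test numerically. The sketch below builds the cevians from coordinates, using the ratio conventions x = FB/AF, y = DC/BD, and z = EA/CE given above, and compares the measured area ratio with the standard statement of Routh’s formula. All function names are mine.

```python
def routh_ratio(x, y, z):
    """Area of the inner triangle relative to ABC, per Routh's theorem."""
    return (x*y*z - 1)**2 / ((x*y + y + 1) * (y*z + z + 1) * (z*x + x + 1))

def split(p, q, ratio):
    """Point on segment pq whose distance to q is `ratio` times its distance to p."""
    t = 1 / (1 + ratio)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def intersect(p1, p2, p3, p4):
    """Intersection of line p1p2 with line p3p4 (assumed not parallel)."""
    x1, y1 = p1; x2, y2 = p2; x3, y3 = p3; x4, y4 = p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def area(p, q, r):
    """Unsigned triangle area via the shoelace formula."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / 2

def cevian_ratio(A, B, C, x, y, z):
    """Measured area of the triangle cut out by cevians AD, BE, CF, relative to ABC."""
    F = split(A, B, x)   # x = FB/AF
    D = split(B, C, y)   # y = DC/BD
    E = split(C, A, z)   # z = EA/CE
    P = intersect(A, D, B, E)
    Q = intersect(B, E, C, F)
    R = intersect(C, F, A, D)
    return area(P, Q, R) / area(A, B, C)

A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(cevian_ratio(A, B, C, 3, 1, 2), routh_ratio(3, 1, 2))
```

Setting x = y = z = 2 recovers the classic one-seventh area triangle, and any choice with xyz = 1 gives zero area.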

The post Ceva, cevians, and Routh’s theorem first appeared on John D. Cook.

Moments of inertia mnemonic

John — Sat, 14 Sep 2024 16:08:16 +0000

Edward John Routh (1831–1907) came up with a mnemonic for summarizing many formulas for moment of inertia of a solid rotating about an axis through its center of mass.

Routh’s mnemonic is

I = MS / k

where M is the mass of an object, S is the sum of the squares of the semi-axes, and k is 3, 4, or 5 depending on whether the object is rectangular, elliptical, or ellipsoidal respectively.

This post will show how a variety of formulas fit into Routh’s framework.

Rectangular solids

Suppose we have a box whose base is a rectangle with sides of length a and b, and we’re rotating the box about the vertical axis through the center of mass. The moment of inertia is

I = M(a² + b²) / 12.

The semi-axes have length a/2 and b/2 and so the formula above fits into Routh’s mnemonic with k = 3:

I = M( (a/2)² + (b/2)² ) / 3.

Why did Routh state his theorem in terms of semi-axes rather than axes? Because circles and spheres are typically described in terms of radius, and ellipses are described in terms of semi-axes.

Cylinders

The moment of inertia for a (circular) cylinder about its center is

I = Mr² /2.

From Routh’s perspective, there are two perpendicular axes to the central axis, both of length r. So his mnemonic could calculate the moment of inertia as

I = M(r² + r²)/4

using k = 4.

For an elliptical cylinder, where the ellipse has semi-major axis a and semi-minor axis b, the moment of inertia is

I = M(a² + b²)/4

which reduces to the circular cylinder result when a = b = r.

Spheres and ellipsoids

The moment of inertia of a sphere about a line through its center is

I = 2Mr² / 5.

Again there are two perpendiculars to the line, both of length r, and so we get the result above using Routh’s mnemonic with k = 5.

For an ellipsoid with semi-axes a, b, and c, rotated about the axis corresponding to c, the moment of inertia is

I = M(a² + b²)/5.
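The cases so far can be collected into one function. This is a sketch with arbitrary illustrative values; the function name and arguments are mine.

```python
from math import isclose

def routh(mass, semi_axes, k):
    """Routh's mnemonic I = M S / k, where S is the sum of the squares of the
    semi-axes perpendicular to the axis of rotation."""
    return mass * sum(s**2 for s in semi_axes) / k

M, a, b, r = 2.0, 3.0, 5.0, 1.5   # arbitrary illustrative values

box = routh(M, (a / 2, b / 2), 3)   # rectangular box with sides a and b
cylinder = routh(M, (r, r), 4)      # circular cylinder of radius r
sphere = routh(M, (r, r), 5)        # sphere of radius r

print(box, M * (a**2 + b**2) / 12)
print(cylinder, M * r**2 / 2)
print(sphere, 2 * M * r**2 / 5)
```

Each line prints the mnemonic’s value next to the textbook formula for the same solid.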

Thin rod

The moment of inertia for a thin rod of length L rotated about its center is

I = ML²/12.

This can be derived from the case of a rectangular solid with length L and negligible width.

Note that the formula for moment of inertia of a cylinder does not apply because we are rotating the rod about its middle, not along the axis running the length of the rod.

Routh’s stretch rule

Moving a point mass in a direction parallel to the axis of rotation doesn’t change its moment of inertia. The continuous version of this observation means that we can stretch the shapes without changing their inertia if we stretch them in the direction of the axis of rotation. This means the rules above apply to more general shapes.

Note that this post refers to physical moments and the link above refers to statistical moments. They’re closely related.

The post Moments of inertia mnemonic first appeared on John D. Cook.

Binomial bound

John — Fri, 13 Sep 2024 14:29:02 +0000

I recently came across an upper bound I hadn’t seen before [1]. Given a binomial coefficient C(r, k), let

n = min(k, r − k)

and

m = r − n.

Then for any ε > 0,

C(n + m, n) ≤ (1 + ε)^{n + m} / εⁿ.

The proof follows quickly from applying the binomial theorem to (1 + ε)^{n + m}.

While I could imagine how a non-optimal choice of ε could be convenient in some context, it’s natural to want to see how good the upper bound is for the best ε, which works out to be ε = n/m.

A little algebra shows this value of ε leads to

C(n + m, n) ≤ (n + m)^{n + m} / (nⁿ m^m).

Note that while the original bound is not symmetric in n and m, the optimal bound is.
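Here’s a short sketch that evaluates both forms of the bound for one binomial coefficient. The function names and the example values r = 20, k = 6 are mine.

```python
from math import comb, isclose

def eps_bound(n, m, eps):
    """Upper bound (1 + eps)^(n + m) / eps^n on C(n + m, n), for eps > 0."""
    return (1 + eps)**(n + m) / eps**n

def optimal_bound(n, m):
    """The bound at eps = n/m, i.e. (n + m)^(n + m) / (n^n m^m)."""
    return (n + m)**(n + m) / (n**n * m**m)

r, k = 20, 6
n = min(k, r - k)
m = r - n
print(comb(r, k), optimal_bound(n, m), eps_bound(n, m, n / m))
```

The last two numbers agree, since the second function is just the first evaluated at ε = n/m, and any other positive ε gives a weaker bound.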

Returning to the original notation C(r, k), let’s see how tight the optimal bound is by plotting, as a function of r, the maximum relative error as k varies.

The maximum relative error, over the range plotted, is very roughly r/10.

[1] Grzegorz Łysik. The ε-binomial inequality. The Mathematical Gazette. Vol. 92, No. 523 (March 2008), pp. 97–99

The post Binomial bound first appeared on John D. Cook.

Separable functions in different contexts

John — Tue, 10 Sep 2024 14:41:11 +0000

I was skimming through the book Mathematical Reflections [1] recently. The authors were discussing a set of generalizations [2] of the Star of David theorem from combinatorics.

The theorem is so named because if you draw a Star of David by connecting points in Pascal’s triangle then each side corresponds to the vertices of a triangle.

One such theorem was the following.

This theorem also has a geometric interpretation, connecting vertices within Pascal’s triangle.

The authors point out that the binomial coefficient is a separable function of three variables, and that their generalized Star of David theorem is true for any separable function of three variables.

The binomial coefficient C(n, k) is a function of two variables, but you can think of it as a function of three variables: n, k, and n − k. That is

C(n, k) = f(n) g(k) g(n − k)

where f(n) = n! and g(k) = 1/k!.
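The factorization C(n, k) = f(n) g(k) g(n − k) is easy to confirm in exact arithmetic. In this sketch f and g are as defined above, and `binomial` is a name of mine.

```python
from fractions import Fraction
from math import comb, factorial

def f(n):
    return factorial(n)

def g(k):
    return Fraction(1, factorial(k))

def binomial(n, k):
    """Binomial coefficient as a product of one-variable functions of n, k, and n - k."""
    return f(n) * g(k) * g(n - k)

print(binomial(10, 3), comb(10, 3))
```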

I was surprised to see the term separable function outside of a PDE context. My graduate work was in partial differential equations, and so when I hear separable function my mind goes to separation of variables as a technique for solving PDEs.

Coincidentally, I was looking at separable coordinate systems recently. These are coordinate systems in which the Helmholtz equation can be solved by separable functions, i.e. coordinate systems in which the separation of variables technique will work. The Laplacian can take on very different forms in different coordinate systems, and if possible you’d like to choose a coordinate system in which a PDE you care about is separable.

[1] Peter Hilton, Derek Holton, and Jean Pedersen. Mathematical Reflections. Springer, 1996.

[2] Hilton et al refer to a set of theorems as generalizations of the Star of David theorem, but these theorems are not strictly generalizations in the sense that the original theorem is clearly a special case of the generalized theorems. The theorems are related, and I imagine with more effort I could see how to prove the older theorem from the newer ones, but it’s not immediately obvious.

The post Separable functions in different contexts first appeared on John D. Cook.

Body roundness index

John — Sun, 08 Sep 2024 01:12:54 +0000

Body Roundness Index (BRI) is a proposed replacement for Body Mass Index (BMI) [1]. Some studies have found that BRI is a better measure of obesity and a more effective predictor of some of the things BMI is supposed to predict [2].

BMI is based on body mass and height, and so it cannot distinguish a body builder and an obese man if both have the same height and weight. BRI looks at body shape more than body mass.

The basic idea behind Body Roundness Index is to draw an ellipse based on a person’s body and report how close that ellipse is to being a circle. The more a person looks like a circle, the higher his BRI. The formula for BRI is

BRI = 364.2 − 365.5 e

where e is the eccentricity of the ellipse.

Now what is this ellipse we’ve been talking about? It’s roughly an ellipse whose major axis runs from head to toe and whose minor axis runs across the person’s width.

There are a couple simplifications here.

You don’t actually measure how wide someone is. You measure the circumference of their waist and find the diameter of a circle with that circumference.
You don’t actually measure how high their waist is [3]. You assume their waist is at exactly half their height.

It’s conventional to describe an ellipse in terms of its semi-major axis a and semi-minor axis b. For a circle, a = b = radius. But in general an ellipse doesn’t have a single radius and a > b. You could think of a and b as being the maximum and minimum radii.

So to fit an ellipse to our idealized model of a person, the major axis, 2a, equals the person’s height.

a = h/2

The semi-minor axis b is the radius of a circle of circumference c where c is the circumference of the person’s waist (or hips [3]).

b = c / (2π)

As explained here, eccentricity is computed from a and b by

e = √(1 − b²/a²).

As an example, consider a man who is 6 foot (72 inches) tall and has a 34 inch waist. Then

a = 36
b = 17/π = 5.4113
e = √(1 − b²/a²) = 0.9886
BRI = 364.2 − 365.5 e = 2.8526

Note that the man’s weight doesn’t enter the calculation. He could be a slim guy weighing 180 pounds or a beefier guy weighing 250 pounds as long as he has a 34 inch waist. In the latter case, the extra mass is upper body muscle and not around his waist.
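The whole calculation fits in a few lines. This is a sketch: the function name is mine, and height and waist need only be in the same units.

```python
from math import pi, sqrt

def body_roundness_index(height, waist):
    """BRI from height and waist circumference, in the same units."""
    a = height / 2            # semi-major axis
    b = waist / (2 * pi)      # semi-minor axis
    e = sqrt(1 - (b / a)**2)  # eccentricity
    return 364.2 - 365.5 * e

print(round(body_roundness_index(72, 34), 4))
```

Because only the ratio of waist to height enters through b/a, converting both measurements to centimeters gives the same index.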

[1] Diana M. Thomas et al. Relationships Between Body Roundness with Body Fat and Visceral Adipose Tissue Emerging from a New Geometrical Model. Obesity (2013) 21, 2264–2271. doi:10.1002/oby.20408.

[2] Researchers argue over which number to reduce a person to: BMI, BRI, or some other measure. They implicitly agree that a person must be reduced to a number; they just disagree on which number.

[3] Or hips. There are two versions of BRI, one based on waist circumference and one based on hip circumference.

The post Body roundness index first appeared on John D. Cook.

A couple more variations on an ancient theme

John — Sat, 07 Sep 2024 22:32:40 +0000

I’ve written a couple posts on the approximation

cos x ≈ (π² − 4x²) / (π² + x²)

by the Indian astronomer Aryabhata (476–550). The approximation is accurate for x in [−π/2, π/2].

The first post collected a Twitter thread about the approximation into a post. The second looked at how far the coefficients in Aryabhata’s approximation are from the optimal approximation as a ratio of quadratics.

This post will answer a couple questions. First, what value of π did Aryabhata have and how would that affect the approximation error? Second, how bad would Aryabhata’s approximation be if we used the approximation π² ≈ 10?

Using Aryabhata’s value of π

Aryabhata knew the value 3.1416 for π. We know this because he said that a circle of diameter 20,000 would have circumference 62,832. We don’t know, but it’s plausible that he knew π to more accuracy and rounded it to the implied value.

Substituting 3.1416 for π changes the sixth decimal of the approximation, but the approximation is good to only three decimal places, so 3.1416 is as good as a more accurate approximation as far as the error in approximating cosine is concerned.

Using π² ≈ 10

Substituting 10 for π² in Aryabhata’s approximation gives an approximation that’s convenient to evaluate by hand.

cos x ≈ (10 − 4x²) / (10 + x²)

It’s very accurate for small values of x but the maximum error increases from 0.00163 to 0.01091. Here’s a plot of the error.
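Here’s a sketch for checking these error figures. It estimates the maximum error on a uniform grid over [−π/2, π/2]. The grid size and function names are my choices, and a grid search only approximates the true maximum.

```python
from math import cos, pi

def aryabhata_cos(x, pi_squared=pi**2):
    """Aryabhata's approximation, with the value used for pi^2 as a parameter."""
    return (pi_squared - 4 * x**2) / (pi_squared + x**2)

def max_error(approx, n=10_000):
    """Largest absolute error against cosine on a uniform grid over [-pi/2, pi/2]."""
    xs = [-pi / 2 + pi * i / n for i in range(n + 1)]
    return max(abs(cos(x) - approx(x)) for x in xs)

print(max_error(aryabhata_cos))                               # exact pi
print(max_error(lambda x: aryabhata_cos(x, 3.1416**2)))       # Aryabhata's pi
print(max_error(lambda x: aryabhata_cos(x, 10)))              # pi^2 replaced by 10
```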

The post A couple more variations on an ancient theme first appeared on John D. Cook.

Finding pi in the alphabet

John — Sat, 07 Sep 2024 19:00:16 +0000

Write the letters of the alphabet around a circle, then strike out the letters that are symmetrical about a vertical line. The remaining letters are grouped in clumps of 3, 1, 4, 1, and 6 letters.

I’ve heard that this observation is due to Martin Gardner, but I don’t have a specific reference.

In case you’re interested, here’s the Python script I wrote to make the image above.

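    from numpy import pi, cos, sin
    import matplotlib.pyplot as plt

    for i in range(26):
        letter = chr(ord('A') + i)
        if letter in "AHIMOTUVWXY":
            # Strike out letters that are symmetric about a vertical line
            # by drawing a triple bar underneath the letter.
            latex = r"$\equiv\!\!\!\!\!" + letter + "$"
        else:
            latex = f"${letter}$"
        # Place the letters clockwise around a circle, with A at the top.
        theta = pi/2 - 2*pi*i/26
        pt = (0.5*cos(theta), 0.5*sin(theta))
        # Invisible marker so the axes limits include this point.
        plt.plot(pt[0], pt[1], ' ')
        plt.annotate(latex, pt, fontsize="xx-large")
    plt.axis("off")
    plt.gca().set_aspect("equal")
    plt.savefig("alphabet_pi.png")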
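The clump sizes can also be counted without drawing anything. In this sketch the function name is mine, and starting the walk around the circle at J is a choice that makes the digits come out in the familiar order. Since J follows a struck-out letter, no clump is split by the starting point.

```python
from itertools import groupby
import string

SYMMETRIC = set("AHIMOTUVWXY")

def clumps(start="J"):
    """Sizes of the runs of surviving letters, walking around the circle from `start`."""
    letters = string.ascii_uppercase
    i = letters.index(start)
    circle = letters[i:] + letters[:i]
    runs = groupby(circle, key=lambda ch: ch not in SYMMETRIC)
    return [len(list(group)) for kept, group in runs if kept]

print(clumps())
```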

The post Finding pi in the alphabet first appeared on John D. Cook.

Optimal rational approximation

John — Tue, 03 Sep 2024 12:33:28 +0000

A few days ago I wrote about the approximation

cos x ≈ (π² − 4x²) / (π² + x²)

for cosine due to the Indian astronomer Aryabhata (476–550) and gave this plot of the error.

I said that Aryabhata’s approximation is “not quite optimal since the ripples in the error function are not of equal height.” This was an allusion to the equioscillation theorem.

Chebyshev proved that an optimal polynomial approximation has an error function that has equally large positive and negative oscillations. Later this theorem was generalized to rational approximations through a sequence of results by de la Vallée Poussin, Walsh, and Achieser. Here’s the formal statement of the theorem from [1] in the context of real-valued rational approximations with numerators of degree m and denominators of degree n.

Theorem 24.1. Equioscillation characterization of best approximants. A real function f in C[−1, 1] has a unique best approximation r*, and a function r is equal to r* if and only if f − r equioscillates between at least m + n + 2 − d extremes where d is the defect of r.

When the theorem says the error equioscillates, it means the error alternately takes on ± its maximum absolute value.

The defect is non-zero when both numerator and denominator have less than maximal degree, which doesn’t concern us here.

We want to find the optimal rational approximation for cosine over the interval [−π/2, π/2]. It doesn’t matter that the theorem is stated for continuous functions over [−1, 1] because we could just rescale cosine. We’re looking for approximations with (m, n) = (2, 2), i.e. ratios of quadratic polynomials, to see if we can improve on the approximation at the top of the post.

The equioscillation theorem says our error should oscillate at least 6 times, and so if we find an approximation whose error oscillates as required by the theorem, we know we’ve found the optimal approximation.

I first tried finding the optimal approximation using Mathematica’s MiniMaxApproximation function. But this function tries to optimize relative error and I’m trying to minimize absolute error. Minimizing relative error creates problems because cosine evaluates to zero at the ends of the interval, ±π/2. I tried several alternatives and eventually decided to take another approach.

Because the cosine function is even, the optimal approximation is even, which means it has the form

(a + bx²) / (c + dx²)

and we can assume without loss of generality that a = π². I then wrote some Python code to minimize the error as a function of the three remaining variables. The results were b = −4.00487004, c = 9.86024544, and d = 1.00198695, very close to Aryabhata’s approximation that corresponds to b = −4, c = π², and d = 1.

Here’s a plot of the error, the difference between cosine and the rational approximation.

The absolute error takes on its maximum value seven times, alternating between positive and negative values, and so we know the approximation is optimal. However sketchy my approach to finding the optimal approximation may have been, the plot shows that the result is correct.

Aryabhata’s approximation had maximum error 0.00163176 and the optimal approximation has maximum error 0.00097466. We were able to shave about 40% off the maximum error, but at a cost of using coefficients that would be harder to use by hand. This wouldn’t matter to a modern computer, but it would matter a great deal to an ancient astronomer.
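The comparison can be reproduced with a grid search. This sketch evaluates the coefficients quoted above rather than redoing the optimization. It takes the numerator constant to be π², the normalization under which Aryabhata’s coefficients are b = −4, c = π², d = 1. The function names and grid size are mine, and because the quoted coefficients are rounded, the computed maximum may differ slightly from the figure in the text.

```python
from math import cos, pi

def rational(x, a, b, c, d):
    """Even rational function (a + b x^2) / (c + d x^2)."""
    return (a + b * x**2) / (c + d * x**2)

def max_error(coeffs, n=10_000):
    """Largest absolute error against cosine on a uniform grid over [-pi/2, pi/2]."""
    xs = [-pi / 2 + pi * i / n for i in range(n + 1)]
    return max(abs(cos(x) - rational(x, *coeffs)) for x in xs)

aryabhata = (pi**2, -4.0, pi**2, 1.0)
optimized = (pi**2, -4.00487004, 9.86024544, 1.00198695)

print(max_error(aryabhata), max_error(optimized))
```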

[1] Approximation Theory and Approximation Practice by Lloyd N. Trefethen

The post Optimal rational approximation first appeared on John D. Cook.

Pell is to silver as Fibonacci is to gold

John — Mon, 02 Sep 2024 01:32:02 +0000

As mentioned in the previous post, the ratio of consecutive Fibonacci numbers converges to the golden ratio. Is there a sequence whose ratios converge to the silver ratio the way ratios of Fibonacci numbers converge to the golden ratio?

(If you’re not familiar with the silver ratio, you can read more about it here.)

The Pell numbers P_n start out just like the Fibonacci numbers:

P₀ = 0
P₁ = 1.

But the recurrence relationship is slightly different:

P_{n+2} = 2P_{n+1} + P_n.

So the Pell numbers are 0, 1, 2, 5, 12, 29, ….

The ratios of consecutive Pell numbers converge to the silver ratio.
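Here’s a sketch that generates Pell numbers iteratively and compares a late ratio to the silver ratio, 1 + √2. The function name is mine.

```python
from math import sqrt

def pell_numbers(count):
    """First `count` Pell numbers: P(0) = 0, P(1) = 1, P(n+2) = 2 P(n+1) + P(n)."""
    seq = [0, 1]
    while len(seq) < count:
        seq.append(2 * seq[-1] + seq[-2])
    return seq[:count]

pell = pell_numbers(20)
silver = 1 + sqrt(2)
print(pell[:6])
print(pell[-1] / pell[-2], silver)
```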

Metallic ratios

There are more analogs of the golden ratio, such as the bronze ratio and more that do not have names. In general the kth metallic ratio is the larger root of

x² − kx − 1 = 0.

The cases k = 1, 2, and 3 correspond to the gold, silver, and bronze ratios respectively.

The quadratic equation above is the characteristic equation of the recurrence relation

P_{n+2} = kP_{n+1} + P_n,

which suggests how we can construct a sequence of integers such that consecutive ratios converge to the kth metallic ratio.

So if we use k = 3 in the recurrence relation, we should get a sequence whose ratios converge to the bronze ratio. The results are

0, 1, 3, 10, 33, 109, 360, 1189, 3927, 12970, …

The following code will print the ratios.

    def bronze(n):
        if n == 0: return 0
        if n == 1: return 1
        return 3*bronze(n-1) + bronze(n-2)

    for n in range(2, 12):
        print( bronze(n)/bronze(n-1) )

Here’s the output.

    3.0
    3.3333333333333335
    3.3
    3.303030303030303
    3.302752293577982
    3.3027777777777776
    3.302775441547519
    3.3027756557168324
    3.302775636083269
    3.3027756378831383

The results are converging to the bronze ratio

(3 + √13)/2 = 3.302775637731995.

Plastic ratio

The plastic ratio is the real root of x³ − x − 1 = 0. Following the approach above, we can construct a sequence of integers whose consecutive ratios converge to the plastic ratio with the recurrence relation

S_{n+3} = S_{n+1} + S_n.

Let’s try this out with a little code.

    def plastic(n):
        if n < 3: return n
        return plastic(n-2) + plastic(n-3)

    for n in range(10, 20):
        print( plastic(n)/plastic(n-1) )

This prints

    1.3
    1.3076923076923077
    1.3529411764705883
    1.3043478260869565
    1.3333333333333333
    1.325
    1.320754716981132
    1.3285714285714285
    1.3225806451612903
    1.3252032520325203

which shows the ratios are approaching the plastic constant 1.324717957244746.

The post Pell is to silver as Fibonacci is to gold first appeared on John D. Cook.

Miles to kilometers

John — Sun, 01 Sep 2024 11:41:49 +0000

The number of kilometers in a mile is k = 1.609344 which is close to the golden ratio φ = 1.6180340.

The ratio of consecutive Fibonacci numbers converges to φ, and so you can approximately convert miles to kilometers by multiplying by a Fibonacci number and dividing by the previous Fibonacci number. For example, you could multiply by 8 and divide by 5, or you could multiply by 13 and divide by 8.

As you start going down the Fibonacci sequence, consecutive ratios get closer to k and closer to φ. But since the ratios converge to φ, at some point the ratios get closer to φ and further from k. That means there’s an optimal Fibonacci ratio for converting miles to kilometers.

I was curious what this optimal ratio is, and it turns out to be 21/13. There we have

|k − 21/13| = 0.0060406

and so the error in the approximation is 0.375%. The error is about a third smaller than using φ as the conversion factor.

The Lucas numbers satisfy the same recurrence relation as the Fibonacci numbers, but start with L₀ = 2 and L₁ = 1. The ratio of consecutive Lucas numbers also converges to φ, and so you could also use Lucas numbers to convert miles to kilometers.

There is an optimal Lucas ratio for converting miles to kilometers for the same reasons there is an optimal Fibonacci ratio. That ratio turns out to be 29/18, and

|k − 29/18| = 0.001767

which is about 3.4 times more accurate than the best Fibonacci ratio.
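A brute-force search confirms both optimal ratios. This is a sketch: the function name and the number of terms searched are mine.

```python
K = 1.609344  # kilometers per mile

def best_ratio(x0, x1, terms=30):
    """Among consecutive ratios of the sequence x0, x1, x0 + x1, ...,
    return (error, numerator, denominator) for the ratio closest to K."""
    best = None
    prev, cur = x0, x1
    for _ in range(terms):
        if prev > 0:
            err = abs(K - cur / prev)
            if best is None or err < best[0]:
                best = (err, cur, prev)
        prev, cur = cur, prev + cur
    return best

print(best_ratio(0, 1))   # Fibonacci numbers
print(best_ratio(2, 1))   # Lucas numbers
```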

The post Miles to kilometers first appeared on John D. Cook.

Ancient accurate approximation for sine

John — Sat, 31 Aug 2024 17:33:40 +0000

This post started out as a Twitter thread. The text below is the same as that of the thread after correcting an error in the first part of the thread. I also added a footnote on a theorem the thread alluded to.

***

The following approximation for sin(x) is remarkably accurate for 0 < x < π.

sin(x) ≈ 16x(π − x) / (5π² − 4x(π − x))

The approximation is so good that you can’t see the difference between the exact value and the approximation until you get outside the range of the approximation.

Here’s a plot of just the error.

This is a very old approximation, dating back to Aryabhata I, around 500 AD.

In modern terms, it is a rational approximation, quadratic in the numerator and denominator. It’s not quite optimal since the ripples in the error function are not of equal height [1], but the coefficients are nice numbers.

***

As pointed out in the comments, replacing x with π/2 − x in order to get an approximation for cosine gives a much nicer equation.

cos(x) ≈ (π² − 4x²) / (π² + x²)
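The substitution is easy to check numerically. The sketch below writes out both forms, with function names of my own, confirms that the sine form evaluated at π/2 − x matches the cosine form at x, and estimates the maximum error of the sine approximation on a grid over [0, π].

```python
from math import sin, pi, isclose

def aryabhata_sin(x):
    return 16 * x * (pi - x) / (5 * pi**2 - 4 * x * (pi - x))

def aryabhata_cos(x):
    return (pi**2 - 4 * x**2) / (pi**2 + x**2)

# The two forms agree under the substitution x -> pi/2 - x.
for x in [-1.5, -0.3, 0.0, 0.7, 1.2]:
    assert isclose(aryabhata_sin(pi / 2 - x), aryabhata_cos(x), abs_tol=1e-12)

xs = [pi * i / 1000 for i in range(1001)]
max_err = max(abs(sin(x) - aryabhata_sin(x)) for x in xs)
print(max_err)
```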

***

[1] The equioscillation theorem says that the optimal approximation will have ripples of equal positive and negative amplitude. This post explores the equioscillation theorem further and finds how far Aryabhata’s approximation is from optimal.

The post Ancient accurate approximation for sine first appeared on John D. Cook.

Mentally multiply by π

John — Sat, 31 Aug 2024 12:16:13 +0000

This post will give three ways to multiply by π taken from [1].

Simplest approach

Here’s a very simple observation about π :

π ≈ 3 + 0.14 + 0.0014.

So if you need to multiply by π, you need to multiply by 3 and by 14. Once you’ve multiplied by 14 once, you can reuse your work.

For example, to compute 4π, you’d compute 4 × 3 = 12 and 4 × 14 = 56. Then

4π ≈ 12 + 0.56 + 0.0056 = 12.5656.

The correct value is 12.56637… and so the error is .00077.

First refinement

Now of course π = 3.14159… and so the approximation above is wrong in the fourth decimal place. But you can squeeze out a little more accuracy with the observation

π ≈ 3 + 0.14 + 0.0014 + 0.00014 = 3.14154.

Now if we redo our calculation of 4π we get

4π ≈ 12 + 0.56 + 0.0056 + 0.00056 = 12.56616.

Now our error is .00021, which is 3.6 times smaller.

Second refinement

The approximation above is based on an underestimate of π. We can improve it a bit by adding half of our last term, based on

π ≈ 3 + 0.14 + 0.0014 + 0.00014 + 0.00014/2 = 3.14161

So in our running example,

4π ≈ 12 + 0.56 + 0.0056 + 0.00056 + 0.00028 = 12.56644

which has an error of 0.00007, which is three times smaller than above.
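The three levels of approximation can be written as one function that only ever multiplies by 3 and by 14. This is a sketch: the function name and the `level` parameter are mine.

```python
from math import pi

def times_pi(n, level=0):
    """Approximate n*pi from the two products 3n and 14n.
    level 0 uses 3 + 0.14 + 0.0014; level 1 adds 0.00014; level 2 adds 0.00007."""
    three, fourteen = 3 * n, 14 * n
    total = three + fourteen / 100 + fourteen / 10_000
    if level >= 1:
        total += fourteen / 100_000
    if level >= 2:
        total += fourteen / 200_000
    return total

for level in range(3):
    approx = times_pi(4, level)
    print(level, round(approx, 5), round(abs(4 * pi - approx), 5))
```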

[1] Trevor Lipscombe. Mental mathematics for multiples of π. The Mathematical Gazette, Vol. 97, No. 538 (March 2013), pp. 167–169

The post Mentally multiply by π first appeared on John D. Cook.

A better integral for the normal distribution

John — Sat, 31 Aug 2024 11:45:09 +0000

For a standard normal random variable Z, the probability that Z exceeds some cutoff z is given by

P(Z > z) = (1/√(2π)) ∫_z^∞ exp(−x²/2) dx.

If you wanted to compute this probability numerically, you could obviously evaluate its defining integral numerically. But as is often the case in numerical analysis, the most obvious approach is not the best approach. The range of integration is unbounded and it varies with the argument.

J. W. Craig [1] came up with a better integral representation, better from the perspective of numerical integration. The integration is always over the same finite interval, with the argument appearing inside the integrand. The integrand is smooth and bounded, well suited to numerical integration.

For positive z, Craig’s integral representation is

P(Z > z) = (1/π) ∫_0^{π/2} exp(−z² / (2 sin² θ)) dθ.

Illustration

To show that Craig’s integral is easy to integrate numerically, we’ll evaluate it using Gaussian quadrature with only 10 integration points.

    from numpy import sin, exp, pi
    from scipy import integrate
    from scipy.stats import norm

    for x in [0.5, 2, 5]:
        q, _ = integrate.fixed_quad(
            lambda t: exp(-x**2 / (2*sin(t)**2))/pi,
            0.0, pi/2, n=10)
        print(q, norm.sf(x))

(SciPy uses sf (“survival function”) for the CCDF. More on that here.)

The code above produces the following.

    0.30858301 0.30853754
    0.02274966 0.02275013
    2.86638437e-07 2.86651572e-07

So with 10 integration points, we get four correct figures. And the accuracy seems to be consistent for small, medium, and large values of x. (Five standard deviations is pretty far out in the tail of a normal distribution, as evidenced by the small value of the integral.)

[1] J. W. Craig, A new, simple and exact result for calculating the probability of error for two-dimensional signal constellations, in IEEE MILCOM’91 Conf. Rec., Boston, MA (1991) pp. 25.2.1-25.5.5.

The post A better integral for the normal distribution first appeared on John D. Cook.

Drawing with a compass on a globe

John — Fri, 30 Aug 2024 13:09:19 +0000

Take a compass and draw a circle on a globe. Then take the same compass, opened to the same width, and draw a circle on a flat piece of paper. Which circle has more area?

If the circle is small compared to the radius of the globe, then the two circles will be approximately equal because a small area on a globe is approximately flat.

To get an idea what happens for larger circles, let’s draw a circle on the globe as large as possible, i.e. the equator. If the globe has radius r, then to draw the equator we need our compass to be opened a width of √2 r, the distance from the north pole to the equator along a straight line cutting through the globe.

The area of a hemisphere is 2πr². If we take our compass and draw a circle of radius √2 r on a flat surface we also get an area of 2πr². And by continuity we should expect that if we draw a circle that is nearly as big as the equator then the corresponding circle on a flat surface should have approximately the same area.

Interesting. This says that our compass will draw a circle with the same area whether on a globe or on a flat surface, at least approximately, if the width of the compass is sufficiently small or sufficiently large. In fact, we get exactly the same area, regardless of how wide the compass is opened up. We haven’t proven this, only given a plausibility argument, but you can find a proof in [1].
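A quick numerical check makes the claim concrete. The sketch below assumes the standard formula for the area of a spherical cap, 2πr²(1 − cos θ), where θ is the polar angle of the circle, and converts the compass width w (a straight-line chord) to θ via w = 2r sin(θ/2). The function name `cap_area` is made up for illustration.

```python
import math

def cap_area(r, w):
    """Area of the spherical cap drawn on a sphere of radius r
    by a compass opened to width w (a straight-line chord)."""
    theta = 2 * math.asin(w / (2 * r))   # polar angle subtended by the chord
    return 2 * math.pi * r**2 * (1 - math.cos(theta))

r = 1.0
for w in [0.1, 0.5, 1.0, math.sqrt(2)]:
    # Compare the area on the globe with the flat-paper area pi*w^2.
    print(w, cap_area(r, w), math.pi * w**2)
```

The two columns agree to within floating point error for every width, including the equator case w = √2 r.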

Note that the width w of the compass is the radius of the circle drawn on a flat surface, but it is not the radius of the circle drawn on the globe. The width w is greater than the radius of the circle, but less than the distance along the sphere from the compass point to the circle. In the case of the equator, the radius of the circle is r, the width of the compass is √2 r, and the distance along the sphere from the north pole to the equator is πr/2.

[1] Nick Lord. On an alternative formula for the area of a spherical cap. The Mathematical Gazette, Vol. 102, No. 554 (July 2018), pp. 314–316

The post Drawing with a compass on a globe first appeared on John D. Cook.

The negative binomial distribution and Pascal’s triangle

John — Thu, 29 Aug 2024 14:54:01 +0000

The Poisson probability distribution gives a simple, elegant model for count data. You can even derive from certain assumptions that data must have a Poisson distribution. Unfortunately reality doesn’t often go along with those assumptions.

A Poisson random variable with mean λ also has variance λ. But it’s often the case that data that would seem to follow a Poisson distribution has a variance greater than its mean. This phenomenon is called over-dispersion: the dispersion (variance) is larger than a Poisson distribution assumption would allow.

One way to address over-dispersion is to use a negative binomial distribution. This distribution has two parameters, r and p, and has the following probability mass function (PMF).

As the parameter r goes to infinity, the negative binomial distribution converges to a Poisson distribution. So you can think of the negative binomial distribution as a generalization of the Poisson distribution.

These notes go into the negative binomial distribution in some detail, including where its name comes from.

If the parameter r is a non-negative integer, then the binomial coefficients in the PMF for the negative binomial distribution are on the (r+1)st diagonal of Pascal’s triangle.

The case r = 0 corresponds to the first diagonal, the one consisting of all 1s. The case r = 1 corresponds to the second diagonal consisting of consecutive integers. The case r = 2 corresponds to the third diagonal, the one consisting of triangular numbers. And so forth.
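The diagonals described above are easy to generate. In this sketch the indexing C(k + r, r) is an assumption chosen to match the description (r = 0 gives all 1s, r = 1 the consecutive integers, r = 2 the triangular numbers); the PMF in the post may index its coefficients slightly differently. The name `pascal_diagonal` is made up for illustration.

```python
from math import comb

def pascal_diagonal(r, length=6):
    """First `length` entries of the (r+1)st diagonal of Pascal's triangle."""
    return [comb(k + r, r) for k in range(length)]

for r in range(3):
    print(r, pascal_diagonal(r))
```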

The post The negative binomial distribution and Pascal’s triangle first appeared on John D. Cook.

A strange take on the harmonic series

John — Thu, 29 Aug 2024 12:11:24 +0000

It is well known that the harmonic series

1 + ½ + ⅓ + ¼ + …

diverges. But if you take the denominators as numbers in base 11 or higher, the series converges [1].

I wonder what inspired this observation. Maybe Brewster was bored, teaching yet another cohort of students that the harmonic series diverges, and let his mind wander.

Proof

Let f(n) be the function that takes a positive integer n, writes it in base 10, then reinterprets the result as a number in base b where b > 10. Brewster is saying that the sum of the series 1/f(n) converges.

To see this, note that the first 10 terms are each at most 1. The next 100 terms are each at most 1/b. The next 1000 terms are each at most 1/b², and so on. This means the series is bounded by the geometric series with terms 10 (10/b)^m, which converges because 10/b < 1.

Python

Incidentally, despite being an unusual function, f is very easy to implement in Python:

   def f(n, b): return int(str(n), b)
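Here is a small sketch that uses f to compute partial sums in base 11. The helper `partial_sum` and the cutoffs are made up for illustration. Note that Python’s `int` only accepts bases from 2 to 36, so this works for 11 ≤ b ≤ 36.

```python
def f(n, b):
    # Write n in base 10, then reread the digits as a base-b number.
    return int(str(n), b)

def partial_sum(N, b=11):
    """Sum of 1/f(n, b) for n = 1, ..., N."""
    return sum(1 / f(n, b) for n in range(1, N + 1))

for N in [9, 10**3, 10**4]:
    print(N, partial_sum(N))
```

The first nine terms are the same as in the ordinary harmonic series, and the partial sums grow slowly from there, consistent with the slow convergence Brewster describes.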

Citation

Brewster’s note was so brief that I will quote it here in full.

The [harmonic series] is divergent. But if the denominators of the terms are read as numbers in scale 11 or any higher scale, the series is convergent, and the sum is greater than 2.828 and less than 26.29. The convergence is rather slow. I estimate that, to find the last number by direct addition, one would have to work out 10⁹⁰ terms, to about 93 places of decimals.

[1] G. W. Brewster. An Old Result in a New Dress. The Mathematical Gazette, Vol. 37, No. 322 (Dec., 1953), pp. 269–270.

The post A strange take on the harmonic series first appeared on John D. Cook.

Variance matters more than mean in the extremes

John — Mon, 26 Aug 2024 16:18:12 +0000

Suppose you have two normal random variables, X and Y, and that the variance of X is less than the variance of Y.

Let M be an equal mixture of X and Y. That is, to sample from M, you first choose X or Y with equal probability, then you choose a sample from the random variable you chose.

Now suppose you’ve observed an extreme value of M. Then it is more likely that the value came from Y. The means of X and Y don’t matter, other than determining the cutoff for what “extreme” means.

High-level math

To state things more precisely, there is some value t such that the posterior probability that a sample m from M came from Y, given that |m| > t, is greater than the posterior probability that m came from X.

Let’s just look at the right-hand tails, even though the principle applies to both tails. If X and Y have the same variance, but the mean of X is greater, then larger values of M are more likely to have come from X. Now suppose the variance of Y is larger. As you go further out in the right tail of M, the posterior probability of an extreme value having come from Y increases, and eventually it surpasses the posterior probability of the sample having come from X. If X has a larger mean than Y that will delay the point at which the posterior probability of Y passes the posterior probability of X, but eventually variance matters more than mean.

Detailed math

Let’s give a name to the random variable that determines whether we choose X or Y. Let’s call it C for coin flip, and assume C takes on 0 and 1 each with probability 1/2. If C = 0 we sample from X and if C = 1 we sample from Y. We want to compute the probability P(C = 1 | M ≥ t).

Without loss of generality we can assume X has mean 0 and variance 1. (Otherwise transform X and Y by subtracting off the mean of X and dividing by the standard deviation of X.) Denote the mean of Y by μ and the standard deviation by σ.

From Bayes’ theorem we have

P(C = 1 | M ≥ t) = Φ^c((t − μ)/σ) / (Φ^c(t) + Φ^c((t − μ)/σ))

where Φ^c(t) = P(Z ≥ t) for a standard normal random variable.

Similarly, to compute P(C = 1 | M ≤ t) just flip the direction of the inequality signs and replace Φ^c(t) = P(Z ≥ t) with Φ(t) = P(Z ≤ t).

The calculation for P(C = 1 | |M| ≥ t) is similar.

Example

Suppose Y has mean −2 and variance 10. The blue curve shows that a large negative sample from M very likely comes from Y and the orange line shows that a large positive sample very likely comes from Y as well.

The dip in the orange curve shows the transition zone where X’s advantage due to a larger mean gives way to the disadvantage of a smaller variance. This illustrates that the posterior probability of Y increases eventually but not necessarily monotonically.
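The tail posteriors for this example can be computed directly from Bayes’ theorem with X standardized to N(0, 1) and Y ~ N(μ, σ²). The following is a sketch rather than the code used to make the plots; the function names `posterior_right` and `posterior_left` are made up for illustration.

```python
from math import sqrt
from scipy.stats import norm

def posterior_right(t, mu, sigma):
    """P(C = 1 | M >= t) when X ~ N(0, 1) and Y ~ N(mu, sigma^2)."""
    tail_x = norm.sf(t)
    tail_y = norm.sf((t - mu) / sigma)
    return tail_y / (tail_x + tail_y)

def posterior_left(t, mu, sigma):
    """P(C = 1 | M <= t), using the CDF in place of the survival function."""
    tail_x = norm.cdf(t)
    tail_y = norm.cdf((t - mu) / sigma)
    return tail_y / (tail_x + tail_y)

mu, sigma = -2, sqrt(10)   # Y has mean -2 and variance 10
for t in [0, 2, 4, 6]:
    print(t, posterior_right(t, mu, sigma), posterior_left(-t, mu, sigma))
```

The right-tail posterior starts below 1/2 near t = 0, where the larger mean of X dominates, and then climbs toward 1 as t increases.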

Here’s a plot showing the probability of a sample having come from Y depending on its absolute value.

The post Variance matters more than mean in the extremes first appeared on John D. Cook.

Increasing speed due to friction

John — Sat, 24 Aug 2024 15:52:29 +0000

Orbital mechanics is fascinating. I’ve learned a bit about it for fun, not for profit. I seriously doubt Elon Musk will ever call asking me to design an orbit for him. [1]

One of the things that makes orbital mechanics interesting is that it can be counter-intuitive. For example, atmospheric friction can make a satellite move faster. How can this be? Doesn’t friction always slow things down?

Friction does reduce a satellite’s tangential velocity, causing it to move into a lower orbit, which increases its velocity. It’s weird to think about, but the details are worked out in [2].
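To see the effect in the simplest setting, consider circular orbits, where the speed is v = √(GM/r): a smaller orbital radius means a higher speed. The sketch below is an idealization and is not taken from Parkyn’s paper. The altitudes are illustrative, and the constants are the usual values for Earth’s gravitational parameter and mean radius.

```python
import math

GM_EARTH = 3.986004418e14   # m^3/s^2, Earth's gravitational parameter
R_EARTH = 6.371e6           # m, mean Earth radius

def circular_speed(altitude_m):
    """Speed of a circular orbit at the given altitude above the surface."""
    return math.sqrt(GM_EARTH / (R_EARTH + altitude_m))

# As drag lowers the orbit, the circular-orbit speed goes up.
for alt_km in [500, 400, 300]:
    print(alt_km, round(circular_speed(alt_km * 1e3)), "m/s")
```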

Note the date on the article: May 1958. The paper was written in response to Sputnik 1 which launched in October 1957. Parkyn described the phenomenon of acceleration due to friction in general, and how it applied to Sputnik in particular.

[1] I had a lead on a project with NASA once, but it wasn’t orbital mechanics, and the lead didn’t materialize.

[2] D. G. Parkyn. The Effect of Friction on Elliptic Orbits. The Mathematical Gazette. Vol. 42, No. 340 (May, 1958), pp. 96-98

The post Increasing speed due to friction first appeared on John D. Cook.

Ptolemy’s theorem

John — Sat, 24 Aug 2024 13:42:06 +0000

Draw a quadrilateral by picking four arbitrary points on a circle and connecting them cyclically.

Now multiply the lengths of the pairs of opposite sides. In the diagram below this means multiplying the lengths of the two horizontal-ish blue sides and the two vertical-ish orange sides.

Ptolemy’s theorem says that the sum of the two products described above equals the product of the diagonals.

To put it in colorful terms, the product of the blue sides plus the product of the orange sides equals the product of the green diagonals.

The converse of Ptolemy’s theorem also holds. If the relationship above holds for a quadrilateral, then the quadrilateral can be inscribed in a circle.

Note that if the quadrilateral in Ptolemy’s theorem is a rectangle, then the theorem reduces to the Pythagorean theorem.
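Ptolemy’s theorem is easy to test numerically. The sketch below picks four random points on the unit circle, sorts them by angle so they are connected cyclically, and compares the two sides of the identity. The function name `ptolemy_gap` is made up for illustration.

```python
import math
import random

def ptolemy_gap(angles):
    """Given four angles on the unit circle, return
    |AB*CD + BC*DA - AC*BD| for the points taken in cyclic order."""
    a, b, c, d = [(math.cos(t), math.sin(t)) for t in sorted(angles)]
    dist = math.dist
    sides = dist(a, b) * dist(c, d) + dist(b, c) * dist(d, a)
    diagonals = dist(a, c) * dist(b, d)
    return abs(sides - diagonals)

random.seed(0)
angles = [random.uniform(0, 2 * math.pi) for _ in range(4)]
print(ptolemy_gap(angles))
```

The printed gap should be zero up to floating point error. Sorting the angles matters: if the points are not taken in cyclic order, the "sides" and "diagonals" get mixed up and the identity fails.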

The post Ptolemy’s theorem first appeared on John D. Cook.

Rule for converting trig identities into hyperbolic identities

John — Tue, 20 Aug 2024 14:14:31 +0000

There is a simple rule of thumb for converting between (circular) trig identities and hyperbolic trig identities known as Osborn’s rule: stick an h on the end of trig functions and flip signs wherever two sinh functions are multiplied together.

Examples

For example, the circular identity

sin(θ + φ) = sin(θ) cos(φ) + cos(θ) sin(φ)

becomes the hyperbolic identity

sinh(θ + φ) = sinh(θ) cosh(φ) + cosh(θ) sinh(φ)

but the identity

2 sin(θ) sin(φ) = cos(θ − φ) − cos(θ + φ)

becomes

2 sinh(θ) sinh(φ) = cosh(θ + φ) − cosh(θ − φ)

because there are two sinh terms.

Derivation

Osborn’s rule isn’t deep. It’s a straightforward application of Euler’s theorem:

exp(iθ) = cos(θ) + i sin(θ).

More specifically, Osborn’s rule follows from two corollaries of Euler’s theorem:

sin(iθ) = i sinh(θ)
cos(iθ) = cosh(θ)
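For completeness, here is one way to obtain the two corollaries, starting from the exponential forms of sine and cosine that follow from Euler’s theorem and substituting iθ for the argument.

```latex
\begin{align*}
\sin(i\theta) &= \frac{e^{i(i\theta)} - e^{-i(i\theta)}}{2i}
               = \frac{e^{-\theta} - e^{\theta}}{2i}
               = i\,\frac{e^{\theta} - e^{-\theta}}{2}
               = i \sinh(\theta), \\
\cos(i\theta) &= \frac{e^{i(i\theta)} + e^{-i(i\theta)}}{2}
               = \frac{e^{-\theta} + e^{\theta}}{2}
               = \cosh(\theta).
\end{align*}
```

Each sine contributes a factor of i, so a product of two sines contributes i² = −1. That is where the sign flip in Osborn’s rule comes from.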

Why bother?

The advantage of Osborn’s rule is that it saves time, and perhaps more importantly, it reduces the likelihood of making a mistake.

You could always derive any identity you need on the spot. All trig identities—circular or hyperbolic—are direct consequences of Euler’s theorem. But it saves time to work at a higher level of abstraction. And as I’ve often said in the context of more efficient computer usage, the advantage of doing things faster is not so much the time directly saved but the decreased probability of losing your train of thought.

Caveats

Osborn’s rule also applies to implicit products of sinh, such as in tanh = sinh / cosh. So, for example, the circular identity

tan(2θ) = 2 tan(θ) / (1 − tan²(θ))

becomes

tanh(2θ) = 2 tanh(θ) / (1 + tanh²(θ))

because the tanh² term implicitly contains two sinh terms.
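The identities in this post can be spot-checked numerically. The following sketch uses arbitrary test values for θ and φ; the variable names are made up for illustration.

```python
import cmath
import math

theta, phi = 0.7, 0.3

# Corollaries of Euler's theorem, checked with complex arithmetic.
assert cmath.isclose(cmath.sin(1j * theta), 1j * math.sinh(theta))
assert cmath.isclose(cmath.cos(1j * theta), math.cosh(theta))

# Product of two sinh terms: the sign flips relative to the circular identity.
lhs = 2 * math.sinh(theta) * math.sinh(phi)
rhs = math.cosh(theta + phi) - math.cosh(theta - phi)

# tanh^2 hides a product of two sinh terms, so its sign flips too.
tanh_double = 2 * math.tanh(theta) / (1 + math.tanh(theta) ** 2)

print(lhs - rhs, math.tanh(2 * theta) - tanh_double)
```

Both printed differences should be zero up to floating point error.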

Original note

Osborn’s original note [1] from 1902 is so short that I include the entire text below:

[1] G. Osborn. Mnemonic for Hyperbolic Formulae. The Mathematical Gazette, Vol. 2, No. 34 (Jul., 1902), p. 189

The post Rule for converting trig identities into hyperbolic identities first appeared on John D. Cook.

Interpolation and the cotanc function

John — Mon, 19 Aug 2024 11:25:29 +0000

This weekend I wrote three posts related to interpolation:

The first post looks at reducing the size of mathematical tables by switching from linear to quadratic interpolation. The immediate application is obsolete, but the principles apply to contemporary problems.

The second post looks at alternatives to Lagrange interpolation that are much better suited to hand calculation. The final post is a tangent off the middle post.

Tau and sigma functions

In the process of writing the posts above, I looked at Chambers Six Figure Mathematical Tables from 1964. There I saw a couple curious functions I hadn’t run into before, functions the author called τ and σ.

τ(x) = x cot x
σ(x) = x csc x

So why introduce these two functions? The cotangent and cosecant functions have a singularity at 0, and so it’s difficult to tabulate and interpolate these functions. I touched on something similar in my recent post on interpolating the gamma function: because the function grows rapidly, linear interpolation gives bad results. Interpolating the log of the gamma function gives much better results.

Chambers tabulates τ(x) and σ(x) because these functions are easy to interpolate.

The cotanc function

I’ll refer to Chambers’ τ function as the cotanc function. This is a whimsical choice, not a name used anywhere else as far as I know. The reason for the name is as follows. The sinc function

sinc(x) = sin(x)/x

comes up frequently in signal processing, largely because it’s the Fourier transform of the indicator function of an interval. There are a few other functions that tack a c onto the end of a function to indicate it has been divided by x, such as the jinc function.

The function tan(x)/x is sometimes called the tanc function, though this name is far less common than sinc. The cotangent function is the reciprocal of the tangent function, so I’m calling the reciprocal of the tanc function the cotanc function. Maybe it should be called the cotank function just for fun. For category theorists this brings up images of a tank that fires backward.

Practicality of the cotanc function

As noted above, the cotangent function is ill-behaved near 0, but the cotanc function is very nicely behaved near 0. Multiplying by x removes the singularity at 0, though the cotanc function still has singularities at non-zero multiples of π.

As noted here, interpolation error depends on the size of the derivatives of the function being interpolated. Since the cotanc function is flat and smooth, it has small derivatives and thus small interpolation error.
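The following sketch illustrates the point by comparing two ways to estimate cot(0.015) from values tabulated at 0.01 and 0.02: linearly interpolating cotangent directly, versus linearly interpolating cotanc and dividing by x. The table spacing and the helper names `cotanc` and `lerp` are made up for illustration.

```python
import math

def cotanc(x):
    """x*cot(x), extended by its limit 1 at x = 0."""
    return 1.0 if x == 0 else x / math.tan(x)

def lerp(f, a, b, x):
    """Linear interpolation of f between tabulated points a and b."""
    return f(a) + (f(b) - f(a)) * (x - a) / (b - a)

a, b, x = 0.01, 0.02, 0.015
exact = 1 / math.tan(x)

direct = lerp(lambda t: 1 / math.tan(t), a, b, x)   # interpolate cot itself
via_cotanc = lerp(cotanc, a, b, x) / x              # interpolate x cot x, then divide

print(abs(direct - exact), abs(via_cotanc - exact))
```

Interpolating cotangent directly is off by several units this close to the singularity, while going through cotanc gives an error several orders of magnitude smaller.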

The post Interpolation and the cotanc function first appeared on John D. Cook.

Binomial coefficients with non-integer arguments

John — Sun, 18 Aug 2024 21:25:38 +0000

When n and r are positive integers, with n ≥ r, there is an intuitive interpretation of the binomial coefficient C(n, r), namely the number of ways to select r things from a set of n things. For this reason C(n, r) is usually pronounced “n choose r.”

But what might something like C(4.3, 2) mean? The number of ways to choose two giraffes out of a set of 4.3 giraffes?! There is no combinatorial interpretation for binomial coefficients like these, though they regularly come up in applications.

It is possible to define binomial coefficients when n and r are real or even complex numbers. These more general binomial coefficients are in this liminal zone of topics that come up regularly, but not so regularly that they’re widely known. I wrote an article about this a decade ago, and I’ve had numerous occasions to link to it ever since.
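One common way to make this extension, sketched below, is to replace the factorials in n! / (r! (n − r)!) with gamma functions. This is the standard definition, though the post itself does not spell it out, so treat it as an assumption here. The function name `binom` is made up for illustration.

```python
from math import gamma

def binom(n, r):
    """Generalized binomial coefficient via the gamma function.
    Valid when none of n+1, r+1, n-r+1 is a non-positive integer."""
    return gamma(n + 1) / (gamma(r + 1) * gamma(n - r + 1))

print(binom(5, 2))     # agrees with the usual "5 choose 2"
print(binom(4.3, 2))   # equals 4.3 * 3.3 / 2
```

For production use, `scipy.special.binom` handles real arguments and the edge cases this sketch ignores.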

The previous post implicitly includes an application of general binomial coefficients. The post alludes to coefficients that come up in Bessel’s interpolation formula but doesn’t explicitly say what they are. These coefficients B_k can be defined in terms of the Gaussian interpolation coefficients, which are in turn defined by binomial coefficients with non-integer arguments.

Note that 0 < p < 1.

The coefficients in Everett’s interpolation formula can also be expressed simply in terms of the Gauss coefficients.

The post Binomial coefficients with non-integer arguments first appeared on John D. Cook.

Bessel, Everett, and Lagrange interpolation

John — Sun, 18 Aug 2024 20:26:42 +0000

I never heard of Bessel or Everett interpolation until long after college. I saw Lagrange interpolation several times. Why Lagrange and not Bessel or Everett?

First of all, Bessel interpolation and Everett interpolation are not different kinds of interpolation; they are different algorithms for carrying out the same interpolation as Lagrange. There is a unique polynomial of degree n fitting a function at n + 1 points, and all three methods evaluate this same polynomial.

The advantages of Bessel’s approach or Everett’s approach to interpolation are practical, not theoretical. In particular, these algorithms are practical when interpolating functions from tables, by hand. This was a lost art by the time I went to college. Presumably some of the older faculty had spent hours computing with tables, but they never said anything about it.

Bessel interpolation and Everett interpolation are minor variations on the same theme. This post will describe both at such a high level that there’s no difference between them. This post is high-level because that’s exactly what seems to be missing in the literature. You can easily find older books that go into great detail, but I believe you’ll have a harder time finding a survey presentation like this.

Suppose you have a function f(x) that you want to evaluate at some value p. Without loss of generality we can assume our function has been shifted and scaled so that we have tabulated f at integers and our value p lies between 0 and 1.

Everett’s formula (and Bessel’s) cleanly separates the parts of the interpolation formula that depend on f and the parts that depend on p.

The values of the function f enter through the differences of the values of f at consecutive integers, and differences of these differences, etc. These differences are easy to calculate by hand, and sometimes were provided by the table publisher.

The functions of p are independent of the function being interpolated, so these functions, the coefficients in Bessel’s formula and Everett’s formula, could be tabulated as well.

If the function differences are tabulated, and the functions that depend on p are tabulated, you could apply polynomial interpolation without ever having to explicitly evaluate a polynomial. You’d just need to evaluate a sort of dot product, a sum of numbers that depend on f multiplied by numbers that depend on p.

Another advantage of Bessel’s and Everett’s interpolation formulas is that they can be seen as a sequence of refinements. First you obtain the linear interpolation result, then refine it to get the quadratic interpolation result, then add terms to get the cubic interpolation result, etc.

This has several advantages. First, you have the satisfaction of seeing progress along the way; Lagrange interpolation may give you nothing useful until you’re done. Related to this is a kind of error checking: you have a sense of where the calculations are going, and intermediate results that violate your expectations are likely to be errors. Finally, you can carry out the calculations for the smallest terms with less precision. You can use fewer and fewer digits in your hand calculations as you compute successive refinements to your result.
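To make the idea of successive refinement concrete, here is a sketch of Everett’s formula truncated after second differences. The formula is the standard textbook form, which the post does not state, so treat the details as an assumption. The function name `everett` and the sine example are made up for illustration.

```python
import math

def everett(f_m1, f0, f1, f2, p):
    """Interpolate f at 0 < p < 1 from values tabulated at -1, 0, 1, 2,
    returning the linear result and the result refined by second differences."""
    q = 1 - p
    d2_0 = f_m1 - 2 * f0 + f1          # second difference centered at 0
    d2_1 = f0 - 2 * f1 + f2            # second difference centered at 1
    linear = q * f0 + p * f1
    # These coefficients depend only on p, so they could be read from a table.
    e0 = q * (q * q - 1) / 6
    e1 = p * (p * p - 1) / 6
    return linear, linear + e0 * d2_0 + e1 * d2_1

h = math.radians(1)                    # table of sine at one-degree steps
vals = [math.sin(math.radians(30) + k * h) for k in (-1, 0, 1, 2)]
linear, refined = everett(*vals, p=0.4)
exact = math.sin(math.radians(30.4))
print(abs(linear - exact), abs(refined - exact))
```

The refined value is the linear value plus a small correction, a sum of tabulated differences times coefficients that depend only on p.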

The post Bessel, Everett, and Lagrange interpolation first appeared on John D. Cook.

Compression and interpolation

John — Sat, 17 Aug 2024 12:56:36 +0000

Data compression is everywhere. We’re unaware of it when it is done well. We only become aware of it when it is pushed too far, such as when a photo looks grainy or fuzzy because it was compressed too much.

The basic idea of data compression is to not transmit the raw data but to transmit some of the data along with instructions for how to approximately reconstruct the rest [1].

Fifty years ago scientists were concerned with a different application of compression: reducing the size of mathematical tables. Books of tabulated functions are obsolete now, but the principles used in producing these tables are still very much relevant. We use compression and interpolation far more often now, though it’s almost always invisibly executed by software.

Compressing tables

In this post I want to expand on comments by Forman Acton from his book Numerical Methods That Work on compression.

Many persons are unaware of the considerable compression in a table that even the use of quadratic interpolation permits. A table of sin x covering the first quadrant, for example, requires 541 pages if it is to be linearly interpolable to eight decimal places. If quadratic interpolation is used, the same table takes only one page having entries at one-degree intervals with functions of the first and second differences being recorded together with the sine itself.

Acton goes on to mention the advantage of condensing shelf space by a factor of 500. We no longer care about saving shelf space, but we may care very much about saving memory in an embedded device.

Quadratic interpolation does allow more compression than linear interpolation, but not by a factor of 500. I admire Acton’s numerical methods book, but I’m afraid he got this one wrong.

Interpolation error bound

In order to test Acton’s claim we will need the following theorem on interpolation error [2].

Let f be a function so that f⁽ⁿ⁺¹⁾ is continuous on [a, b] and satisfies |f⁽ⁿ⁺¹⁾(x)| ≤ M. Let p be the polynomial of degree ≤ n that interpolates f at n + 1 equally spaced nodes in [a, b], including the end points. Then on [a, b],

|f(x) − p(x)| ≤ M hⁿ⁺¹ / (4(n + 1)), where h = (b − a)/n.

Quadratic interpolation error

Acton claims that quadratic interpolation at intervals of one degree is adequate to produce eight decimal places of accuracy. Quadratic interpolation means n = 2.

We have our function tabulated at evenly spaced points a distance h = π/180 radians apart. Quadratic interpolation requires function values at three points, so b − a = 2h = π/90. The third derivative of sine is negative cosine, so M = 1.

This gives an error bound of 4.43 × 10⁻⁷, so this would give slightly better than six decimal place accuracy, not eight.
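The calculation can be reproduced in a couple of lines. The sketch below assumes the bound has the form M hⁿ⁺¹ / (4(n + 1)) with h = (b − a)/n, which is consistent with the numbers quoted in this post. The function name `interp_error_bound` is made up for illustration.

```python
import math

def interp_error_bound(M, n, h):
    """Bound M * h**(n+1) / (4*(n+1)) for degree-n interpolation
    at equally spaced nodes a distance h apart."""
    return M * h ** (n + 1) / (4 * (n + 1))

h = math.pi / 180                       # one degree in radians
print(interp_error_bound(1, 2, h))      # quadratic interpolation of sine
```

Changing n lets you compare interpolation orders at the same table spacing.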

Linear interpolation error

Suppose we wanted to create a table of sine values so that linear interpolation would give results accurate to eight decimal places. In the interpolation error formula we have M = 1 as before, and now n = 1. We would need to tabulate sine at enough points that h = b − a is small enough that the error is less than 5 × 10⁻⁹. It follows that we need h ≤ 0.0002 radians. Covering a range of π/2 radians in increments of 0.0002 radians would require 7854 function values. Acton implicitly assumes 90 values to a page, so this would take about 87 pages.
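The arithmetic behind the page count can be sketched as follows, assuming the linear case of the error bound reduces to h²/8 when M = 1. The variable names are made up for illustration.

```python
import math

target = 5e-9                           # half a unit in the eighth decimal place
h = math.sqrt(8 * target)               # solve h**2 / 8 = target (n = 1, M = 1)
values = math.ceil((math.pi / 2) / h)   # entries needed to cover the first quadrant
pages = values / 90                     # 90 entries per page, as in the text

print(h, values, pages)
```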

Abramowitz and Stegun devotes 32 pages to tabulating sine and cosine at increments of 0.001 radian. This does not always guarantee eight decimal place accuracy using linear interpolation, but it does guarantee at least seven places (more on that here), which is better than a table at one degree increments would deliver using quadratic interpolation. So it would have been more accurate for Acton to say quadratic interpolation reduces the number of pages by a factor of 30 rather than 500.

Cubic interpolation error

If we have a table of sine values at one degree increments, how much accuracy could we get using cubic interpolation? In that case we’d apply the interpolation error theorem with n = 3 and b − a = 3(π/180) = π/60. Then the error bound is 5.8 × 10⁻⁹. This would usually give you eight decimal place accuracy, so perhaps Acton carried out the calculation for cubic interpolation rather than quadratic interpolation.

[1] This is what’s known as lossy compression; some information is lost in the compression process. Lossless compression also replaces the original data with a description that can be used to reproduce the data, but in this case the reconstruction process is perfect.

[2] Ward Cheney and David Kincaid. Numerical Methods and Computation. Third edition.

The post Compression and interpolation first appeared on John D. Cook.

Chebyshev polynomials as distorted cosines

John — Fri, 16 Aug 2024 03:13:39 +0000

Forman Acton’s book Numerical Methods that Work describes Chebyshev polynomials as

cosine curves with a somewhat disturbed horizontal scale, but the vertical scale has not been touched.

The relation between Chebyshev polynomials and cosines is

T_n(cos θ) = cos(nθ).

Some sources take this as the definition of Chebyshev polynomials. Other sources define the polynomials differently and prove this equation as a theorem.

It follows that if we let x = cos θ then

T_n(x) = cos(n arccos x).

Now sin x = cos(π/2 − x) and for small x, sin x ≈ x. This means

arccos(x) ≈ π/2 − x

for x near 0, and so we should expect the approximation

T_n(x) ≈ cos(n(π/2 − x))

to be accurate near the middle of the interval [−1, 1] though not at the ends. A couple plots show that this is the case.
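In place of plots, here is a sketch that compares the two expressions numerically. The choice n = 5 and the sample points are arbitrary, and the function names are made up for illustration.

```python
import math

def chebyshev(n, x):
    """T_n(x) on [-1, 1] via T_n(x) = cos(n arccos x)."""
    return math.cos(n * math.acos(x))

def distorted_cosine(n, x):
    """Approximation obtained from arccos(x) ≈ pi/2 - x."""
    return math.cos(n * (math.pi / 2 - x))

n = 5
for x in [0.0, 0.1, 0.5, 0.9]:
    print(x, abs(chebyshev(n, x) - distorted_cosine(n, x)))
```

The difference is tiny near x = 0 and grows substantially toward the ends of the interval.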

More Chebyshev posts

The post Chebyshev polynomials as distorted cosines first appeared on John D. Cook.

Math’s base 32 versus Linux’s base 32

John — Tue, 13 Aug 2024 15:27:00 +0000

The convention in math for writing numbers in bases larger than 10 is to insert capital letters after 9, starting with A. So, for example, the digits in base 12 are 0, 1, 2, …, 9, A, and B.

So if you’re familiar with math but not Linux, and you run across the base32 utility, you might naturally assume that the command converts numbers to base 32 using the symbols 0, 1, 2, …, 9, A, B, C, …, V. That’s a reasonable guess, but it actually uses the symbols A, B, C, …, Z, 2, 3, 4, 5, 6, and 7. It’s all described in RFC 3548.

What’s going on? The purpose of base 32 encoding is to render binary data in a way that is human readable and capable of being processed by software that was originally written with human readable input in mind. The purpose is not to carry out mathematical operations.

Note that the digit 0 is not used, because it’s visually similar to the letter O. The digit 1 is also not used, perhaps because it looks like a lowercase l in some fonts.
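Both conventions are available in Python, which makes the contrast easy to see. The sketch below uses the standard library’s `base64.b32encode` for the encoding convention and the built-in `int` with base 32 for the mathematical convention. The sample input is arbitrary.

```python
import base64

# Encoding-style base 32: letters A-Z followed by digits 2-7, with '=' padding.
encoded = base64.b32encode(b"hello")
print(encoded)

# Math-style base 32: digits 0-9 followed by letters A-V.
print(int("V", 32), int("10", 32))
```

The first call encodes bytes as text; the second interprets a string of digits as a number. They answer different questions, which is why the alphabets differ.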

The post Math’s base 32 versus Linux’s base 32 first appeared on John D. Cook.

Editing a file without an editor

John — Sun, 11 Aug 2024 12:30:08 +0000

I don’t use sed very often, but it’s very handy when I do use it, particularly when needing to make a small change to a large file.

Fixing a JSON file

Lately I’ve been trying to fix a 30 MB JSON file that has been corrupted somehow. The file is one very long line.

Emacs was unable to open the file. (It might have eventually opened the file, but I killed the process when it took longer than I was willing to wait.)

Emacs can open large files, but it has trouble with long lines. Somewhere its data structures assume lines are not typically millions of characters long.

I used sed to add line breaks after closing brackets

    sed -i 's/]/]\n/g' myfile.json

and then I was able to open the file in Emacs.

If the problem with my JSON file were simply a missing closing brace—it’s not—then I could add a closing brace with

    sed -i 's/$/}/' myfile.json

Using sed to find a job

I had a friend in college who got a job because of a sed script he wrote as an intern.

A finite element program would crash when it attempted to take too large a time step, but the program would not finish by the time the results were needed if it always took tiny time steps. So they’d let the program crash occasionally, then edit the configuration file with a smaller time step and restart the program.

They were asking engineers to work around the clock so someone could edit the configuration file and restart the finite element program if it crashed in the middle of the night. My friend wrote a shell script to automate this process, using sed to do the file editing. He eliminated the need for a night shift and got a job offer.

The post Editing a file without an editor first appeared on John D. Cook.
