The only Cyrillic letter I recall seeing in math is sha (Ш, U+0428) for the so-called Dirac comb distribution.

One Hebrew letter is commonly used in math, and that’s aleph (א, U+05D0). Aleph is used fairly often, but other Hebrew letters are much rarer. If you see any other Hebrew letter in math, it’s very likely to be one of the next three letters: beth (ב, U+05D1), gimel (ג, U+05D2), or dalet (ד, U+05D3).

To back up this claim, basic LaTeX only has a command for aleph (unsurprisingly, it’s `\aleph`). AMS-LaTeX adds the commands `\beth`, `\gimel`, and `\daleth`, but no more. Those are the only Hebrew letters you can use in LaTeX without importing a package or using XeTeX so you can use Unicode symbols.

Not only are Hebrew letters rare in math, the only area of math that uses them at all is set theory, where they are used to represent transfinite numbers.

So in short, if you see a Hebrew letter in math, it’s overwhelmingly likely to be in set theory, and it’s very likely to be aleph, or possibly beth, gimel, or dalet.

But today I was browsing through Morse and Feshbach and was very surprised to see the following on page 324.

I’ve never seen a Hebrew letter in applied math, and I’ve never seen ayin (ע, U+05E2) or yod (י, U+05D9) used anywhere in math.

In context, the authors had used Roman letters, Fraktur letters, and Greek letters and so they’d run out of alphabets. The entity denoted by gimel is related to a tensor the authors denoted with *g*, so presumably they used the Hebrew letter that sounds like “g”. But I have no idea why they chose ayin or yod.

A binomial random variable is the sum of iid (independent, identically distributed) Bernoulli random variables. But what if the Bernoulli random variables don’t have the same distribution? That is, suppose you’re counting the number of heads seen in flipping *n* coins, where each coin has a potentially different probability of coming up heads. Will a Poisson approximation still work?

This post will cite three theorems on the error in approximating a sum of *n* independent Bernoulli random variables, each with a different probability of success *p*_{i}. I’ll state each theorem and very briefly discuss its advantages. The theorems can be found in [1].

For *i* = 1, 2, 3, …, *n* let *X*_{i} be Bernoulli random variables with

Prob(*X*_{i} = 1) = *p*_{i}

and let *X* with no subscript be their sum:

*X* = *X*_{1} + *X*_{2} + *X*_{3} + … + *X*_{n}

We want to approximate the distribution of *X* with a Poisson distribution with parameter λ. We will measure the error in the Poisson approximation by the maximum difference between the probability mass function for *X* and the probability mass function for a Poisson(λ) random variable.

We consider two ways to choose λ. The first is

λ = *p*_{1} + *p*_{2} + *p*_{3} + … + *p*_{n}.

For this choice we have two different theorems that give upper bounds on the approximation error. One says that the error is bounded by the sum of the squares of the *p*'s

*p*_{1}² + *p*_{2}² + *p*_{3}² + … + *p*_{n}²

and the other says it is bounded by 9 times the maximum of the *p*'s

9 max(*p*_{1}, *p*_{2}, *p*_{3}, …, *p*_{n}).

The sum of squares bound will be smaller when *n* is small and the maximum bound will be smaller when *n* is large.
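Here is a quick numerical check of these two bounds, a sketch in Python with made-up probabilities *p*_{i}. It computes the exact distribution of the sum by convolving the Bernoulli distributions one coin at a time, then compares it with the Poisson approximation.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def bernoulli_sum_pmf(ps):
    # Distribution of X = X_1 + ... + X_n, convolving one coin at a time
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            new[k] += prob * (1 - p)    # this coin comes up tails
            new[k + 1] += prob * p      # this coin comes up heads
        pmf = new
    return pmf

ps = [0.1, 0.2, 0.05, 0.15]   # hypothetical success probabilities
lam = sum(ps)
pmf = bernoulli_sum_pmf(ps)
error = max(abs(prob - poisson_pmf(lam, k)) for k, prob in enumerate(pmf))

print(error)                  # actual maximum error
print(sum(p*p for p in ps))   # sum-of-squares bound: 0.075
print(9 * max(ps))            # maximum bound: 1.8
```

For these small probabilities the sum-of-squares bound is much tighter, as the post suggests it should be for small *n*.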

The second way to choose λ is

λ = λ_{1} + λ_{2} + λ_{3} + … + λ_{n}

where

λ_{i} = -log(1 – *p*_{i}).

In this case the bound on the error is one half the sum of the squared λ’s:

(λ_{1}² + λ_{2}² + λ_{3}² + … + λ_{n}²)/2.

When *p*_{i} is small, λ_{i} ≈ *p*_{i}. In this case the error bound for the transformed Poisson approximation will be about half that of the one above.

- Normal approximation to binomial
- Camp-Paulson approximation to binomial
- Relative error in normal approximations

[1] R. J. Serfling. Some Elementary Results on Poisson Approximation in a Sequence of Bernoulli Trials. SIAM Review, Vol. 20, No. 3 (July, 1978), pp. 567-579.

The post Sum of independent but differently distributed variables first appeared on John D. Cook.

I ran across HAKMEM item 143 while writing my previous two blog posts about the arithmetic-geometric mean (AGM). This entry, due to Gene Salamin, says that

π ≈ 2*n* AGM(1, 4*e*^{−*n*})

for large *n*. In fact, *n* doesn’t have to be very large. The expression on the right converges to π exponentially fast as *n* grows.

Here’s the Mathematica code that produced the plot above.

Plot[{2 n ArithmeticGeometricMean[1, 4 Exp[-n]], Pi}, {n, 1, 5}, PlotRange -> All]

If you have *e* to a large number of decimals, you could compute π efficiently by letting *n* be a power of 2, computing *e*^{–n} by repeatedly squaring 1/*e*, and using the AGM. Computing the AGM is simple. In Python notation, iterate

a, b = (a+b)/2, sqrt(a*b)

until *a* and *b* are equal modulo your tolerance.
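Here's a sketch of that procedure in Python, using `math.exp` directly rather than repeatedly squaring 1/*e*:

```python
from math import exp, sqrt

def agm(a, b, tol=1e-15):
    # Iterate arithmetic and geometric means until a and b agree
    while abs(a - b) > tol:
        a, b = (a + b) / 2, sqrt(a * b)
    return a

n = 20
approx_pi = 2 * n * agm(1, 4 * exp(-n))
print(approx_pi)  # agrees with pi to many digits
```

With *n* = 20 the approximation error is already far below double precision rounding error.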

First I need to define the Jacobi theta functions:

θ_{2}(*q*) = 2(*q*^{1/4} + *q*^{9/4} + *q*^{25/4} + ⋯)

θ_{3}(*q*) = 1 + 2(*q* + *q*^{4} + *q*^{9} + ⋯)

Note that if *q* is very small, the series above converge very quickly. We will pick *q* so small that θ_{2}(*q*) and θ_{3}(*q*) are trivial to compute with sufficient accuracy.

Suppose we want to compute the natural log of 10 to a thousand digits. We can use the following equation from [1]:

log(1/*q*) ≈ π / (4 AGM(θ_{2}²(*q*^{4}), θ_{3}²(*q*^{4})))

We want *q* to be small, so we want 1/*q* to be large. We set *q* = 10^{–n} to compute log 10^{n} and divide the result by *n*.

We can show from the definition of the theta functions that, for small *q*,

θ_{2}²(*q*^{4}) ≈ 4*q*² and θ_{3}²(*q*^{4}) ≈ 1.

Since we want 1000 digit accuracy, we can set *q* = 10^{-250}. Then θ_{2}^{2}(*q*^{4}) = 4×10^{-500} and θ_{3}^{2}(*q*^{4}) = 1 to about 2000 digits of accuracy.

We can calculate the result we want with just 20 iterations of the AGM as the following bc code shows.

define agm(a, b, n) {
    auto i, c, d
    for (i = 1; i <= n; i++) {
        c = (a + b)/2
        d = sqrt(a*b)
        a = c
        b = d
    }
    return a
}
scale = 2000
# a(1) = arctan(1) = pi/4
x = a(1)/(agm(4*10^-500, 1, 20)*250) - l(10)  # error
l(x)/l(10)                                    # log_10(error)

This returns -999.03…, so we were basically able to get 1000 digits, though the last digit or two might be dodgy. Let’s try again with a smaller value of *q*, setting *q* = 10^{-300}.

x = a(1)/(agm(4*10^-600,1, 20)*300) - l(10); l(x)/l(10)

This returns -1199.03… and so we have about 1200 digits. We were using 2000 digit precision in our calculations, but our approximation θ_{3}^{2}(*q*^{4}) = 1 wasn’t good to quite 2000 digits. Using a smaller value of *q* fixed that.
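For a cross-check of the bc calculation, here's a sketch in Python using the standard `decimal` module. bc gets π from a(1) = arctan(1); `decimal` has no arctan, so the sketch computes π with the Gauss-Legendre iteration, which is itself AGM-based.

```python
from decimal import Decimal, getcontext

getcontext().prec = 2000

def agm(a, b, n=30):
    # n iterations of the arithmetic-geometric mean
    for _ in range(n):
        a, b = (a + b) / 2, (a * b).sqrt()
    return a

def pi_agm(iters=15):
    # Gauss-Legendre iteration for pi; digits roughly double each step
    a, b = Decimal(1), 1 / Decimal(2).sqrt()
    t, p = Decimal(1) / 4, Decimal(1)
    for _ in range(iters):
        a_next = (a + b) / 2
        t -= p * (a - a_next) ** 2
        b = (a * b).sqrt()
        p *= 2
        a = a_next
    return (a + b) ** 2 / (4 * t)

q = Decimal(10) ** -250
# theta_2^2(q^4) is about 4 q^2 and theta_3^2(q^4) is about 1 for q this small
log10 = (pi_agm() / 4) / (agm(4 * q * q, Decimal(1)) * 250)
print(str(log10)[:20])  # 2.302585092994045684
```

The result agrees with log 10 to roughly a thousand digits, matching the bc run above.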

[1] The Borwein brothers, Pi and the AGM. arXiv

The post Computing logs with the AGM first appeared on John D. Cook.

*x* – *x*²/2 + *x*³/3 – …

with unbelievable speed and accuracy. You say thank you and your visitors vanish.

You get back home and wonder what you can do with your black boxes. The series the boxes compute looks vaguely familiar, but you can’t remember what it is. You call up a friend and he tells you that it’s the Taylor series for log(1 + *x*).

OK, so now what?

Your friend tells you he can predict what the boxes will output. He tells you, for example, that if you enter 0.5 it will output 0.4054651081081644. You try it, and your friend is right. At least partly. If you ask your box to give you only 16 digits, it will give you exactly what your friend said it would. But you could ask it for a thousand digits or a million digits or a billion digits, something your friend cannot do, at least not quickly.

Then you realize your friend has things backward. The way to exploit these boxes is not to compute logs on your laptop to predict their output, but to use the boxes *instead* of your laptop.

So you have a way to compute logs. You can bootstrap that to compute inverse logs, i.e. exp(*x*). And you can bootstrap that to compute sines and cosines. You try to compute anything you can starting from logs.
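For example, here's a sketch of the log-to-exp bootstrap using Newton's method, with `math.log` standing in for the black box:

```python
import math

def log_box(x):
    # Stand-in for the black box that computes log
    return math.log(x)

def exp_from_log(y):
    # Solve log(x) = y for x with Newton's method:
    # x <- x - (log(x) - y)/(1/x) = x*(1 + y - log(x))
    x = 1.0
    for _ in range(100):
        x_new = x * (1 + y - log_box(x))
        if x_new == x:
            break
        x = x_new
    return x

print(exp_from_log(1.0))  # ≈ 2.718281828459045
```

Each Newton step costs one call to the log box plus a little arithmetic, so exp inherits the box's precision.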

The preceding story was an introduction to the AGM, the arithmetic-geometric mean. It is the limit of alternatingly taking ordinary and geometric means. More on that here. What I want to focus on here is that the AGM can be computed extremely quickly.

Each iteration in the process of computing the AGM doubles the number of correct figures in the answer. Suppose you want to compute its output to a billion decimal places, and you’ve calculated a million decimal places. You need to compute 999,000,000 more decimal places, but you’re nearly there! Ten more steps and you’ll have all billion decimal places.
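You can watch the doubling happen. Here's a sketch using Python's `decimal` module: the gap between the two means roughly squares at each step.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

a, b = Decimal(1), Decimal(2).sqrt()
gaps = []
for i in range(6):
    gap = abs(a - b)
    gaps.append(gap)
    print(i, gap)   # the gap roughly squares at each iteration
    a, b = (a + b) / 2, (a * b).sqrt()
```

Squaring a small gap doubles the number of leading zeros, i.e. doubles the number of correct digits.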

If you want to compute something to millions of digits, it would make sense to try to compute it in terms of the AGM. This was the research program of the brothers Jonathan and Peter Borwein. Much of this research was codified in their book Pi and the AGM. They used the AGM to compute π to crazy precision, but that wasn’t their goal *per se*.

Computing π was a demonstration project for a deeper agenda. While describing the work of the Borwein brothers, Richard Brent said

… theorems about π are often just the tips of “mathematical icebergs”—much of interest lies hidden beneath the surface.

The AGM of *x* and *y* equals

where *K* is the “complete elliptic integral of the first kind.” You might reasonably think “Great. I’ll keep that in mind if I ever need to compute the complete elliptic integral of the first kind, whatever that is.” But *K* is like the log function in the story above, something that can be bootstrapped to compute other things.

The aliens Gauss, Lagrange, and Legendre gave the Borweins the AGM black box, and the Borweins figured out how to use it to compute π, but also log, exp, cos, etc. The Borwein algorithms may not be the most efficient if you only want, say, 16 digits of precision. But as you need more precision, eventually they become the algorithms to use.

See the next post for an example using the AGM to compute logarithms to 1000 digits.

And see the one after that for a way to compute π with the AGM.

The summation form of the Cauchy-Schwarz inequality says that

(Σ *x*_{n} *y*_{n})² ≤ (Σ *x*_{n}²) (Σ *y*_{n}²)

for sequences of real numbers *x*_{n} and *y*_{n}.

The integral form of the Cauchy-Schwarz inequality says that

(∫_{E} *f* *g* dμ)² ≤ (∫_{E} *f*² dμ) (∫_{E} *g*² dμ)

for any two real-valued functions *f* and *g* over a measure space (*E*, μ) provided the integrals above are defined.

You can derive the sum form from the integral form by letting your measure space be the integers with counting measure. You can derive the integral form by applying the sum form to the integrals of simple functions and taking limits.

The Cauchy-Schwarz inequality is well known [1]. There are reversed versions of the Cauchy-Schwarz inequality that are not as well known. The most basic such reversed inequality was proved by Pólya and Szegö in 1925 and many variations on the theme have been proved ever since.

Pólya and Szegö’s inequality says

(∫_{E} *f*² dμ) (∫_{E} *g*² dμ) ≤ *C* (∫_{E} *f* *g* dμ)²

for some constant *C* provided *f* and *g* are bounded above and below. The constant *C* does not depend on the functions *per se* but on their upper and lower bounds. Specifically, assume

0 < *m*_{1} ≤ *f* ≤ *M*_{1} and 0 < *m*_{2} ≤ *g* ≤ *M*_{2}.

Then

*C* = (*m* + *M*)² / (4*mM*)

where

*m* = *m*_{1}*m*_{2} and *M* = *M*_{1}*M*_{2}.

Sometimes you’ll see *C* written in the equivalent form

*C* = (√(*M*/*m*) + √(*m*/*M*))² / 4.

This way of writing *C* makes it clear that the constant only depends on *m* and *M* via their ratio.

Note that if *f* and *g* are constant, then the inequality is exact. So the constant *C* is best possible without further assumptions.

The corresponding sum form follows immediately by using counting measure on the integers. Or in more elementary terms, by integrating step functions that have width 1.

Let *x* = (2, 3, 5) and *y* = (9, 8, 7).

The sum of the squares in *x* is 38 and the sum of the squares in *y* is 194. The inner product of *x* and *y* is 18+24+35 = 77.

The product of the lower bounds on *x* and *y* is *m* = 14. The product of the upper bounds is *M* = 45. The constant *C* = 59²/(4×14×45) = 1.38.

The left side of the Pólya and Szegö inequality is 38×194 = 7372. The right side is 1.38×77²= 8182.02, and so the inequality holds.
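The arithmetic is easy to check in code. A sketch; note that with *C* kept at full precision rather than rounded to 1.38, the right side comes out a bit above 8190, so the inequality holds with some room to spare.

```python
x = (2, 3, 5)
y = (9, 8, 7)

sum_x2 = sum(xi*xi for xi in x)              # 38
sum_y2 = sum(yi*yi for yi in y)              # 194
inner  = sum(xi*yi for xi, yi in zip(x, y))  # 77

m = min(x) * min(y)          # product of lower bounds: 14
M = max(x) * max(y)          # product of upper bounds: 45
C = (m + M)**2 / (4*m*M)     # 59^2 / 2520, about 1.38

print(sum_x2 * sum_y2, C * inner**2)  # left and right sides
```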

Let *f*(*x*) = 3 + cos(*x*) and let *g*(*x*) = 2 + sin(*x*). Let *E* be the interval [0, 2π].

The following Mathematica code shows that the left side of the Pólya and Szegö inequality is 171π² and the right side is 294 π².

The function *f* is bounded below by 2 and above by 4. The function *g* is bounded below by 1 and above by 3. So *m* = 2 and *M* = 12.

In[1]:= f[x_] := 3 + Cos[x]
In[2]:= g[x_] := 2 + Sin[x]
In[3]:= Integrate[f[x]^2, {x, 0, 2 Pi}] Integrate[g[x]^2, {x, 0, 2 Pi}]
Out[3]= 171 π²
In[4]:= {m, M} = {2, 12};
In[5]:= c = (m + M)^2/(4 m M);
In[6]:= c Integrate[f[x] g[x], {x, 0, 2 Pi}]^2
Out[6]= 294 π²

[1] The classic book on inequalities by Hardy, Littlewood, and Pólya mentions the Pólya-Szegö inequality on page 62, under “Miscellaneous theorems and examples.” Maybe Pólya was being inappropriately humble, but it’s odd that his inequality isn’t more prominent in his book.

The post Reversed Cauchy-Schwarz inequality first appeared on John D. Cook.

Morse and Feshbach has density 1.005 g/cm³, denser than water.

**Gravitation** by Misner, Thorne, and Wheeler is, appropriately, a massive book. It’s my weightiest paperback book, literally and perhaps metaphorically. But it’s not that dense, about 0.66 g/cm³. It would easily float.

**The Mathematica Book** by Wolfram (4th edition) is about the same weight as Gravitation, but denser, about 0.80 g/cm³. Still, it would float.

**Physically Based Rendering** by Pharr and Humphreys weighs in at 1.05 g/cm³. Like Morse and Feshbach, it would sink.

But the densest of my books is **An Atlas of Functions** by Oldham, Myland, and Spanier, coming in at 1.12 g/cm³.

The books that are denser than water were all printed on glossy paper. Apparently matte paper floats and glossy paper sinks.

The post My densest books first appeared on John D. Cook.

Let *p* be the quaternion

*p* = *p*_{0} + *p*_{1}*i* + *p*_{2}*j* + *p*_{3}*k*

and consider the function that acts on a quaternion *q* by multiplying it on the left by *p*.

If we think of *q* as a vector in *R*^{4} then this is a linear function of *q*, and so it can be represented by multiplication by a 4 × 4 matrix *M*_{p}.

It turns out that *M*_{p} is

    [ p0  -p1  -p2  -p3 ]
    [ p1   p0  -p3   p2 ]
    [ p2   p3   p0  -p1 ]
    [ p3  -p2   p1   p0 ]

where the entries are the components of *p*.

How might you remember or derive this matrix? Consider the symbolic matrix below. It’s easier to see the pattern here than in *M*_{p}.

    [ 1  -i  -j  -k ]
    [ i   1  -k   j ]
    [ j   k   1  -i ]
    [ k  -j   i   1 ]

You can derive *M*_{p} from this matrix.

Let’s look at the second row, for example. The second row of *M*_{p}, when multiplied by *q* as a column vector, produces the *i* component of the product.

How do you get an *i* term in the product? By multiplying the *i* component of *p* by the real component of *q*, or by multiplying the real component of *p* by the *i* component of *q*, or by multiplying the *i*/*j* component of *p* by the *j* component of *q*, or by multiplying the *i*/*k* component of *p* by the *k* component of *q*.

The other rows follow the same pattern. To get the *x* component of the product, you add up the products of the *x*/*y* term of *p* and the *y* term of *q*. Here *x* and *y* range over

{1, *i*, *j*, *k*}.

To get *M*_{p} from the symbolic matrix, replace 1 with the real component of *p*, replace *i* with the *i* component of *p*, etc.

As a final note, notice that the off-diagonal elements of *M*_{p} are anti-symmetric:

*m*_{ij} = –*m*_{ji}

unless *i* = *j*.
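Here's a quick numerical check of this representation (a sketch; `mult` is the direct quaternion product written out from the definitions):

```python
import numpy as np

def mult(x, y):
    # Direct quaternion product from the definitions
    return np.array([
        x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
        x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
        x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
        x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]
    ])

def Mp(p):
    # Matrix of left multiplication by p, acting on q as a column vector
    p0, p1, p2, p3 = p
    return np.array([
        [p0, -p1, -p2, -p3],
        [p1,  p0, -p3,  p2],
        [p2,  p3,  p0, -p1],
        [p3, -p2,  p1,  p0]
    ])

p = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([5.0, 6.0, 7.0, 8.0])
print(Mp(p) @ q - mult(p, q))  # [0. 0. 0. 0.]
```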

You can use the defining properties of an inner product to show that

⟨*v*, *w*⟩ = (‖*v* + *w*‖² − ‖*v*‖² − ‖*w*‖²) / 2.

This is a form of the so-called polarization identity. It implies that you can calculate inner products if you can compute norms.

So does this mean you can define an inner product on any space that has a norm?

No, it doesn’t work that way. The polarization identity says that if you have a norm **that came from an inner product** then you can recover that inner product from norms.

What would go wrong if you tried to use the equation above to define an inner product on a space that doesn’t have one?

Take the plane *R*² with the max norm, i.e.

‖(*x*, *y*)‖ = max(|*x*|, |*y*|),

and define a function *f* that takes two vectors and returns the right side of the polarization identity.

This is a well-defined function, but it’s not an inner product. An inner product is bilinear, i.e. if you multiply one of the arguments by a constant, you multiply the inner product by the same constant.

To see that *f* is not an inner product, let *v* = (1, 0) and *w* = (0, 1). Then *f*(*v*, *w*) = -1/2, but *f*(2*v*, *w*) is also -1/2. Multiplying the first argument by 2 did not multiply the result by 2.
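Here's the counterexample in code, a sketch taking the identity in the form ⟨*v*, *w*⟩ = (‖*v* + *w*‖² − ‖*v*‖² − ‖*w*‖²)/2:

```python
def norm(v):
    # Max norm on R^2
    return max(abs(v[0]), abs(v[1]))

def f(v, w):
    # Right side of the polarization identity, using the max norm
    s = (v[0] + w[0], v[1] + w[1])
    return (norm(s)**2 - norm(v)**2 - norm(w)**2) / 2

print(f((1, 0), (0, 1)))   # -0.5
print(f((2, 0), (0, 1)))   # -0.5 again, so f is not bilinear
```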

When we say that *R*² with the max norm doesn’t have an inner product, it’s not simply that we forgot to define one. We *cannot* define an inner product that is consistent with the norm structure.

Feedburner’s RSS service is still going, for now, but most RSS subscribers use my RSS feed without going through Feedburner.

If you’d like to subscribe to my monthly newsletter or to blog post notifications by email, you can do so here.

The post Email subscription switchover first appeared on John D. Cook.

Just as the most direct approach to computing complex products requires 4 real multiplications, the most direct approach to quaternion products requires 16 real multiplications. (See the function `mult` here.)

We can represent a quaternion as a pair of complex numbers by observing

*a* + *bi* + *cj* + *dk* = (*a* + *bi*) + (*c* + *di*)*j*

Gauss’ trick can multiply two complex numbers using only three real multiplications. It would seem that we could use something analogous to multiply two quaternions using only 3 complex multiplications, then apply Gauss’ trick to do each of these complex multiplications with 3 real multiplications. The net result would be computing a quaternion product in only 9 real multiplications.

Except that doesn’t work. A direct application of Gauss’ trick doesn’t work because *i* and *j* don’t commute. Perhaps there’s some variation on Gauss’ trick that would work, but I haven’t found one.

So let’s back up and take a different approach.

You can represent the quaternions as 2 × 2 complex matrices via

So we could multiply two quaternions by multiplying two 2 × 2 complex matrices. That would require 8 complex multiplications [1], which could be carried out using 3 real multiplications each. This would let us “reduce” the number of multiplications from 16 to 24.

But by symmetry we could eliminate half the multiplications. This puts us at 12 real multiplications, which is less than the 16 required by the most direct approach.

To see this, let α = *a* – *di* and β = *c* + *bi*. Then our matrix representation has the form

    [  α    β  ]
    [ -β*   α* ]

where * denotes complex conjugation.

Then our quaternion product becomes the matrix product

We only need to compute the complex products in the top row; we can recover the quaternion representation of the product from there. The products in the second row are negatives or conjugates of products in the first row.

Maybe there’s a Gauss-like trick to compute the top row of the matrix using only three complex products, but it can at least be done using four complex products.
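Here's a sketch in Python of the four-complex-product scheme, using the pairing α = *a* − *di*, β = *c* + *bi*, checked against the direct 16-multiplication product:

```python
def mult(x, y):
    # Direct quaternion product: 16 real multiplications
    return (x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
            x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
            x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
            x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0])

def to_pair(q):
    # q = (a, b, c, d) -> (alpha, beta) with alpha = a - d i, beta = c + b i
    a, b, c, d = q
    return complex(a, -d), complex(c, b)

def from_pair(alpha, beta):
    return (alpha.real, beta.imag, beta.real, -alpha.imag)

def mult_pairs(p, q):
    # Top row of the 2x2 complex matrix product: four complex multiplications
    a1, b1 = to_pair(p)
    a2, b2 = to_pair(q)
    return from_pair(a1*a2 - b1*b2.conjugate(), a1*b2 + b1*a2.conjugate())

print(mult((1, 2, 3, 4), (5, 6, 7, 8)))        # (-60, 12, 30, 24)
print(mult_pairs((1, 2, 3, 4), (5, 6, 7, 8)))  # (-60.0, 12.0, 30.0, 24.0)
```

With Gauss' trick applied to each of the four complex products, this is where the 12-real-multiplication count comes from.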

[1] You could use Strassen’s approach to multiply two 2 × 2 matrices in 7 multiplications. That would cut the number of real multiplications down to 21, but exploiting symmetry cuts the number of multiplications further.

The post Quaternion products with fewer real products first appeared on John D. Cook.

*q**pq**

where *q* represents a rotation, *q** is its conjugate, and *p* is the vector being rotated. (That’s leaving out some details that we’ll get to shortly.)

The primary advantage of using quaternions to represent rotations is that it avoids gimbal lock, a kind of coordinate singularity.

If you multiply quaternions directly from the definitions, the product takes 16 real multiplications. (See the function `mult` in the Python code below.) So a naive implementation of the product above, with two quaternion multiplications, would require 32 real multiplications.

You can do something analogous to Gauss’s trick for complex multiplication to do each quaternion product with fewer multiplications [1, 2], but there’s an even better way. It’s possible to compute the rotation in 15 multiplications [3].

Before we can say what the faster algorithm is, we have to fill in some details we left out at the top of the post. We’re rotating a vector in 3 dimensions using quaternions that have 4 dimensions. We do that by embedding our vector in the quaternions, carrying out the product above, and then pulling the rotated vector out.

Specifically, let

*v* = (*v*_{1}, *v*_{2}, *v*_{3})

be the vector we want to rotate. We turn *v* into a quaternion by defining

*p* = (0, *v*_{1}, *v*_{2}, *v*_{3}).

We encode our rotation in the unit quaternion

*q* = (*q*_{0}, *q*_{1}, *q*_{2}, *q*_{3})

where *q*_{0} = cos(θ/2), (*q*_{1}, *q*_{2}, *q*_{3}) is sin(θ/2) times a unit vector along the axis we want to rotate around, and θ is the angle we rotate through.

The rotated vector is the vector part of

*q**pq**

i.e. the vector formed by chopping off the first component of the quaternion product.

Let

*t* = 2 (*q*_{1}, *q*_{2}, *q*_{3}) × (*v*_{1}, *v*_{2}, *v*_{3}) = (*t*_{1}, *t*_{2}, *t*_{3})

where × is cross product. Then the rotated vector is

*v*‘ = (*v*_{1}, *v*_{2}, *v*_{3}) + *q*_{0}(*t*_{1}, *t*_{2}, *t*_{3}) + (*q*_{1}, *q*_{2}, *q*_{3}) × (*t*_{1}, *t*_{2}, *t*_{3}).

The algorithm involves two cross products, with 6 real multiplications each, and one scalar-vector product requiring 3 real multiplications, for a total of 15 multiplications. (I’m not counting the multiplication by 2 as a multiplication because it could be done by an addition or by a bit shift.)

Here’s some Python code to carry out the naive product and the faster algorithm.

import numpy as np

def mult(x, y):
    return np.array([
        x[0]*y[0] - x[1]*y[1] - x[2]*y[2] - x[3]*y[3],
        x[0]*y[1] + x[1]*y[0] + x[2]*y[3] - x[3]*y[2],
        x[0]*y[2] - x[1]*y[3] + x[2]*y[0] + x[3]*y[1],
        x[0]*y[3] + x[1]*y[2] - x[2]*y[1] + x[3]*y[0]
    ])

def cross(x, y):
    return np.array((
        x[1]*y[2] - x[2]*y[1],
        x[2]*y[0] - x[0]*y[2],
        x[0]*y[1] - x[1]*y[0]
    ))

def slow_rot(q, v):
    qbar = -q
    qbar[0] = q[0]
    return mult(mult(q, v), qbar)[1:]

def fast_rot(q, v):
    t = 2*cross(q[1:], v[1:])
    return v[1:] + q[0]*t + cross(q[1:], t)

And here’s some test code.

def random_quaternion():
    x = np.random.normal(size=4)
    return x

for n in range(10):
    v = random_quaternion()
    v[0] = 0
    q = random_quaternion()
    q /= np.linalg.norm(q)
    vprime_slow = slow_rot(q, v)
    vprime_fast = fast_rot(q, v)
    print(vprime_fast - vprime_slow)

This prints 10 vectors that are essentially zero, indicating that `vprime_slow` and `vprime_fast` produced the same result.

It’s possible to compute the rotation in less time than two general quaternion products because we had three things we could exploit.

- The quaternions *q* and *q** are very similar.
- The first component of *p* is zero.
- We don’t need the first component of *qpq**, only the last three components.

[1] I know you can multiply quaternions using 12 real multiplications, and I suspect you can get it down to 9. See this post.

[2] Note that I said “fewer multiplications” rather than “faster.” Gauss’ method eliminates a multiplication at the expense of 3 additions. Whether the re-arranged calculation is faster depends on the ratio of time it takes for a multiply and an add, which depends on hardware.

However, the rotation algorithm given here reduces multiplications and additions, and so it should be faster on most hardware.

[3] I found this algorithm here. The author cites two links, both of which are dead. I imagine the algorithm is well known in computer graphics.

The post Faster quaternion product rotations first appeared on John D. Cook.

The paper looks at three notions of a tensor

- a multi-indexed object that satisfies certain transformation rules
- a multilinear map
- an element of a tensor product of vector spaces

and explores (for 210 pages!) how they are related. The length comes primarily from the number of examples. The author is pulling on a thread that runs throughout a lot of mathematics.

Here’s the paper:

Lek-Heng Lim. Tensors in computations. Acta Numerica (2021), pp. 1–210. doi:10.1017/S0962492921000076. Available here.

The post Three kinds of tensor first appeared on John D. Cook.

Mathematica has a function, unsurprisingly called `Laplacian`, that will compute the Laplacian of a given function in a given coordinate system. If you give it a specific function, it will compute the Laplacian of that function. But you can also give it a general function to find a general formula.

For example,

Simplify[Laplacian[f[r, θ], {r, θ}, "Polar"]]

returns

*f*^{(2,0)}(*r*, θ) + *f*^{(1,0)}(*r*, θ)/*r* + *f*^{(0,2)}(*r*, θ)/*r*²

This is not immediately recognizable as the Laplacian from this post

because Mathematica is using multi-index notation, which is a little cumbersome for simple cases, but much easier to use than classical notation when things get more complicated. The superscript (0,2), for example, means do not differentiate with respect to the first variable and differentiate twice with respect to the second variable. In other words, take the second derivative with respect to θ.
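One way to check the polar formula is against the Cartesian Laplacian applied to a test function. Here's a sketch in SymPy; the test function *x*²*y* is arbitrary.

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x, y = r*sp.cos(theta), r*sp.sin(theta)

# Test function in Cartesian form, then expressed in polar coordinates
X, Y = sp.symbols('X Y')
g = X**2 * Y
f = g.subs({X: x, Y: y})

# Cartesian Laplacian, rewritten in polar coordinates
lap_cart = (sp.diff(g, X, 2) + sp.diff(g, Y, 2)).subs({X: x, Y: y})

# Polar form: f_rr + f_r / r + f_theta_theta / r^2
lap_polar = sp.diff(f, r, 2) + sp.diff(f, r)/r + sp.diff(f, theta, 2)/r**2

print(sp.simplify(lap_cart - lap_polar))  # 0
```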

Here’s a more complicated example with oblate spheroidal coordinates. Such coordinates come in handy when you need to account for the fact that our planet is not exactly spherical but is more like an oblate spheroid.

When we ask Mathematica

Simplify[Laplacian[f[ξ, η, φ], {ξ, η, φ}, "OblateSpheroidal"]]

the result isn’t pretty.

I tried using `TeXForm` and editing it into something readable, but after spending too much time on this I gave up and took a screenshot. But as ugly as the output is, it would be uglier (and error prone) to do by hand.

Mathematica supports the following 12 coordinate systems in addition to Cartesian coordinates:

- Cylindrical
- Bipolar cylindrical
- Elliptic cylindrical
- Parabolic cylindrical
- Circular parabolic
- Confocal paraboloidal
- Spherical
- Bispherical
- Oblate spheroidal
- Prolate spheroidal
- Conical
- Toroidal

These are all orthogonal, meaning that surfaces where one variable is held constant meet at right angles. Most curvilinear coordinate systems used in practice are orthogonal because this simplifies a lot of things.

Laplace’s equation is separable in Stäckel coordinate systems. All of the coordinate systems above are Stäckel systems except for toroidal coordinates. And in fact Stäckel coordinates are the only coordinate systems in which Laplace’s equation is separable.

It’s often the case that Laplace’s equation is separable in orthogonal coordinate systems, but not always. I don’t have a good explanation for why toroidal coordinates are an exception.

If you’d like a reference for this sort of thing, Wolfram Neutsch’s tome Coordinates is encyclopedic. However, it’s expensive new and hard to find used.

The post Laplacian in various coordinate systems first appeared on John D. Cook.

Multiplying two complex numbers (*a* + *bi*) and (*c* + *di*) in the most direct way takes four real multiplications:

def mult(a, b, c, d):
    re = a*c - b*d
    im = b*c + a*d
    return (re, im)

Gauss [1] discovered a way to do this with only three multiplications:

def gauss(a, b, c, d): t1 = a*c t2 = b*d re = t1 - t2 im = (a + b)*(c + d) - t1 - t2 return (re, im)

You can count the `*`, `+`, and `-` symbols above and see that the first algorithm uses 4 multiplications and 2 additions/subtractions. The second algorithm uses 3 multiplications and 5 additions/subtractions.

If you’re carrying out calculations by hand, the latter is faster since multiplication takes much longer than addition or subtraction. In a computer, the latter may not be much faster or even any faster. It depends on the hardware, and whether the real and imaginary parts are integers or floats.
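A quick check that the two algorithms agree on random integer inputs (both functions repeated here so the snippet stands alone):

```python
import random

def mult(a, b, c, d):
    # Direct product: 4 multiplications
    return (a*c - b*d, b*c + a*d)

def gauss(a, b, c, d):
    # Gauss' trick: 3 multiplications
    t1, t2 = a*c, b*d
    return (t1 - t2, (a + b)*(c + d) - t1 - t2)

random.seed(0)
for _ in range(1000):
    a, b, c, d = (random.randint(-10**9, 10**9) for _ in range(4))
    assert mult(a, b, c, d) == gauss(a, b, c, d)
print("ok")  # prints "ok"
```

With integer components the agreement is exact; with floats the two can differ in the last bits because the operations round differently.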

Gauss’s algorithm would be faster than the direct algorithm if the components were very large integers or very high-precision floats. The time required to add *n*-digit integers is *O*(*n*), but the time required to multiply *n*-digit numbers is at least *O*(*n* log *n*). So for large enough *n*, it’s worth doing some extra addition to save a multiplication.

I imagine someone has created an algorithm for quaternion multiplication analogous to the algorithm above for complex multiplication, but you could try it yourself as an exercise. I wonder how much you could reduce the number of component multiplications relative to the most direct approach. (Update: here’s my stab at it.)

- How to compute a complex square root
- Quaternion square roots
- How fast can you multiply two matrices?

[1] I’ve seen this described as “commonly attributed to Gauss.” Maybe there’s some debate over whether Gauss did this or whether he was the first.

The post Gauss algorithm for complex multiplication first appeared on John D. Cook.

This post will tie together many things I’ve blogged about before. The previous post **justified** separation of variables. This post will **illustrate** separation of variables.

Also, this post will show why you might care about **Bessel functions** and their zeros. I’ve written about Bessel functions before, and said that Bessel functions are to polar coordinates what sines and cosines are to rectangular coordinates. This post will make this analogy more concrete.

The separation of variables technique is typically presented in three contexts in introductory courses on differential equations:

- One-dimensional heat equation
- One-dimensional wave equation
- Two-dimensional (rectangular) Laplace equation

My initial motivation for writing this post was to illustrate separation of variables outside the most common examples. Separation of variables requires PDEs to have a special form, but not *as* special as the examples above might imply. A secondary motivation was to show Bessel functions in action.

Suppose you have a thin membrane, like a drum head, and you want to model its vibrations. By “thin” I mean that the membrane is sufficiently thin that we can adequately model it as a two-dimensional surface bobbing up and down in three dimensions. It’s not so thick that we need to model the material in more detail.

Let *u*(*x*, *y*, *t*) be the height of the membrane at location (*x*, *y*) and time *t*. The wave equation is a PDE modeling the motion of the membrane by

*u*_{tt} = *c*² Δ*u*

where Δ is the Laplacian operator. In rectangular coordinates the Laplacian is given by

Δ*u* = *u*_{xx} + *u*_{yy}.

We’re interested in a circular membrane, and so things will be much easier if we work in polar coordinates. In polar coordinates the Laplacian is given by

Δ*u* = *u*_{rr} + *u*_{r}/*r* + *u*_{θθ}/*r*².

We will assume that our boundary conditions and initial conditions are radially symmetric, and so our solution will be radially symmetric, i.e. derivatives with respect to θ are zero. And in our case the wave equation simplifies to

*u*_{tt} = *c*² (*u*_{rr} + *u*_{r}/*r*).

Let *a* be the radius of our membrane. We will assume our membrane is clamped down on its boundary, like a drum head, which means we have the boundary condition

*u*(*a*, *t*) = 0

for all *t* ≥ 0. We assume the initial displacement and initial velocity of the membrane are given by

*u*(*r*, 0) = *f*(*r*) and *u*_{t}(*r*, 0) = *g*(*r*)

for all *r* between 0 and *a*.

Now we get down to separation of variables. Because we’re assuming radial symmetry, we’re down to a function of two variables: *r* for the distance from the center and *t* for time. We assume

*u*(*r*, *t*) = *R*(*r*) *T*(*t*)

and stick it into the wave equation. A little calculation shows that

*T*″ / (*c*² *T*) = (*R*″ + *R*′/*r*) / *R*.

The left side is a function of *t* alone, and the right side is a function of *r* alone. The only way this can be true for all *t* and all *r* is for both sides to be constant. Call this constant -λ². Why? Because looking ahead a little we find that this will make things easier shortly.

Separation of variables allowed us to reduce our PDE to the following pair of ODEs:

*T*″ + *c*² λ² *T* = 0

*R*″ + *R*′/*r* + λ² *R* = 0.

The solutions to the equation for *R* are linear combinations of the Bessel functions *J*_{0}(λ*r*) and *Y*_{0}(λ*r*) [1].

And the solutions to the equation for *T* are linear combinations of the trig functions cos(*c*λ*t*) and sin(*c*λ*t*).

The boundary condition *u*(*a*, *t*) = 0 implies the boundary condition *R*(*a*) = 0. This implies that λ*a* must be a zero of our Bessel function, and that all the *Y*_{0} terms drop out. This means that our solution is

*u*(*r*, *t*) = Σ_{n} *J*_{0}(λ_{n} *r*) (*A*_{n} cos(*c* λ_{n} *t*) + *B*_{n} sin(*c* λ_{n} *t*))

where

λ_{n} = α_{n} / *a*.

Here α_{n} are the zeros of the Bessel function *J*_{0}.

The coefficients *A*_{n} and *B*_{n} are determined by the initial conditions. Specifically, you can show that

*A*_{n} = (2 / (*a*² *J*_{1}²(α_{n}))) ∫_{0}^{a} *f*(*r*) *J*_{0}(λ_{n} *r*) *r* d*r*

*B*_{n} = (2 / (*c* λ_{n} *a*² *J*_{1}²(α_{n}))) ∫_{0}^{a} *g*(*r*) *J*_{0}(λ_{n} *r*) *r* d*r*

The function *J*_{1} in the expression for the coefficients is another Bessel function.

The functions *J*_{0} and *J*_{1} are so important in applications that even the otherwise minimalist Unix calculator `bc`

includes these functions. (As much as I appreciate Bessel functions, this still seems strange to me.) And you can find functions for zeros of Bessel functions in many libraries, such as `scipy.special.jn_zeros`

in Python.
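
In practice you would reach for `scipy.special.jn_zeros`, but the first zero α₁ of *J*₀ can be computed from scratch with only the standard library. Here is a sketch: evaluate *J*₀ by its power series and bisect. The bracketing interval [2, 3] and the number of series terms are choices of mine, not anything from the post.

```python
from math import factorial

def J0(x, terms=40):
    """Evaluate the Bessel function J0 by its power series."""
    return sum((-1)**m * (x / 2)**(2 * m) / factorial(m)**2 for m in range(terms))

def first_zero(f, lo=2.0, hi=3.0, tol=1e-12):
    """Find a root of f in [lo, hi] by bisection, assuming f changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

alpha1 = first_zero(J0)
print(alpha1)  # ≈ 2.404825557695773, the first zero of J0
```

With SciPy available, `scipy.special.jn_zeros(0, n)` returns the first *n* zeros of *J*₀ directly.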

[1] This is why introductory courses are unlikely to include an example in polar coordinates. Separation of variables itself is no harder in polar coordinates than in rectangular coordinates, and it shows the versatility of the method to apply it in a different setting.

But the resulting ODEs have Bessel functions for solutions, and it’s understandable that an introductory course might not want to go down this rabbit trail, especially since PDEs are usually introduced at the end of a differential equation class when professors are rushed and students are tired.

[2] Snare drum image above “Snare drum” by YannickWhee is licensed under CC BY 2.0

The post Vibrating circular membranes first appeared on John D. Cook.

For example, you might first see the heat equation

*u*_{t} = *c*² *u*_{xx}.

The professor asks you to assume the solution has the form

*u*(*x*, *t*) = *X*(*x*) *T*(*t*),

i.e. the solution can be separated into the product of a function of *x* alone and a function of *t* alone.
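
Written out, the substitution is the calculation that follows the assumption: plug the product into the heat equation and divide through.

```latex
X(x)\,T'(t) = c^2\,X''(x)\,T(t)
\quad\Longrightarrow\quad
\frac{T'(t)}{c^2\,T(t)} = \frac{X''(x)}{X(x)} = -\lambda^2
```

The left side depends only on *t* and the right side only on *x*, so both must equal a constant, conventionally written −λ², leaving the ODEs *T*′ + *c*²λ²*T* = 0 and *X*″ + λ²*X* = 0.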

Following that you might see Laplace’s equation on a rectangle

*u*_{xx} + *u*_{yy} = 0

with the analogous assumption that

*u*(*x*, *y*) = *X*(*x*) *Y*(*y*),

i.e. the product of a function of *x* alone and a function of *y* alone.

There are several possible responses to this assumption.

1. Whatever you say, doc.
2. How can you assume that?
3. How do you know you’re not missing any possibilities?
4. What made someone think to try this?

As with many things, separation of variables causes the most consternation for the moderately sophisticated students. The least sophisticated students are untroubled, and the most sophisticated students can supply their own justification (at least after the fact).

One response to question (2) is “Bear with me. I’ll show that this works.”

Another response would be “OK, how about assuming the solution is a *sum* of such functions. That’s a much larger space to look in. And besides, we *are* going to take sums of such solutions in a few minutes.” One could argue from functional analysis or approximation theory that the sums of separable functions are dense in reasonable space of functions [1].

This is a solid explanation, but it’s kind of anachronistic: most students see separation of variables long before they see functional analysis or approximation theory. But it would be a satisfying response for someone who is seeing all this for the second time. Maybe they were exposed to separation of variables as an undergraduate and now they’re taking a graduate course in PDEs. In an undergraduate class a professor could do a little foreshadowing, giving the students a taste of approximation theory.

Existence of solutions is easier to prove than uniqueness in this case because you can concretely construct a solution. This goes back to the “it works” justification. This argument deserves more respect than a sophomoric student might give it. Mathematics research is not nearly as deductive as mathematics education. You often have to make inspired guesses and then show that they work.

Addressing question (3) requires saying something about uniqueness. A professor could simply assert that there are uniqueness theorems that allow you to go from “I’ve found something that works” to “and so it must be the only thing that works.” Or one could sketch a uniqueness theorem. For example, you might apply a maximum principle to show that the difference between any two solutions is zero.

Question (4) is in some sense the most interesting question. It’s not a mathematical question *per se* but a question about how people do mathematics. I don’t know what was going through the mind of the first person to try separation of variables, or even who this person was. But a plausible line of thinking is that ordinary differential equations are easier than partial differential equations. How might you reduce a PDE to an ODE? Well, if the solution could be factored into functions of one variable, …

The next post will illustrate using separation of variables by solving the wave equation on a disk.

[1] Also, there’s the mind-blowing Kolmogorov-Arnol’d theorem. This theorem says any continuous function of several variables can be written as a sum of continuous separable functions. It doesn’t say you can make the functions in your sum smooth, but it suggests that sums of separable functions are more expressive than you might have imagined.

The post Justifying separation of variables first appeared on John D. Cook.

A natural reaction would be “Isn’t geometry intrinsically visual?” Indeed, geometry is motivated by things we can visualize. But modern developments of geometry have become heavy with formal machinery, so much so that one could reasonably ask “What happened to the geometry?”

Tristan Needham has a new book entitled Visual Differential Geometry and Forms that aims to put the geometry back into a first course on differential geometry. I expect it’s a good read based on having read his previous book Visual Complex Analysis.

I just got a review copy in the mail, and flipping through the book I can see that it lives up to its title. It has lots of illustrations, just as you’d expect from a book on differential geometry if you hadn’t taken courses in the subject.

The post Visual geometry first appeared on John D. Cook.

This made me wonder more generally: **Is there a project to create errata pages?** I’m thinking especially of mathematical reference books. I’m not concerned with spelling errors and such, but rather errors in equations that could lead to hours of debugging.

I would be willing to curate and host errata pages for a few books I care about, but it would be better if this were its own site, maybe a Wiki.

I don’t want to duplicate someone else’s effort. So if there’s already a site for community-generated errata pages, I could add a little content there. But if there isn’t such a project out there already, maybe someone would like to start one.

***

[1] Update: Jan Van lent found the errata page. See the first comment. Apparently the changes that were penciled into my book were copied from the author’s errata list. Also, these changes were applied to the paperback edition of the book.

The post Volunteer-generated errata pages first appeared on John D. Cook.

Imagine *m* employees being given a red ticket representing the first screening, and *m* being given a blue ticket representing the second screening. If the company has *N* employees, the tickets can be passed out in

*C*(*N*, *m*)²

different ways, where *C*(*N*, *m*) denotes the number of ways to choose *m* items from a set of *N*. Of these, the number of ways the tickets could be passed out so that no one has both a red and a blue ticket is

*C*(*N*, *m*) *C*(*N* − *m*, *m*)

because you can first pass out the red tickets, then choose *m* of the employees who did not get a red ticket for the blue tickets. And so the probability that no one will be picked twice, the probability that nobody holds both a red and a blue ticket, is

*C*(*N* − *m*, *m*) / *C*(*N*, *m*).

Now let’s plug in some numbers. Suppose a company has 100 employees and 20 are randomly screened each time. In two screenings, there is only a 0.7% chance that no one will be tested twice. Said another way, there’s a 99.3% chance that at least one person will be screened twice. Any given individual has a 20% chance of being selected each time, and so a 4% chance of being picked twice.

A variation on this problem is to compute the expected number of overlaps between two tests. With *N* = 100 and *m* = 20, we expect four people to be tested twice.

By the way, what if over half the employees are tested each time? For example, if a company of 100 people tests 60 people each time, it’s *certain* to test somebody twice. But the derivation above still works. The general definition of binomial coefficients takes care of this because the numerator will be zero if *m* is larger than half *N*. The number of ways to choose 60 things from a set of 40 things, for instance, is zero.
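
These figures are easy to reproduce with `math.comb`, using the same *N* = 100 and *m* = 20 as in the example above:

```python
from math import comb

N, m = 100, 20

# Probability that no one is screened in both rounds: C(N - m, m) / C(N, m)
p_no_overlap = comb(N - m, m) / comb(N, m)
print(p_no_overlap)      # about 0.0066, i.e. roughly a 0.7% chance

# Expected number of people screened twice (hypergeometric mean): m * m / N
expected_overlap = m * m / N
print(expected_overlap)  # 4.0
```

Note that `comb(N - m, m)` is automatically zero when *m* > *N* − *m*, which is the point made below about testing over half the employees.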

If you’re already subscribed via Feedburner, there’s **no need to sign up again** with MailerLite. Some time in the next few weeks I will import all the email addresses from Feedburner into MailerLite. There will be some formatting changes and hopefully that will be the only difference you notice.

You could also subscribe via my RSS feed or follow one of my Twitter accounts if you’d like.

The post Blog email subscription first appeared on John D. Cook.

I’ve also recommended to clients that they use an extended Kalman filter or homomorphic encryption; sometimes fancy math is called for. But often they need something simple.

However, knowing about division is not enough to appropriately recommend division. A ten-year-old child should know how to carry out division, but would be unlikely to realize when data is best analyzed by dividing one thing by another.

I saw something the other day saying that there are no child prodigies in applied math. Child prodigies flourish in closed worlds. Applied math is an open world. Not textbook math—that’s a closed world—but the skillful application of math to the messy real world.

It doesn’t take decades of experience to carry out division, but it may take decades of experience to wield it well.

The post Recommending division first appeared on John D. Cook.

The earlier post said that the inequality in the Fourier uncertainty principle is exact when *f* is proportional to a Gaussian probability density. G. H. Hardy proved this result in 1933 in the form of the following theorem.

Let *f* be a square-integrable function on the real line and assume *f* and its Fourier transform satisfy the following bounds

|*f*(*x*)| ≤ *C* exp(−*ax*²), |*f̂*(ξ)| ≤ *C* exp(−*b*ξ²)

for some constant *C*. If *ab* > 1/4, then *f* = 0. And if *ab* = 1/4, then *f*(*x*) = *c* exp(−*ax*²) for some constant *c*.

Let’s translate this into probability terms by setting

*a* = 1/2σ², *b* = 1/2τ².

Now Hardy’s theorem says that if a nonzero function *f* is bounded by a multiple of a Gaussian density with variance σ² and its Fourier transform is bounded by a multiple of a Gaussian density with variance τ², then the product of the two variances is at least 1. And if the product of the variances equals 1, then *f* is a multiple of a Gaussian density with variance σ².
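
The equality case is easy to check numerically. The standard Gaussian density exp(−*x*²/2)/√(2π) has variance 1, and with the transform convention *f̂*(ξ) = ∫ *f*(*x*) e^(−*i*ξ*x*) d*x* its transform is exp(−ξ²/2), again a Gaussian shape with variance 1, so the product of variances is exactly 1. A crude Riemann sum over a truncated interval (the grid and cutoff below are arbitrary choices of mine) confirms this:

```python
from math import exp, cos, pi, sqrt

def f(x):
    return exp(-x**2 / 2) / sqrt(2 * pi)   # Gaussian density, variance 1

def fourier_transform(xi, h=1e-3, L=12.0):
    """Riemann-sum approximation of the (real-valued) Fourier transform of f."""
    n = int(2 * L / h)
    return sum(f(-L + k * h) * cos(xi * (-L + k * h)) for k in range(n)) * h

# The transform should match exp(-xi^2/2), a Gaussian with variance 1
for xi in (0.0, 0.5, 1.0, 2.0):
    print(xi, fourier_transform(xi), exp(-xi**2 / 2))
```

The two columns printed agree to many decimal places, illustrating that a Gaussian density achieves the equality case of the uncertainty principle.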

Theorem 3 in [1] says that if *u*(*t*, *x*) is a solution to the free Schrödinger equation

∂*u*/∂*t* = *i* Δ*u*

then *u* at different points in time satisfies a theorem similar to Hardy’s theorem. In fact, the authors show that this theorem is equivalent to Hardy’s theorem.

Specifically, if *u* is a sufficiently smooth solution and

|*u*(0, *x*)| ≤ *C* exp(−α*x*²), |*u*(*T*, *x*)| ≤ *C* exp(−β*x*²),

then αβ > (4*T*)^{-2} implies *u*(*t*, *x*) = 0, and αβ = (4*T*)^{-2} implies

[1] Aingeru Fernández-Bertolin and Eugenia Malinnikova. Dynamical versions of Hardy’s uncertainty principle: A survey. Bulletin of the American Mathematical Society. DOI: https://doi.org/10.1090/bull/1729

The post Fourier, Gauss, and Heisenberg first appeared on John D. Cook.

But you can’t really “read” lambda calculus so much as you can mentally execute it. Not many people can look at more than a few characters of lambda calculus and have any idea what it represents, though it’s not hard to mechanically manipulate it. I suppose in hindsight that’s what you’d expect from a theoretical model for computing.

The post The Programmer’s Ring by Loup Vaillant gives a notation for lambda calculus that I hadn’t seen before, notation that in my opinion is easier to understand. The author combines two ideas: de Bruijn indices (which I’d seen before), and replacing lambdas with brackets (which I had not seen).

The idea of de Bruijn indices is to replace variables with numbers. Vaillant describes De Bruijn indices saying

… variables are not referenced by name, but by stack offset: the last variable is accessed with the number 0, the second last with the number 1, and so on …

This would be a terrible idea in an ordinary programming language, but in lambda calculus it actually helps. Lambda calculus is *so* low-level that the variables aren’t meaningful anyway. It’s not like you can look at a few lines of lambda calculus and say “Oh, this is calculating net present value, so this variable must be the interest rate. It would be easier to read if they just called it `interest_rate`

instead of 42 because it’s the 42nd variable.”

Wikipedia describes de Bruijn indices differently than Vaillant:

Each De Bruijn index is a natural number that represents an occurrence of a variable in a λ-term, and denotes the number of binders that are in scope between that occurrence and its corresponding binder.

It’s not immediately obvious that these two definitions of a de Bruijn index are the same, and in fact they’re not. But the only difference is that the first definition numbers from 0 and the second numbers from 1. This post will count from 0, as Vaillant does.

After rewriting lambda calculus expressions to use de Bruijn indices, Vaillant fully parenthesizes the expressions (using braces, saving parentheses for grouping, as Mathematica does) then deletes the λs: every bracketed expression starts with a λ, so the λ itself is redundant. Also, you can delete the name of the variable that the λ takes since the variable numbering takes care of that.

OK, so what does all this buy us? As Vaillant points out, this notation is more concise, and it makes some patterns easier to recognize. For example, the famous Y combinator

Y = λf. (λx. f (x x)) λx. f (x x)

becomes

Y = [ [1 (0 0)] [1 (0 0)] ]

The latter makes the repetition inside more apparent.

As another example, let’s look at Church encoding for numerals:

0 → λf. λx. x
1 → λf. λx. f x
2 → λf. λx. f (f x)
3 → λf. λx. f (f (f x))
…

Here’s what Church numerals would look like in the notation described above.

0 → [[ 0 ]]
1 → [[ 1 0 ]]
2 → [[ 1 (1 0) ]]
3 → [[ 1 (1 (1 0)) ]]
…

You can convert a Church numeral to its corresponding integer by adding all the numbers in the lambda calculus expression.
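
Both conversions are easy to check mechanically. Below is a small sketch (the names are mine, not from the post): Church numerals written as Python lambdas, converted to integers by applying them to a successor function, alongside the digit-summing shortcut for the bracket notation.

```python
import re

# Church numerals as Python lambdas
zero  = lambda f: lambda x: x
one   = lambda f: lambda x: f(x)
two   = lambda f: lambda x: f(f(x))
three = lambda f: lambda x: f(f(f(x)))

def church_to_int(n):
    """Apply a Church numeral to the successor function, starting at 0."""
    return n(lambda k: k + 1)(0)

def bracket_to_int(s):
    """Sum the de Bruijn indices appearing in the bracket notation."""
    return sum(int(d) for d in re.findall(r"\d+", s))

print([church_to_int(n) for n in (zero, one, two, three)])  # [0, 1, 2, 3]
print(bracket_to_int("[[ 1 (1 (1 0)) ]]"))                  # 3
```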

Vaillant’s notation is more formal than traditional lambda calculus notation, but lambda calculus is formal anyway, so you might as well carry the formality a step further if it makes things easier.

“RIPEMD” stands for “RIPE Message Digest,” where “RIPE” stands for “RACE Integrity Primitives Evaluation” and where “RACE” stands for “Research and Development in Advanced Communications Technologies in Europe”—a nice example of a nested abbreviation.

This deserves a diagram:

I created the diagram above with DITAA.

[1] Introduction to Cryptography with Open-Source Software by Alasdair McAndrew. CRC Press. 2011

The post What does RIPEMD stand for? first appeared on John D. Cook.

It wouldn’t bother me if they weren’t consistent. I ordinarily wouldn’t bother to check such things. But I remember looking into time dilation before and being surprised how little effect velocity has until you get very close to the speed of light. I couldn’t decide whether the relativistic effect in the novel sounded too large or too small.

If a stationary observer is watching a clock moving at velocity *v*, during one second of the observer’s time,

√(1 − *v*²/*c*²)

seconds will have elapsed on the moving clock.

Even at 20% of the speed of light, the moving clock only appears to slow down by about 2%.

If, as in the novel, a spaceship is moving at 56.7% of the speed of light, then for every second an Earth-bound observer experiences, someone on the ship will experience √(1 – 0.567²) = 0.82 seconds. So time would run about 20% slower on the ship, as the novel says.

The author must have either done this calculation or asked someone to do it for him. I had a science fiction author ask me for something a while back, though I can’t remember right now what it was.

You can expand the expression above in a Taylor series to get

√(1 − *v*²/*c*²) = 1 − *v*²/2*c*² + …

and so for velocities much smaller than the speed of light, the effect of time dilation is approximately 0.5 *v*²/*c*², a quadratic function of velocity. You can use this to confirm the comment above that when *v*/*c* = 0.2, the effect of time dilation is about 2%.

GPS satellites travel at about 14,000 km/hour, and so the effect of time dilation is on the order of 1 part in 10^{10}. This would seem insignificant, except it amounts to milliseconds per year, and so it does make a practical difference.

For something moving 100 times slower, like a car, time dilation would be 10,000 times smaller. So time in a car driving at 90 miles per hour slows down by one part in 10^{14} relative to a stationary observer.
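
The numbers in this post are quick to verify with the exact formula and the quadratic approximation:

```python
from math import sqrt

def dilation_factor(beta):
    """Seconds elapsed on a clock moving at v = beta * c per observer second."""
    return sqrt(1 - beta**2)

print(dilation_factor(0.567))    # ≈ 0.8237: about 20% slower, as in the novel
print(1 - dilation_factor(0.2))  # ≈ 0.0202: about 2% at 20% of light speed

# Quadratic approximation 0.5 * (v/c)^2 for small velocities
beta_gps = 14000 / 3600 / 299792.458  # 14,000 km/h as a fraction of c
print(0.5 * beta_gps**2)              # on the order of 1e-10
```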

The math in the section above is essentially the same as the math in the post explaining why it doesn’t matter much if a tape measure does run exactly straight when measuring a large distance. They both expand an expression derived from the Pythagorean theorem in a Taylor series.

The post Time dilation in SF and GPS first appeared on John D. Cook.

The previous post looked at the sequence

*kn* mod 1

where *k* = log_{2}(3). This value of *k* arose out of exploring perfect fifths and octaves.

We saw that the sequence above fills in the unit interval fairly well, though not exactly regularly. The first 12 elements of the sequence nearly divide the unit interval into 12 nearly equal parts. And the first 53 elements of the sequence divide the interval into 53 nearly equal parts.

There’s something interesting going on here. We could set *k* = 1/12 and divide the unit interval into *exactly* 12 equal parts. But if we did, then the same sequence couldn’t also divide the interval in 53 equal parts. Our sequence fills in the unit interval fairly uniformly, more uniformly than a random sequence, but less evenly than a sequence designed to obtain a particular spacing.

Our sequence is delicately positioned somewhere between being completely regular and being completely irregular (random). The sequence is dense in the interval, meaning that it comes arbitrarily close to any point in the interval eventually. It could not do that if it were more regular. And yet it is more regular than a random sequence in a way that we’ll now make precise.

Let’s pick an interval and see how often the sequence lands in that interval and compare the results to a random sequence. For my example I chose [0.1, 0.2], but I’d get similar results with any other interval.

The horizontal axis is the length of the sequence and the vertical axis is the proportion of sequence elements that fall into the test interval.

Our sequence spends about 1/10 of its time in this interval of length 1/10. The same is true of a random sequence, *eventually*, and *with high probability*. Our sequence achieves the limiting behavior of a random sequence, but it achieves it more quickly, and it does so with certainty, not just with high probability.

Here’s another example just to show that our initial choice of interval didn’t matter. We use the interval

[π – 3, *e* – 2] = [0.141…, 0.718…]

and we get analogous results.

The interval has width 0.57, and our sequence spends 57% of its time in the interval.
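
Both experiments take only a few lines of Python. The proportion of the first *N* points of *kn* mod 1 falling in an interval converges quickly to the interval’s length; the choice *N* = 1000 below is arbitrary.

```python
from math import log, e, pi

k = log(3, 2)  # log base 2 of 3
N = 1000
points = [(k * n) % 1 for n in range(1, N + 1)]

def proportion_in(lo, hi):
    """Fraction of the sequence's first N points landing in [lo, hi]."""
    return sum(lo <= p <= hi for p in points) / N

print(proportion_in(0.1, 0.2))       # close to 0.1, the interval width
print(proportion_in(pi - 3, e - 2))  # close to 0.576..., the interval width
```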

Our sequence is an example of a **Quasi-Monte Carlo** (QMC) sequence. The name implies that the sequence is something like a random sequence, but better in some sense. QMC sequences are said to have **low discrepancy**, meaning that the difference (discrepancy) between the number of points that land in an interval and the expected number is low, lower than what you’d expect from a random (Monte Carlo) sequence.

To define discrepancy rigorously we get rid of our arbitrary choice of interval and take the supremum over all subintervals. Or we could define the star-discrepancy by taking the supremum over subintervals that must start with 0 on the left end.

It turns out we could do better with a different choice of *k*. For any irrational *k* the sequence

*kn* mod 1

has low discrepancy, i.e. lower than you’d expect from a random sequence, but the discrepancy is as small as possible when *k* is the golden ratio.

We’ve been looking at a QMC sequence in an interval, but QMC sequences in higher dimensions are defined analogously. In any volume of space, the sequence visits the space more uniformly than a random sequence would.

QMC sequences are usually applied in higher dimensions, and the primary application is numerical integration. Quasi-Monte Carlo integration is more efficient than Monte Carlo integration, though the error is harder to estimate; QMC integration often does better in practice than pessimistic theoretical bounds predict. A hybrid of QMC and MC can retain the efficiency of QMC and some of the error estimation of MC.

See more on how QMC sequences are used in numerical integration.

The post A gentle introduction to QMC and discrepancy first appeared on John D. Cook.

Going up by 12 perfect fifths is very nearly the same as going up 7 octaves. That is,

(3/2)^{12} ≈ 2^{7}

or in other words,

2^{7/12} ≈ 3/2.

This is why equal temperament tuning works. More on that here.

Going back to our first approximation, we can say that (12, 7) is an approximate solution to the equation

(3/2)^{x} = 2^{y}.

One could naturally ask whether there are better approximate solutions. One such solution, dating back to 40 BC, came from the Chinese scholar King Fang [1]. He discovered that (53, 31) is a substantially more accurate solution than (12, 7), between 6 and 7 times better.

>>> 1.5**12/2**7
1.0136432647705078
>>> 1.5**53/2**31
1.0020903140410862

In other words, going up 53 perfect fifths is nearly the same as going up 31 octaves. You could divide the octave into 53 parts using perfect fifths; we will demonstrate this visually below. However, we’re moving more into the realm of number theory than practical music theory at this point.

We could change our problem slightly, making it mathematically simpler though less obviously related to music, by putting all our powers of 2 on one side and looking for powers of 3 that approximately equal powers of two. That is, we could look at approximate solutions to

3^{x} = 2^{y}.

where now *x* and *y* have different meanings; our new *y* is our old *y* plus our old *x*. No power of 3 will ever equal a power of 2, but you can find powers of 3 close to powers of 2, making the ratio as close to 1 as you’d like.

Taking the logarithm of both sides in base 2, we recast the problem as looking for integers *n* such that

log_{2}(3) *n*

is approximately an integer.

If we define *k* = log_{2}(3), we are looking for *n* such that *kn* mod 1, the fractional part of *kn*, is near 0.

The map

*n* → *kn* mod 1

is ergodic because *k* is irrational, and so its image in the unit interval is dense as *n* goes to infinity. This means we can find solutions as close as we’d like to 0. It also means we can find solutions as close as we’d like to any other number in the interval.

Here’s a plot of our map for *n* up to 12.

Note that the range of the map falls very nearly on the 12 evenly-spaced horizontal lines. This corresponds to the circle of fifths filling in the 12 notes of the chromatic scale.

Now let’s keep going for *n* up to 53.

Look closely at the bottom of the plot. The graph gets close to 0 at 12, and it gets even closer to 0 at 53. The value of *kn* mod 1 doesn’t get smaller than the value at *n* = 53 until *n* = 359, where the value is about half as large.
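
You can confirm these record lows numerically. The sketch below scans *n* up to 400 and records each *n* where *kn* mod 1 sets a new minimum:

```python
from math import log

k = log(3, 2)  # log base 2 of 3
best = 1.0
records = []
for n in range(1, 400):
    frac = (k * n) % 1
    if frac < best:
        best = frac
        records.append((n, frac))

for n, frac in records:
    print(n, round(frac, 5))
# New minima occur at n = 1, 2, 7, 12, 53, 359, with the value at
# n = 359 about half the size of the value at n = 53.
```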

When we go up by 12 perfect fifths, we don’t end up exactly on the note we started on; 12 fifths is a little more than 7 octaves. If we keep going up in fifths we fill in notes a little higher than the original chromatic scale. Here’s a plot for *n* up to 24, with the values sorted so we can focus on the range rather than the ups and downs of filling in the range.

Every note in the original chromatic scale now has a counterpart that’s somewhere around 25 cents sharp.

If we keep going to *n* = 53, we fill in the octave more evenly. Here’s a plot of the sorted values.

- What key has 30 sharps?
- Listening to golden angles
- What does ergodic mean?
- The mathematics of Deep Note

[1] A. L. Leigh Silver. Some Musico-Mathematical Curiosities. The Mathematical Gazette , Feb., 1964, Vol. 48, No. 363 (Feb., 1964), pp. 1-17.

The post Perfect fifths, octaves, and an ergodic map first appeared on John D. Cook.

***

This page quotes a typesetter who said she calls this a *zareba*. The OED defines a *zareba* as a kind of cattle pen but does not mention its use as a typographical term.

According to Wikipedia, a row of three asterisks is a particular kind of *dinkus*, punctuation used to signify a section break. My guess is that the word is related to *dingus* and *dingbat* but I haven’t been able to find an etymology. The OED does not have an entry for *dinkus*.

This article attests to both *zareba* and *dinkus* as defined above.

Apparently the plural of *dinkus* is *dinkuses*.

⁂

When the three stars are arranged in a triangle, as above, this is called an *asterism*. This term is less obscure than *zareba* or *dinkus*. It has an OED entry and a Unicode code point (U+2042). In case your browser didn’t display the asterism above as a character, the following shows another one as an image.

⁂

It’s my impression that the zareba *symbol* is more common than the asterism *symbol*, but the word *asterism* is more common than the word *zareba*, at least in typographical usage.

There is no command for the asterism in The Comprehensive LaTeX Symbol List, but the document does explain how to create your own, citing the following solution by Peter Flynn.

\newcommand{\asterism}{\smash{%
  \raisebox{-.5ex}{%
    \setlength{\tabcolsep}{-.5pt}%
    \begin{tabular}{@{}cc@{}}
      \multicolumn2c*\\[-2ex]*&*
    \end{tabular}}}}

If you are using XeTeX you can just insert the Unicode character U+2042 directly, but the command above will work with plain TeX or LaTeX.

The post Zareba, asterism, and dinkus first appeared on John D. Cook.

sin(*x* + *y*) sin(*x* – *y*) = (sin(*x*) + sin(*y*)) (sin(*x*) – sin(*y*))

It looks as if someone fallaciously expanded

sin(*x* + *y*) “=” sin(*x*) + sin(*y*)

and

sin(*x* – *y*) “=” sin(*x*) – sin(*y*).

Although both expansions are wrong, their product is correct. That is, for all *x* and *y*,

sin(*x* + *y*) sin(*x* – *y*) = sin²(*x*) – sin²(*y*).
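
A quick numerical spot check of the identity, at a few arbitrarily chosen points:

```python
from math import sin

for x, y in [(0.3, 1.1), (2.0, -0.7), (5.0, 4.0)]:
    lhs = sin(x + y) * sin(x - y)
    rhs = sin(x)**2 - sin(y)**2
    print(lhs, rhs)  # the two columns agree to floating-point precision
```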

Not only does the sine function satisfy this identity, it is almost the only function that does [1]. The only functions *f* that satisfy

*f*(*x* + *y*) *f*(*x* – *y*) = *f*²(*x*) – *f*²(*y*)

are *f*(*x*) = *ax* and *f*(*x*) = *a* sin(*bx*), given some mild regularity conditions on *f*. [2]

If *f*(*x*) satisfies the identity above, then so does

*g*(*x*) = *a* *f*(*bx*)

and so the identity could only characterize sine up to a change in amplitude and frequency.

You could think of the linear solution

*f*(*x*) = *ax*

as the limiting case of the solution

*f*(*x*) = *ab* sin(*x*/*b*)

as *b* goes to infinity. So you could include linear functions as sine waves with “infinite amplitude” and “zero frequency.”

In [1] the authors also prove that if we restrict *f* to be a real-valued function of a real variable, the only solutions are of the form *ax*, *a* sin(*bx*), and *a* sinh(*bx*).

The hyperbolic sine is only a new solution from the perspective of real numbers. Since

sinh(*x*) = –*i* sin(*ix*),

sinh was already included in *a* sin(*bx*) as the special case *a* = –*i* and *b* = *i*.

***

[1] R. A. Rosenbaum and S. L. Segal. A Functional Equation Characterising the Sine. The Mathematical Gazette , May, 1960, Vol. 44, No. 348 (May, 1960), pp. 97-105.

[2] It is sufficient to assume *f* is an entire function, i.e. a complex function of a complex variable that is analytic everywhere. Rosenbaum and Segal give the weaker but more complicated condition that *f* is “defined over all complex numbers, continuous at a point, bounded on every closed set, and such that the set of non-zero zeros of *f* is either empty or bounded away from zero.”

The post A unique property of sine first appeared on John D. Cook.