Given the unique risks associated with the use of contractors operating outside the jurisdiction of the United States, CMS encourages sponsors using offshore subcontractors to take **extraordinary measures** to ensure that offshore arrangements protect beneficiary privacy [3, emphasis added].

Why are the risks unique? For one thing, subcontractors outside the US may not be familiar with US law or held accountable for complying with that law.

What are the necessary “extraordinary measures”? The CMS does not say because the answer depends very much on context. The probability of being able to identify someone from a set of data fields depends on what those fields are. It also can change over time, and so the conclusion needs to be reviewed periodically.

It is legal to give a third party access to PHI if there is a BAA (business associate agreement) in place. For example, if a hospital outsources billing, the company doing the billing must see personal information in order to carry out their business function. But with offshore processing, it seems safest, if practical, to proceed as if there were no BAA in place and deidentify the data before it leaves the US.

If you’d like help with privacy regulation compliance or data de-identification, let’s talk.

[1] Nothing in this blog post is legal advice. Compliance with privacy regulation is both a legal and statistical matter. I address statistical questions and let lawyers address legal questions. I have a lawyer specializing in healthcare law who I recommend if clients don’t have their own legal expert.

[2] CMS memo September 16, 2011. Contract Year (CY) 2012 Medicare Advantage and Part D Readiness Checklist. Available here.

[3] Allen Briskin, Lisa C. Earl, Gerry Hinkley, and Joseph E. Kendall. Offshoring Health Information: Issues and Lingering Concerns. Journal of Health & Life Sciences Law. Vol. 8, No. 1.

You only need one unit of length, say a meter. You could express all distances in terms of meters and dispense with kilometers, millimeters, etc. And in fact, this is essentially what science does. It hardly makes a difference whether you say the Andromeda galaxy is 2.365 × 10^{22} meters away or 2.365 × 10^{19} kilometers away. Especially if you need scientific notation anyway, you might as well use base units.

Multiple units for the same fundamental quantity are psychologically convenient though not logically necessary. Astronomers, for example, simply use meters for **calculation**. But they may use other units such as AUs or light years for **conceptualization** and **communication**. Scientists do all time calculations in seconds, though it’s easier to understand long time scales in units of years.

It’s commonly said that it’s easier to do science in SI units than in imperial units. This is in some sense true, but it would be more accurate to say it’s easier to do science in terms of **fundamental units**. The SI prefixes kilo, mega, etc. are a sort of intermediate step to letting go of all but fundamental units.

One could imagine an alternative history in which science standardized on the inch rather than the meter. Distances like yards and miles could have been seen as auxiliary units for conceptual purposes, much like the astronomical unit and light year.

***

This post was motivated by an online discussion of kilobytes and megabytes. I said something about a kilobyte being 1,000 bytes, and someone replied saying technically this should be 1,024. I replied saying that even more technically, it’s 1,000 bytes. This is a good example of coming full circle. The least informed and most informed agree, while those in the middle disagree.

You might naively assume that because 1,000 grams is a kilogram that 1,000 bytes is a kilobyte. But for years, a kilobyte was commonly understood to mean 1,024 bytes. To address the confusion, the IEC introduced new prefixes such as kibi and mebi in 1998. Now a kibibyte is 2^{10} = 1,024 bytes, a mebibyte is 2^{20} = 1,048,576 bytes, etc. And so a kilobyte now refers to 1,000 bytes, just as one might naively have assumed.
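The gap between the decimal and binary prefixes is easy to quantify; here's a quick sketch in Python:

```python
KB, MB = 10**3, 10**6     # SI decimal prefixes
KiB, MiB = 2**10, 2**20   # IEC binary prefixes

assert KiB - KB == 24
assert MiB - MB == 48576

# the relative gap grows with each prefix: 2.4% at kilo, about 4.9% at mega
assert round((KiB - KB) / KB, 4) == 0.024
assert round((MiB - MB) / MB, 4) == 0.0486
```

So whether a "megabyte" means 10^6 or 2^20 bytes changes the count by nearly five percent, which is why the ambiguity occasionally matters.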

In practice the difference hardly matters. If you want to indicate exactly 2^{20} bytes, you can say “one mebibyte.” But in my opinion it’s even better to simply say “2^{20} bytes” because then there is no chance of being misunderstood.

To my mind, terms like “megabyte” are something like light years, conceptual units. Whether you think a megabyte is exactly a million bytes or approximately a million bytes doesn’t matter in informal communication. If it’s necessary to be precise, simply state the number of bytes as an integer, using powers of 10 or 2 if that helps.

The purpose of communication is to be understood, not to be correct. If you use megabyte to mean exactly 1,000,000 bytes, you may be correct according to the IEC, but you run a significant risk of being misunderstood. Better, in my opinion, to just use the fundamental unit bytes.

I made a similar point in my post on names for extremely large numbers. When you use these names, there’s a good chance your listener will either not know what you’re talking about, or worse, misunderstand what you mean.

The US uses a mix of imperial and metric units of measure. Some people, almost all outside the US, are quite agitated by this. In practice, the two systems coexist peacefully.

Americans buy milk by the gallon and wine by the milliliter. Milk typically comes in gallon jugs, and wine typically comes in 750 milliliter bottles. The inconsistency is curious but harmless.

For practical purposes, milk is sold in units of jugs, not gallons. And wine is sold in units of bottles, not milliliters. When I go to the store to buy a jug of milk or a bottle of wine, I’m not that aware of the units. If overnight the size of a milk jug changed from one gallon to four liters, or if wine went from 750 milliliter bottles to 25 fluid ounce bottles, no one would know.

If you went to a store determined to buy *exactly* equal volumes of milk and wine, you couldn’t do it. You’d need to buy more containers of each than any store could possibly sell [1]. But nobody ever does this.

And if for some bizarre reason you really did want to buy equal volumes of milk and wine, the biggest problem would not be gallons versus milliliters but rather the uncertainty in both. A jug of milk isn’t *exactly* one gallon, nor is a bottle of wine *exactly* 750 milliliters.

**Related post**: In defense of complicated measurement systems

[1] A US gallon is 231 cubic inches, and an inch is 2.54 centimeters. From this one can work out that 31,250,000 milk jugs contain the same volume as 157,725,491 wine bottles.
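The arithmetic in this footnote can be checked with exact rational numbers so no rounding creeps in:

```python
from fractions import Fraction

# a US gallon is exactly 231 cubic inches, and an inch is exactly 2.54 cm
gallon_ml = Fraction(231) * Fraction(254, 100)**3   # milliliters in a gallon
bottle_ml = Fraction(750)

# the smallest integer counts with equal volumes come from this ratio in lowest terms
ratio = bottle_ml / gallon_ml
jugs, bottles = ratio.numerator, ratio.denominator

assert jugs * gallon_ml == bottles * bottle_ml
print(jugs, bottles)  # 31250000 157725491
```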

For example, suppose you wanted to find *N* so that 1/7 of the first *N* positive integers are prime. Then the following Python code shows you could pick *N* = 3059.

```python
from sympy import primepi

m = 7
N = 2*m
while N / primepi(N) != m:
    N += m
print(N)
```

[1] Solomon Golomb. On the Ratio of N to π(N). The American Mathematical Monthly, Vol. 69, No. 1 (Jan., 1962), pp. 36-37.

The proof is short, and doesn’t particularly depend on the distribution of primes. Golomb proves a more general theorem for any class of integers whose density goes to zero.

- Why programmers are not paid in proportion to productivity
- Selection bias and bombers
- Automate to save mental energy, not time
- The most important skill in software development
- Golden powers are nearly integers
- Ten life lessons from differential equations
- Math library functions that seem unnecessary
- Unusual proof there are infinitely many primes
- Flying to Mars in three days
- Floating point error is the least of my worries

This post explores the analogy between trigonometric functions and Jacobi elliptic functions.

In the previous post we mentioned a connection between the argument *u* of a Jacobi function and the amplitude φ:

*u* = ∫_0^φ *d*θ / √(1 – *m* sin² θ)

We can use this to define the functions **sn** and **cn**. Leaving the dependence on *m* implicit, we have

sn *u* = sin φ and cn *u* = cos φ

If *m* = 0, then *u* = φ and so sn and cn are exactly sine and cosine.

There’s a third Jacobi function we didn’t discuss much last time, and that’s **dn**. It would be understandable to expect dn to be analogous to tangent, but it’s not. The function dn is the derivative of φ with respect to *u*, or equivalently

dn *u* = √(1 – *m* sin² φ)

Just as there are more trig functions than just sine and cosine, there are more Jacobi functions than sn, cn, and dn. There are two ways to define the rest of the Jacobi functions: in terms of ratios, and in terms of zeros and poles.

I wrote a blog post one time asking how many trig functions there are. The answer depends on your perspective, and I gave arguments for 1, 3, 6, and 12. For this post, let’s say there are six: sin, cos, tan, sec, csc, and ctn.

One way to look at this is to say there are as many trig functions as there are ways to select distinct numerators and denominators from the set {sin, cos, 1}. So we have tan = sin/cos, csc = 1/sin, etc.

**There are 12 Jacobi elliptic functions**, one for each choice of distinct numerator and denominator from {sn, cn, dn, 1}. The name of a Jacobi function is two letters, denoting the numerator and denominator, where we have {s, c, d, n} abbreviating {sn, cn, dn, 1}.

For example, sc(*u*) = sn(*u*) / cn(*u*) and ns(*u*) = 1 / sn(*u*).

Note that to take the reciprocal of a Jacobi function, you just reverse the two letters in its name.
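These naming conventions can be verified numerically with SciPy's `scipy.special.ellipj`; the argument and parameter below are arbitrary choices:

```python
from scipy.special import ellipj

u, m = 0.7, 0.3   # arbitrary argument and parameter
sn, cn, dn, phi = ellipj(u, m)

sc = sn / cn      # numerator s, denominator c
cs = cn / sn      # letters reversed
ns = 1.0 / sn     # the letter n stands for 1

# reversing the letters gives the reciprocal
assert abs(sc * cs - 1) < 1e-12
assert abs(ns * sn - 1) < 1e-12
```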

The twelve Jacobi functions can be classified [1] in terms of their zeros and poles over a rectangle whose sides have length equal to quarter periods. Let’s look at an analogous situation for trig functions before exploring Jacobi functions further.

Trig functions are periodic in one direction, while elliptic functions are periodic in two directions in the complex plane. A quarter period for the basic trig functions is π/2. The six trig functions take one value out of {0, 1, ∞} at 0 and a different value at π/2. So we have one trig function for each of the six ways to choose a permutation of length 2 from a set of 3 elements.

In the previous post we defined the two quarter periods *K* and *K*‘. Imagine a rectangle whose corners are labeled

s = (0, 0)

c = (*K*, 0)

d = (*K*, *K*‘)

n = (0, *K*‘)

Each Jacobi function has a zero at one corner and a pole at another. The 12 Jacobi functions correspond to the 12 ways to choose a permutation of two items from a set of four.

The name of a Jacobi function is two letters, the first letter corresponding to where the zero is, and the second letter corresponding to the pole. So, for example, sn has a zero at s and a pole at n. These give the same names as the ratio convention above.

The Jacobi functions satisfy many identities analogous to trigonometric identities. For example, sn and cn satisfy a Pythagorean identity just like sine and cosine.
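The Pythagorean identity sn² + cn² = 1, along with the companion identity dn² + *m* sn² = 1, can be checked numerically; the argument and parameter below are arbitrary:

```python
from scipy.special import ellipj

u, m = 1.2, 0.6   # arbitrary argument and parameter
sn, cn, dn, _ = ellipj(u, m)

assert abs(sn**2 + cn**2 - 1) < 1e-12     # analogous to sin² + cos² = 1
assert abs(dn**2 + m*sn**2 - 1) < 1e-12   # companion identity for dn
```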

Also, the Jacobi functions have addition theorems, though they’re more complicated than their trigonometric counterparts.

The derivatives of the basic Jacobi functions are given below:

sn′ *u* = cn *u* dn *u*

cn′ *u* = –sn *u* dn *u*

dn′ *u* = –*m* sn *u* cn *u*

Note that the implicit parameter *m* makes an appearance in the derivative of dn. We will also need the complementary parameter *m*‘ = 1 – *m*.

The derivatives of all Jacobi functions are summarized in the table below.

The derivatives of the basic Jacobi functions resemble those of trig functions. They may look more complicated at first, but they’re actually more regular. You could remember them all by observing the patterns pointed out below.

Let *w*, *x*, *y*, *z* be any permutation of {s, n, d, c}. Then the derivative of *wx* is proportional to *yx zx*. That is, the derivative of every Jacobi function *f* is a constant times two other Jacobi functions. The names of these two functions both end in the same letter as the name of *f*, and the initial letters are the two letters not in the name of *f*.

The proportionality constants also follow a pattern. The sign is positive if and only if the letters in the name of *f* appear in order in the sequence s, n, d, c. Here’s a table of just the constants.

Note that the table is skew symmetric, i.e. its transpose is its negative.

[1] An elliptic function is determined, up to a constant multiple, by its periods, zeros, and poles. So not only do the Jacobi functions have the pattern of zeros and poles described here, these patterns uniquely determine the Jacobi functions, modulo a constant. For (singly) periodic functions, the period, zeros, and poles do not uniquely determine the function. So the discussion of zeros and poles of trig functions is included for comparison, but it does not *define* the trig functions.

The image above is a plot of the function sn [1].

Jacobi functions take two inputs. We typically think of a Jacobi function as being a function of its first input, the second input being fixed. This second input is a “dial” you can turn that changes their behavior.

There are several ways to specify this dial. I started with the word “dial” rather than “parameter” because in this context **parameter** takes on a technical meaning, one way of describing the dial. In addition to the “parameter,” you could describe a Jacobi function in terms of its **modulus** or **modular angle**. This post will be a Rosetta stone of sorts, showing how each of these ways of describing a Jacobi elliptic function are related.

The **parameter** *m* is a real number in [0, 1]. The **complementary parameter** is *m*‘ = 1 – *m*. Abramowitz and Stegun, for example, write the Jacobi functions sn and cn as sn(*u* | *m*) and cn(*u* | *m*). They also use *m*_{1} rather than *m*‘ to denote the complementary parameter.

The **modulus** *k* is the square root of *m*. It would be easier to remember if *m* stood for modulus, but that’s not conventional. Instead, *m* is for parameter and *k* is for modulus. The **complementary modulus** *k*‘ is the square root of the complementary parameter.

The **modular angle** α is defined by *m* = sin² α.

Note that as the parameter *m* goes to zero, so does the modulus *k* and the modular angle α. As any one of these three goes to zero, the Jacobi functions sn and cn converge to their counterparts sine and cosine. So whether your dial is the parameter, modulus, or modular angle, sn converges to sine and cn converges to cosine as you turn the dial toward zero.

As the parameter *m* goes to 1, so does the modulus *k*, but the modular angle α goes to π/2. So if your dial is the parameter or the modulus, it goes to 1. But if you think of your dial as modular angle, it goes to π/2. In either case, as you turn the dial to the right as far as it will go, sn converges to hyperbolic tangent, and cn converges to hyperbolic secant.
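The endpoint behavior is easy to confirm numerically with SciPy's `ellipj`: at *m* = 0 we recover sine and cosine, and at *m* = 1 sn and cn become hyperbolic tangent and hyperbolic secant.

```python
import numpy as np
from scipy.special import ellipj

u = np.linspace(0.1, 2, 20)

# m = 0: sn and cn reduce to sine and cosine
sn0, cn0, _, _ = ellipj(u, 0)
assert np.allclose(sn0, np.sin(u))
assert np.allclose(cn0, np.cos(u))

# m = 1: sn and cn reduce to hyperbolic tangent and hyperbolic secant
sn1, cn1, _, _ = ellipj(u, 1)
assert np.allclose(sn1, np.tanh(u))
assert np.allclose(cn1, 1/np.cosh(u))
```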

In addition to parameter, modulus, and modular angle, you’ll also see Jacobi functions described in terms of *K* and *K*‘. These are called the quarter periods for good reason. The functions sn and cn have period 4*K* as you move along the real axis, or move horizontally anywhere in the complex plane. They also have period 4*iK*‘. That is, the functions repeat when you move a distance 4*K*‘ vertically [2].

The quarter periods are a function of the modulus. The quarter period *K* along the real axis is

*K*(*m*) = ∫_0^{π/2} *d*θ / √(1 – *m* sin² θ)

and the quarter period *K*‘ along the imaginary axis is given by *K*‘(*m*) = *K*(*m*‘) = *K*(1 – *m*).

The function *K*(*m*) is known as “the complete elliptic integral of the first kind.”

So far we’ve focused on the second input to the Jacobi functions, and three conventions for specifying it.

There are two conventions for specifying the first argument, written either as φ or as *u*. These are related by

*u* = ∫_0^φ *d*θ / √(1 – *m* sin² θ)

The angle φ is called the **amplitude**. (Yes, it’s an *angle*, but it’s called an *amplitude*.)

When we said above that the Jacobi functions had period 4*K*, this was in terms of the variable *u*. Note that when φ = π/2, *u* = *K*.

Mathematica uses the *u* convention for the first argument and the parameter convention for the second argument.

The Mathematica function `JacobiSN[u, m]` computes the function sn with argument *u* and parameter *m*. In the notation of A&S, sn(*u* | *m*).

Similarly, `JacobiCN[u, m]` computes the function cn with argument *u* and parameter *m*. In the notation of A&S, cn(*u* | *m*).

We haven’t talked about the Jacobi function dn up to this point, but it is implemented in Mathematica as `JacobiDN[u, m]`.

The function that solves for the amplitude φ as a function of *u* is `JacobiAmplitude[u, m]`.

The function that computes the quarter period *K* from the parameter *m* is `EllipticK[m]`.

The SciPy library has one Python function that computes four mathematical functions at once. The function `scipy.special.ellipj` takes two arguments, *u* and *m*, just like Mathematica, and returns sn(*u* | *m*), cn(*u* | *m*), dn(*u* | *m*), and the amplitude φ(*u*, *m*).

The function *K*(*m*) is implemented in Python as `scipy.special.ellipk`.

- Jacobi functions defined by a system of differential equations
- Surprising curves with sine and sn
- Diagram of relationships between special functions

[1] The plot was made using `JacobiSN[0.5, z]` and the function `ComplexPlot` described here.

[2] Strictly speaking, 4*iK*‘ is *a* period. It’s the smallest vertical period for cn, but 2*iK*‘ is the smallest vertical period for sn.

But that’s not usually how we multiply matrices. That notion of multiplication hardly involves the matrix structure; it treats the matrix as an ordered container of numbers, but not as a way of representing a linear transformation. Once you have a little experience with linear algebra, the customary way of multiplying matrices seems natural, and the way that may have seemed natural at first glance seems kinda strange.

The componentwise product of matrices is called the **Hadamard product** or sometimes the **Schur product**. Given two *m* by *n* matrices *A* and *B*, the Hadamard product of *A* and *B*, written *A* ∘ *B*, is the *m* by *n* matrix *C* with elements given by

*c*_{ij} = *a*_{ij} *b*_{ij}.

Because the Hadamard product hardly uses the linear structure of a matrix, you wouldn’t expect it to interact nicely with operations that depend critically on the linear structure. And yet we can give a couple theorems that do show a nice interaction, at least when *A* and *B* are positive semi-definite matrices.

The first is the **Schur product theorem**. It says that if *A* and *B* are positive semi-definite *n* by *n* matrices, then

det(*A* ∘ *B*) ≥ det(*A*) det(*B*)

where det stands for determinant.

Also, there is the following theorem of Pólya and Szegö. Assume *A* and *B* are symmetric positive semi-definite *n* by *n* matrices. If the eigenvalues of *A* and *B*, listed in increasing order, are α_{i} and β_{i} respectively, then for every eigenvalue λ of A ∘ *B*, we have

α_{1} β_{1} ≤ λ ≤ α_{n} β_{n}.

If you multiply two (multidimensional) arrays in NumPy, you’ll get the componentwise product. So if you multiply two matrices *as arrays* you’ll get the Hadamard product, but if you multiply them *as matrices* you’ll get the usual matrix product. We’ll illustrate that below. Note that the function `eigvalsh` returns the eigenvalues of a matrix. The name may look a little strange, but the “h” on the end stands for “Hermitian.” We’re telling NumPy that the matrix is Hermitian so it can run software specialized for that case [1].

```python
from numpy import array, array_equal, all
from numpy.linalg import det, eigvalsh

A = array([[3,  1], [ 1, 3]])
B = array([[5, -1], [-1, 5]])
H = array([[15, -1], [-1, 15]])
AB = array([[14, 2], [2, 14]])

# Hadamard product
assert(array_equal(A*B, H))

# Ordinary matrix product
assert(array_equal(A@B, AB))

# Schur product theorem
assert(det(H) >= det(A)*det(B))

# Eigenvalues
eigA = eigvalsh(A)
eigB = eigvalsh(B)
eigH = eigvalsh(A*B)
lower = eigA[0]*eigB[0]
upper = eigA[1]*eigB[1]
assert(all(eigH >= lower))
assert(all(eigH <= upper))
```

The code above shows that the eigenvalues of *A* are [2, 4], the eigenvalues of *B* are [4, 6], and the eigenvalues of A ∘ *B* are [14, 16].

- How fast can you multiply matrices?
- Computing SVD and pseudoinverse
- Circular law for random matrices

[1] For complex matrices, Hermitian means conjugate symmetric, which in the real case reduces to simply symmetric. The theorem of Pólya and Szegö is actually valid for Hermitian matrices, but I simplified the statement for the case of real-valued matrices.

It turns out the person interrupting you shouldn’t be so sure. There are no restrictions on the digits a prime number can begin with. (Ending digits are another matter. No prime ends in 0, for example.) Said another way, given any sequence of digits, it’s possible to add more digits to the end and make a prime. [1]

For example, take today’s date in ISO format: 20181008. Obviously not a prime. Can we find digits to add to make it into a prime? Yes, we can add 03 on to the end because 2018100803 is prime.

What about my work phone number: 83242286846? Yes, just add a 9 on the end because 832422868469 is prime.

This works in any base you’d like. For example, the hexadecimal number CAFEBABE is not prime, but CAFEBABE1 is. Or if you interpret SQUEAMISH as a base 36 number, you can form a base 36 prime by sticking a T on the end. [2]

In each of these examples, we haven’t had to add much on the end to form a prime. You can show that this is to be expected from the distribution of prime numbers.
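A small search along these lines is easy to write with SymPy's `isprime`; the function below is my own sketch, not from the paper cited:

```python
from sympy import isprime

def extend_to_prime(n):
    # append decimal digits to n until the result is prime
    k = 1
    while True:
        shifted = n * 10**k
        for s in range(10**k):
            if isprime(shifted + s):
                return shifted + s
        k += 1

p = extend_to_prime(20181008)
assert isprime(p)
assert str(p).startswith("20181008")
```

The search returns quickly because, by the prime number theorem, primes are dense enough that short suffixes almost always suffice.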

[1] Source: R. S. Bird. Integers with Given Initial Digits. The American Mathematical Monthly, Vol. 79, No. 4 (Apr., 1972), pp. 367-370

[2] CAFEBABE is a magic number used at the beginning of Java bytecode files. The word “squeamish” here is an homage to “The Magic Words are Squeamish Ossifrage,” an example cleartext used by the inventors of RSA encryption.

Suppose you have two baseball teams, *A* and *B*, playing in the World Series. If you like, say *A* stands for Houston Astros and *B* for Milwaukee Brewers. Suppose that in each game the probability that *A* wins is *p*, and the probability of *A* losing is *q* = 1 – *p*. What is the probability that *A* will win the series?

The World Series is a **best-of-seven series**, so the first team to win 4 games wins the series. Once one team wins four games there’s no point in playing the rest of the games because the series winner has been determined.

At least four games will be played, so if you win the series, you win on the 4th, 5th, 6th, or 7th game.

The probability of *A* winning the series after the **fourth game** is simply *p*^{4}.

The probability of *A* winning after the **fifth game** is 4 *p*^{4} *q* because *A* must have lost one game, and it could be any one of the first four games.

The probability of *A* winning after the **sixth game** is 10 *p*^{4} *q*^{2} because *A* must have lost two of the first five games, and there are 10 ways to choose two items from a set of five.

Finally, the probability of *A* winning after the **seventh game** is 20 *p*^{4} *q*^{3} because *A* must have lost three of the first six games, and there are 20 ways to choose three items from a set of six.

**The probability of winning the World Series** is the sum of the probabilities of winning after 4, 5, 6, and 7 games which is

*p*^{4}(1 + 4*q* + 10*q*^{2} + 20*q*^{3})

Here’s a plot:

Obviously, the more likely you are to win each game, the more likely you are to win the series. But it’s not a straight line because the better team is more likely to win the series than to win any given game.

Now if you only wanted to compute the probability of winning the series, not the probability of winning after different numbers of games, **you could pretend that all the games are played**, even though some may be unnecessary to determine the winner. Then we compute the probability that a Binomial(7, *p*) random variable takes a value of at least 4, which is

35*p*^{4}*q*^{3} + 21*p*^{5}*q*^{2} + 7*p*^{6}*q* + *p*^{7}

While this looks very different from the expression we worked out above, the two are actually the same. If you substitute (1 – *p*) for *q* and expand, you’ll see they agree.
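The agreement between the two expressions is easy to check numerically; the function names below are just for illustration:

```python
from math import comb

def p_series_direct(p):
    # P(A wins the series), summing over clinching in games 4 through 7
    q = 1 - p
    return p**4 * (1 + 4*q + 10*q**2 + 20*q**3)

def p_series_binomial(p):
    # pretend all 7 games are played; A wins the series iff A wins at least 4
    q = 1 - p
    return sum(comb(7, k) * p**k * q**(7 - k) for k in range(4, 8))

for p in [0.3, 0.5, 0.62]:
    assert abs(p_series_direct(p) - p_series_binomial(p)) < 1e-12
```

As a sanity check, at *p* = 0.5 both give exactly 1/2, as symmetry demands.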

SciPy provides a large collection of physical constants in `scipy.constants`. The most frequently used constants are available directly, and hundreds more are in a dictionary `physical_constants`.
The fine structure constant α is defined as a function of other physical constants:

α = *e*² / 4πε_{0}ℏ*c*

The following code shows that the fine structure constant and the other constants that go into it are available in `scipy.constants`.

```python
import scipy.constants as sc

a = sc.elementary_charge**2
b = 4 * sc.pi * sc.epsilon_0 * sc.hbar * sc.c
assert( abs(a/b - sc.fine_structure) < 1e-12 )
```

In the 1930’s Arthur Eddington believed that the number of protons in the observable universe was *exactly* the Eddington number

*N* = 136 × 2^{256}

Since at the time the fine structure constant was thought to be 1/136, this made the number of protons a nice even 136 × 2^{256}. Later he revised his number when it looked like the fine structure constant was 1/137. According to the Python code above, the current estimate is more like 1/137.036.

Eddington was a very accomplished scientist, though he had some ideas that seem odd today. His number is a not a bad estimate, though nobody believes it could be exact.

The constants in `scipy.constants` have come up in a couple previous blog posts.

The post on Koide’s coincidence shows how to use the `physical_constants` dictionary, which includes not just the physical constant values but also their units and uncertainty.

The post on Benford’s law shows that the leading digits of the constants in `scipy.constants` follow the logarithmic distribution observed by Frank Benford (and earlier by Simon Newcomb).

If you’re reading this in a web browser, you probably know what browser you’re using. If not, you’re at least aware that you *are* using a browser, even if you forget momentarily which one you have open. And you probably know what operating system is hosting your browser. You understand that you are reading a page on my site, that this page is not your browser, but content hosted by your browser. If you’ve subscribed via RSS or email, you know what email or RSS client you’re using and understand how this post is organized with respect to your other content.

Some people do not have this kind of context. Anything on their computer is simply “The Computer.” They don’t really understand what an operating system, web browser, or email client are. And they don’t need to know, most of the time. They can get their work done, but then occasionally they have inexplicable problems.

I’m not saying this to criticize or make fun of the people who don’t have the kind of technological context I’m talking about. It’s a remarkable achievement that software has gotten so easy to use that people can get along without knowing much technological context. But if this sort of thing is second nature to you, you might have a hard time understanding how a large number of people work.

You probably take it for granted that you can access the same web site from different computers. Some people do not. Their desktop at work is one computer, and their iPhone is a different computer. They don’t really understand what a web site is.

I know what a web browser is because I have been using computers since before there was a web. Old timers know what various technological components are because they’ve seen them develop. And “digital natives” know how to get things done because they’ve grown up with computers, though their gaps in context show occasionally. It seems like the people in the middle would have the hardest time, not having grown up with the technology but not having watched it develop either.

I’m writing this because I’m becoming increasingly aware of how difficult life can be for people who don’t have adequate mental models for technology. I imagine most of my readers are tech savvy, and may have a hard time seeing some of the same things that I’ve had a hard time seeing, that a lot of people don’t understand things we take for granted.

It used to be that anybody who used a computer had to know some basic things. If you were a Unix user a generation ago, you might not know anything about the internals of Unix, but you at least knew that you were a Unix user. There were wizards and mere mortals, but the two groups shared more context than the most tech savvy and least tech savvy share today.

It’s good that people don’t need to know as much context, but occasionally it produces bewildering situations, both for the user and the person trying to help them.

If *p* and *q* are primes, then there are ostensibly at least two groups of order *pq*: the cyclic group *Z*_{pq}, and *Z*_{p} + *Z*_{q}, the direct sum of the cyclic groups of orders *p* and *q*. However, there may just be one group of order *pq* after all because the two groups above could be isomorphic.

If *p* = *q* = 2, then *Z*_{4} and *Z*_{2} + *Z*_{2} are not isomorphic. But the groups *Z*_{15} and *Z*_{3} + *Z*_{5} *are* isomorphic. That is, there is only one group of order 15, even though 15 is composite. This is the smallest such example.

Let *p* and *q* be primes with *p* > *q*. If *q* does not divide *p*-1, then there is only one group of order *pq*. That is, all groups of order *pq* are isomorphic to the cyclic group *Z*_{pq}. So when *p* = 5 and *q* = 3, there is only one group of order 15 because 3 does not evenly divide 5-1 = 4. The same reasoning shows, for example, that there must only be one group with 77 elements because 7 does not divide 10.

Now if *q* does divide *p*-1, then there are two distinct groups of order *pq*. One is the cyclic group with *pq* elements. But the other is non-Abelian, and so it cannot be *Z*_{p} + *Z*_{q}. So once again *Z*_{pq} is isomorphic to *Z*_{p} + *Z*_{q}, but there’s a new possibility, a non-Abelian group.

Note that this does not contradict our earlier statement that *Z*_{4} and *Z*_{2} + *Z*_{2} are different groups, because we assumed *p* > q. If *p* = *q*, then *Z*_{pq} is not isomorphic to *Z*_{p} + *Z*_{q}.
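The cyclic case is easy to check computationally: the element (1, 1) generates *Z*_{p} + *Z*_{q} exactly when its order, lcm(*p*, *q*), equals *pq*. A minimal sketch (the function name is mine):

```python
from math import gcd

def order_of_generator(p, q):
    # order of the element (1, 1) in Z_p + Z_q is lcm(p, q)
    return p * q // gcd(p, q)

# 3 and 5 are coprime, so (1, 1) has order 15 and Z_3 + Z_5 is cyclic, i.e. Z_15
assert order_of_generator(3, 5) == 15

# p = q = 2: no element has order 4, so Z_2 + Z_2 is not isomorphic to Z_4
assert order_of_generator(2, 2) == 2
```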


The **Sturm separation theorem** says that the zeros of two linearly independent solutions of a differential equation of the form *y*″ + *p*(*x*) *y*′ + *q*(*x*) *y* = 0 alternate. That is, between any two consecutive zeros of one solution, there is exactly one zero of the other solution. This is an important theorem because a lot of differential equations of this form come up in applications.

If we let *p*(*x*) = 0 and *q*(*x*) = 1, then sin(*x*) and cos(*x*) are independent solutions and we know that their zeros interlace. The zeros of sin(*x*) are of the form *n*π, and the zeros of cos(*x*) are of the form (*n* + 1/2)π.

What’s less obvious is that if we take two different linear combinations of sine and cosine, as long as they’re not proportional, then their zeros interlace as well. For example, we could take *f*(*x*) = 3 sin(*x*) + 5 cos(*x*) and *g*(*x*) = 7 sin(*x*) – 11 cos(*x*). These are also linearly independent solutions to the same differential equation, and so the Sturm separation theorem says their roots have to interlace.
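The interlacing claim for these two particular combinations can be checked numerically, locating zeros crudely by sign changes on a fine grid (the helper function is mine):

```python
import numpy as np

def approximate_zeros(f, a=0.0, b=12.0, n=100_000):
    # locate zeros by sign changes on a fine grid
    x = np.linspace(a, b, n)
    y = f(x)
    idx = np.where(np.sign(y[:-1]) * np.sign(y[1:]) < 0)[0]
    return x[idx]

f = lambda x: 3*np.sin(x) + 5*np.cos(x)
g = lambda x: 7*np.sin(x) - 11*np.cos(x)

zf = approximate_zeros(f)
zg = approximate_zeros(g)

# exactly one zero of g falls between consecutive zeros of f
for lo, hi in zip(zf[:-1], zf[1:]):
    assert np.sum((zg > lo) & (zg < hi)) == 1
```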

If we take *p*(*x*) = 1/*x* and *q*(*x*) = 1 – (ν/*x*)² then our differential equation becomes **Bessel’s equation**, and the Bessel functions *J*_{ν} and *Y*_{ν} are independent solutions. Here’s a little Python code to show how the zeros alternate when ν = 3.

```python
import matplotlib.pyplot as plt
from numpy import linspace
from scipy.special import jn, yn

x = linspace(4, 30, 100)
plt.plot(x, jn(3, x), "-")
plt.plot(x, yn(3, x), "-.")
plt.legend(["$J_3$", "$Y_3$"])
plt.axhline(y=0, linewidth=1, color="k")
plt.show()
```


A residential power outage is usually just an inconvenience, especially if the power comes back on within a few hours. A power outage to a hospital could be disastrous, and so hospitals have redundant power systems. The problem is in between, if power is reliable enough that you don’t expect it to go out, but the consequences of an outage are serious [2].

If a system fails occasionally, you prepare for that. And if it never fails, that’s great. In between is the problem, a system just reliable enough to lull you into complacency.

For example, **GPS** used to be unreliable. It made useful suggestions, but you wouldn’t blindly trust it. Then it got a little better and became dangerous as people trusted it when they shouldn’t. Now it’s much better. Not perfect, but less dangerous.

For another example, people who live in flood plains have **flood insurance**. Their mortgage company requires it. And people who live on top of mountains don’t need flood insurance. The people at most risk are in the middle. They live in an area that could flood, but since it hasn’t yet flooded they don’t buy flood insurance.

So **safety is not an increasing function of reliability**, not always. It might dip down before going up. There’s a valley between unreliable and highly reliable where people are tempted to take unwise risks.

I expect we’ll see a lot of this with artificial intelligence. Clumsy AI is not dangerous; pretty good AI is dangerous. Moderately reliable systems in general are dangerous, but this especially applies to AI.

As in the examples above, the better AI becomes, the more we rely on it. But there’s something else going on. As AI failures become **less frequent** they also become **weird**.

You’ll see stories of someone putting a tiny sticker on a stop sign and now a computer vision algorithm thinks the **stop sign** is a **frog** or an **ice cream sundae**. In this case, there was a deliberate attack: someone knew how to design a sticker to fool the algorithm. But strange failures can also happen unprompted.

Amazon’s search feature, for example, is usually very good. Sometimes I’ll get every word in a book title wrong and yet it will figure out what I meant. But one time I was searching for the book Universal Principles of Design.

I thought I remembered a “25” in the title. The subtitle turns out to be “125 ways to enhance reliability …” I searched on “25 Universal Design Principles” and the top result was a **massage machine** that will supposedly increase the size of a woman’s breasts. I tried the same search again this morning. The top result is a book on design. The next five results are

- a clip-on rear view mirror
- a case of adult diapers
- a ratchet adapter socket
- a beverage cup warmer, and
- a folding bed.

The book I was after, and whose title I remembered pretty well, was nowhere in the results.

Because AI is literally artificial, it makes mistakes no human would make. If I went to a brick-and-mortar book store and told a clerk “I’m looking for a book. I think the title is something like ’25 Universal Design Principles,’” the clerk would not say “Would you like to increase your breast size? Or maybe buy a box of diapers?”

In this case, the results were harmless, even entertaining. But unexpected results in a mission-critical system would not be so entertaining. Our efforts to make systems fool-proof have been based on experience with human fools, not artificial ones.

[1] This post is an elaboration on what started as a Twitter thread.

[2] I’m told that in Norway electrical power is very reliable, but the country is also very dependent on electricity, including for heating. Alternative sources of fuel such as propane are hard to find.

]]>List the vertices starting anywhere and moving counterclockwise around the polygon: (*x*_{1}, *y*_{1}), (*x*_{2}, *y*_{2}), …, (*x*_{n}, *y*_{n}). Then the area is given by the formula below.

But what does that mean? The notation is meant to be suggestive of a determinant. It’s not literally a determinant because the matrix isn’t square. But you evaluate it in a way analogous to 2 by 2 determinants: add the terms going down and to the right, and subtract the terms going up and to the right. That is,

*x*_{1} *y*_{2} + *x*_{2} *y*_{3} + … + *x*_{n} *y*_{1} – *y*_{1} *x*_{2} – *y*_{2} *x*_{3} – … – *y*_{n} *x*_{1}

This formula is sometimes called the **shoelace formula** because the pattern of multiplications resembles lacing a shoe. It’s also called the **surveyor’s formula** because it’s obviously useful in surveying.

As someone pointed out in the comments, in practice you might want to subtract the minimum *x* value from all the *x* coordinates and the minimum *y* value from all the *y* coordinates before using the formula. Why’s that?

If you add a constant amount to each vertex, you move your polygon but you don’t change the area. So in theory it makes no difference whether you translate the polygon before computing its area. But in floating point, it can make a difference.

The cardinal rule of floating point arithmetic is to avoid subtracting nearly equal numbers. If you subtract two numbers that agree to *k* significant figures, you can lose up to *k* figures of precision. We’ll illustrate this by taking a right triangle with base 2 and height π. The area should remain π as we translate the triangle away from the origin, but we lose precision the further out we translate it.

Here’s a Python implementation of the shoelace formula.

```python
def area(x, y):
    n = len(x)
    s = 0.0
    for i in range(-1, n-1):
        s += x[i]*y[i+1] - y[i]*x[i+1]
    return 0.5*s
```

If you’re not familiar with Python, a couple things require explanation. First, `range(-1, n-1)` is the sequence of integers starting at -1 and going up to, but not including, *n*-1. Second, the -1 index returns the last element of the array.

Now, watch how the precision of the area degrades as we shift the triangle by powers of 10.

```python
import numpy as np

x = np.array([0.0, 0.0, 2.0])
y = np.array([np.pi, 0.0, 0.0])

for n in range(0, 10):
    t = 10**n
    print( area(x+t, y+t) )
```

This produces

```
3.141592653589793
3.1415926535897825
3.1415926535901235
3.1415926535846666
3.141592651605606
3.1415929794311523
3.1416015625
3.140625
3.0
0.0
```

Shifting by small amounts doesn’t make a noticeable difference, but we lose between one and two significant figures each time we increase *t* by a factor of 10. We only have between 15 and 16 significant figures to start with in a standard floating point number, and so eventually we completely run out of significant figures.

When implementing the shoelace formula, we want to do the opposite of this example: instead of shifting coordinates so that they’re similar in size, we want to shift them toward the origin so that they’re *not* similar in size.
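Here's a sketch of that recentering (my addition), repeating the `area` function so the snippet runs standalone. With a shift of *t* = 10^8 the naive computation loses most of its digits, while subtracting off the minimum coordinate first recovers the area to about eight digits, the best possible once the shift itself has rounded the coordinates.

```python
import numpy as np

# Same shoelace implementation as above, repeated so this runs standalone
def area(x, y):
    n = len(x)
    s = 0.0
    for i in range(-1, n-1):
        s += x[i]*y[i+1] - y[i]*x[i+1]
    return 0.5*s

t = 10.0**8
x = np.array([0.0, 0.0, 2.0]) + t
y = np.array([np.pi, 0.0, 0.0]) + t

naive = area(x, y)                           # most digits lost
recentered = area(x - x.min(), y - y.min())  # close to pi again
print(naive, recentered)
```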

**Related post**: The quadratic formula and low-precision arithmetic

*f*_{r} = *C* *r*^{–s}

where *s* is on the order of 1. The value of *s* that best fit the data depended on the set of passwords, but their estimates of *s* varied from 0.46 to 0.91.

This means that the most common passwords are very common and easy to guess.

If passwords come from an alphabet of size *A* and have length *n*, then there are *A*^{n} possibilities. For example, if a password has length 10 and consists of uppercase and lowercase English letters and digits, there are

62^{10} = 839,299,365,868,340,224

possible such passwords. If users chose passwords randomly from this set, brute force password attacks would be impractical. But brute force attacks *are* practical because passwords are *not* chosen uniformly from this large space of possibilities, far from it.

Attackers do not randomly try passwords. They start with the most common passwords and work their way down the list. In other words, attackers use Pareto’s rule.

Rules requiring, say, one upper case letter, don’t help much because most users will respond by using exactly one upper case letter, probably the first letter. If passwords must have one special character, most people will use exactly one special character, most likely at the end of the word. Expanding the alphabet size *A* exponentially increases the *possible* passwords, but does little to increase the actual number of passwords.

What’s interesting about the power law distribution is that there’s not a dichotomy between naive and sophisticated users. If there were, there would be a lot of common passwords, and all the rest uniformly distributed. Instead, there’s a continuum between the most naive and most sophisticated. That means **a lot of people are not exactly naive, but not as secure as they think they are**.

Under the Zipf model [3], the number of times we’d expect to see the most common password is *NC* where *N* is the size of the data set. The constant *C* is what it has to be for the frequencies to sum to 1. That is, *C* depends on the number of data points *N* and the exponent *s* and is given by

*C* = 1 / (1^{-s} + 2^{-s} + 3^{-s} + … + *N*^{-s}).

We can bound the sum in the denominator from above and below with integrals, as in the integral test for series convergence. This gives us a way to see more easily how *C* depends on its parameters.

When *s* = 1 this reduces to

log(*N* + 1) ≤ 1/*C* ≤ 1 + log *N*

and otherwise to

((*N* + 1)^{1-s} – 1)/(1 – *s*) ≤ 1/*C* ≤ 1 + (*N*^{1-s} – 1)/(1 – *s*).

Suppose you have *N* = 1,000,000 passwords. The range of *s* values found by Wang et al varied from roughly 0.5 to 0.9. Let’s first set *s* = 0.5. Then *C* is roughly 0.0005. This means the most common password appears about 500 times.

If we keep *N* the same and set *s* to 0.9, then C is roughly 0.033, and so the most common password would appear about 33,000 times.
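These estimates are easy to check directly by computing the normalizing constant by brute force rather than with the integral bounds. This sketch (my addition) reproduces the figures above.

```python
N = 1_000_000

def zipf_C(s, N):
    # normalizing constant: the frequencies C * r^-s for r = 1..N sum to 1
    return 1 / sum(r**-s for r in range(1, N + 1))

for s in [0.5, 0.9]:
    c = zipf_C(s, N)
    print(s, c, round(N * c))  # expected count of the most common password
```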

If you need to come up with a password, randomly generated passwords are best, but hard to remember. So people either use weak but memorable passwords, or use strong passwords and don’t try to remember them. The latter varies in sophistication from password management software down to Post-it notes stuck on a monitor.

One compromise is to concatenate a few randomly chosen words. Something like “thebestoftimes” would be weak because they are consecutive words from a famous novel. Something like “orangemarbleplungersoap” would be better.
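To quantify that, here's a rough entropy estimate (my addition; the wordlist size 7776 is the size of the Diceware list, used purely as an illustration).

```python
from math import log2

# Entropy of concatenating words chosen uniformly at random from a
# wordlist. 7776 is the Diceware list size, used here as an example.
words, vocab = 4, 7776
bits = words * log2(vocab)
print(round(bits, 1))  # about 51.7 bits
```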

Another compromise, one that takes more effort than most people are willing to expend, is to use Manuel Blum’s mental hash function.

[1] In case the link rots in the future, the paper is “Zipf’s Law in Passwords” by Ding Wang, Haibo Cheng, Ping Wang, Xinyi Huang, and Gaopeng Jian. IEEE Transactions on Information Forensics and Security, vol. 12, no. 11, November 2017.

[2] Nothing follows a power law distribution exactly and everywhere. But that’s OK: nothing exactly follows any other distribution everywhere: not Gaussian, not exponential, etc. But a lot of things, like user passwords, approximately follow a power law, especially over the middle of their range. Power laws like Zipf’s law tend not to fit as well at the beginning and the end.

[3] Here I’m using a pure Zipf model for simplicity. The paper [1] used a Zipf-like model that I’m not using here. Related to the footnote [2] above, it’s often necessary to use a minor variation on the pure Zipf model to get a better fit.

]]>C. A. R. Hoare wrote an article How Did Software Get So Reliable Without Proof? in 1996 that still sounds contemporary for the most part.

In the 1980s many believed that programs could not get much bigger unless we started using formal proof methods. The argument was that bugs are fairly common, and that each bug has the potential to bring a system down. Therefore the only way to build much larger systems was to rely on formal methods to catch bugs. And yet programs continued to get larger and formal methods never caught on. Hoare asks

Why have twenty years of pessimistic predictions been falsified?

Another twenty years later we can ask the same question. Systems have gotten far larger, and formal methods have not become common. Formal methods are used—more on that shortly—but have not become common.

It’s interesting that Hoare was the one to write this paper. He is best known for **quicksort**, a sorting algorithm that works better in practice than in theory! Quicksort is commonly used in practice, even though it has terrible worst-case efficiency, because its average efficiency has optimal asymptotic order [1], and in practice it works better than other algorithms with the same asymptotic order.
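The gap between average and worst case is easy to see numerically. Here's a sketch (my addition) of a naive quicksort that always pivots on the first element; feeding it already-sorted input triggers the quadratic behavior.

```python
import random

def quicksort(a):
    """Naive quicksort that always pivots on the first element,
    returning (sorted list, number of comparisons made)."""
    comparisons = 0
    def qs(lst):
        nonlocal comparisons
        if len(lst) <= 1:
            return lst
        pivot, rest = lst[0], lst[1:]
        comparisons += len(rest)
        left  = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return qs(left) + [pivot] + qs(right)
    return qs(a), comparisons

random.seed(0)
data = [random.random() for _ in range(500)]
_, average_case = quicksort(data)        # on the order of n log n
_, worst_case = quicksort(sorted(data))  # n(n-1)/2 = 124,750 comparisons
print(average_case, worst_case)
```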

It is logically possible that the smallest bug could bring down a system. And there have been examples, such as the **Mars Climate Orbiter**, where a single bug did in fact lead to complete failure. But this is rare. Most bugs are inconsequential.

Some will object “How can you be so blasé about bugs? A bug crashed a $300 million probe!” But what is the realistic alternative? Would spending an additional billion dollars on formal software verification have prevented the crash? Possibly, though not certainly, and the same money could send three more missions to Mars. (More along these lines here.)

It’s all a matter of **economics**. Formal verification is extremely tedious and expensive. The expense is worth it in some settings and not in others. The software that runs **pacemakers** is more critical than the software that runs a **video game**. For most software development, less formal methods have proved more cost effective at achieving acceptable quality: code reviews, unit tests, integration testing, etc.

I have some experience with formal software verification, including formal methods software used by NASA. When someone says that software has been formally verified, there’s an implicit disclaimer. It’s usually the **algorithms** that have been formally verified, **not the implementation** of those algorithms in software. Also, maybe not all the algorithms have been verified, but say 90%, the remaining 10% being too difficult to verify. In any case, formally verified software can and has failed. Formal verification greatly reduces the probability of encountering a bug, but it does not reduce the probability to zero.

There has been a small **resurgence of interest** in formal methods since Hoare wrote his paper. And again, it’s all about economics. Theorem proving technology has improved over the last 20 years. And software is being used in contexts where the consequences of failure are high. But for most software, the most economical way to achieve acceptable quality is not through theorem proving.

There are also **degrees of formality**. Full theorem proving is extraordinarily tedious. If I remember correctly, one research group said that they could formally verify about one page of a mathematics textbook per man-week. But there’s a continuum between full formality and no formality. For example, you could have formal assurance that your software satisfies certain conditions, even if you can’t formally prove that the software is completely correct. Where you want to be along this continuum of formality is again a matter of **economics**. It depends on the probability and consequences of errors, and the cost of reducing these probabilities.

[1] The worst-case performance of quicksort is O(*n*²) but the average performance is O(*n* log *n*).

Photograph of C. A. R. Hoare by Rama, Wikimedia Commons, Cc-by-sa-2.0-fr, CC BY-SA 2.0 fr

]]>If anyone else claimed a simple proof of RH they’d immediately be dismissed as a crank. In fact, many people have sent me simple proofs of RH just in the last few days in response to my blog post, and I imagine they’re all cranks [1]. But Atiyah is not a crank. He won the Fields Medal in 1966 and the Abel prize in 2004. In other words, he was in the top echelon of mathematicians 50 years ago and has kept going from there. There has been speculation that although Atiyah is not a crank, he has gotten less careful with age. (He’s 89 years old.)

QuinteScience, source of the image above, quoted Atiyah as saying

Solve the Riemann hypothesis and you’ll become famous. But if you’re already famous, you run the risk of becoming infamous.

If Atiyah had a simple *self-contained* proof of RH that would be too much to believe. Famous conjectures that have been open for 150 years don’t have simple self-contained proofs. It’s logically possible, but practically speaking it’s safe to assume that the space of possible simple proofs has been very thoroughly explored by now.

But Atiyah’s claimed proof is not self-contained. It’s really a corollary, though I haven’t seen anyone else calling it that. He is claiming that a proof of RH follows easily from his work on the **Todd function**, which hasn’t been published. If his proof is correct, the hard work is elsewhere.

Andrew Wiles’ proof of Fermat’s last theorem was also a corollary. He proved a special case of the Taniyama–Shimura conjecture, and at the end of a series of lectures noted, almost as an afterthought, that his work implied a proof of Fermat’s last theorem. Experts realized this was where he was going before he said it. Atiyah has chosen the opposite approach, presenting his corollary first.

Atiyah has spoken about connections between mathematics and physics for years. Maybe he was alluding to his work on the **fine structure constant** which he claims yields RH as a corollary. And he is not the only person talking about connections between the Riemann hypothesis specifically and physics. For example, there was a paper in Physical Review Letters last year by Bender, Brody, and Müller stating a possible connection. I don’t know whether this is related to Atiyah’s work.

The fine structure constant is a dimensionless physical constant α, given by

α = *e*² / 4πε_{0}*ħc*

where *e* is the elementary charge, ε_{0} is vacuum permittivity, *ħ* is the reduced Planck constant, and *c* is the speed of light in a vacuum. Its value is roughly 1/137.

The Todd function *T* is a function introduced by Atiyah, named after his teacher J. A. Todd. We don’t know much about this function, except that it is key to Atiyah’s proof. Atiyah says the details are to be found in his manuscript *The Fine Structure Constant* which has been submitted to the Proceedings of the Royal Society.

Atiyah says that his manuscript shows that on the critical line of the Riemann zeta function, the line with real part 1/2, the Todd function has a limit ж and that the fine structure constant α is exactly 1/ж. That is,

lim_{y → ∞} *T*(1/2 + *yi*) = ж = 1/α.

Now I don’t know what he means by proving that a physical constant has an exact mathematical value; the fine structure constant is something that is empirically measured. Perhaps he means that in some mathematical model of physics, the fine structure constant has a precise mathematical value, and that value is the limit of his Todd function.

Or maybe it’s something like Koide’s coincidence where a mathematical constant is within the error tolerance of a physical constant, an interesting but not necessarily important observation.

Michael Atiyah is taking a big risk. I’ve seen lots of criticism of Atiyah online. As far as I know, none of the critics have a Fields Medal or Abel Prize in their closet.

Atiyah’s proof is probably wrong, just because proofs of big theorems are usually wrong. Andrew Wiles’ proof of Fermat’s Last Theorem had a flaw that took a year to patch. We don’t know who Atiyah has shown his work to. If he hasn’t shown it to anyone, then it is almost certainly flawed: nobody does flawless work alone. Maybe his proof has a patchable flaw. Maybe it is flawed beyond repair, but contains interesting ideas worth pursuing further.

The worst case scenario is that Atiyah’s work on the fine structure constant and the Todd function is full of holes. He has made other big claims in the last few years that didn’t work out. Some say he should quit doing mathematics because he has made big mistakes.

I’ve made big mistakes too, and I’m not quitting. I make mistakes doing far less ambitious work than trying to prove the Riemann hypothesis. I doubt I’ll ever produce anything as deep as a plausible but flawed proof of the Riemann hypothesis.

The longer paper has been leaked, presumably without permission from Atiyah or the Royal Society, and it doesn’t seem to hold up. No one is saying the proof can be patched, but there has been some discussion about whether the Todd trick could be useful.

In writing this post I wanted to encourage people to give Atiyah a chance, to wait until more was known before assuming the proof wasn’t good. I respect Atiyah as a mathematician and as a person—I read some of his work in college and I’ve had the privilege of meeting him on a couple occasions—and I hoped that he had a proof even though I was skeptical. I think no less of him for attempting to prove a big theorem. I hope that I’m still swinging for the fences in my ninth decade.

[1] I don’t call someone a crank just because they’re wrong. My idea of a crank is someone without experience in an area, who thinks he has found a simple solution to a famous problem, and who believes there is a conspiracy to suppress his work. Cranks are not just wrong, they can’t conceive that they might be wrong.

]]>**Fermat’s little theorem** says that if *p* is prime and *a* is not a multiple of *p*, then

*a*^{p-1} = 1 (mod *p*).

**Euler’s generalization** of Fermat’s little theorem says that if *a* is relatively prime to *m*, then

*a*^{φ(m)} = 1 (mod *m*)

where φ(*m*) is Euler’s so-called **totient** function. This function counts the number of positive integers less than *m* and relatively prime to *m*. For a prime number *p*, φ(*p*) = *p*-1, and so Euler’s theorem generalizes Fermat’s theorem.

Euler’s totient function is **multiplicative**, that is, if *a* and *b* are relatively prime, then φ(*ab*) = φ(*a*) φ(*b*). We will need this fact below.
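Both facts are easy to check numerically. This sketch (my addition) uses sympy's `totient`.

```python
from math import gcd
from sympy import totient

# Multiplicative property: phi(ab) = phi(a) phi(b) when gcd(a, b) = 1
a, b = 15, 16
assert totient(a*b) == totient(a) * totient(b)

# Euler's theorem: x^phi(m) = 1 (mod m) whenever gcd(x, m) = 1
m = 40
phi = int(totient(m))
assert all(pow(x, phi, m) == 1 for x in range(1, m) if gcd(x, m) == 1)
print("checks pass")
```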

This post looks at three **applications** of Fermat’s little theorem and Euler’s generalization:

- Primality testing
- Party tricks
- RSA public key encryption

The **contrapositive** of Fermat’s little theorem is useful in primality testing: if the congruence

*a*^{p-1} = 1 (mod *p*)

does *not* hold, then either *p* is not prime or *a* is a multiple of *p*. In practice, *a* is much smaller than *p*, and so one can conclude that *p* is not prime.

Technically this is a test for non-primality: it can only prove that a number is not prime. For example, if 2^{p-1} is not congruent to 1 (mod *p*) then we know *p* is not a prime. But if 2^{p-1} *is* congruent to 1 (mod *p*) then all we know is that we haven’t failed the test; it’s still conceivable that *p* is prime. So we try another value of *a*, say 3, and see whether 3^{p-1} is congruent to 1 (mod *p*).

If we haven’t disproved that *p* is prime after several attempts, we have reason to believe *p* is probably prime [1]. There are composite numbers called **pseudoprimes** that pass the test above for some bases *a*, and **Carmichael numbers** that pass it for every *a* relatively prime to them. But these numbers are much less common than primes.

By the way, if *p* is a huge number, say with hundreds or thousands of digits, doesn’t it seem odd that we would want to compute numbers to the power *p*? Actually computing *a*^{p} would be impossible. But because we’re computing mod *p*, this is actually easy. We can apply the fast exponentiation algorithm and take remainders by *p* at every step, so we’re never working with numbers more than twice as long as *p*.
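Here's a sketch of the resulting test (my addition). Python's built-in `pow` takes a modulus as a third argument and does exactly this fast modular exponentiation.

```python
def fermat_test(p, witnesses=(2, 3, 5, 7)):
    # pow with three arguments uses fast modular exponentiation,
    # so this is cheap even when p has thousands of digits
    return all(pow(a, p - 1, p) == 1 for a in witnesses if a % p != 0)

print(fermat_test(97))          # 97 is prime, passes
print(fermat_test(91))          # 91 = 7 * 13, fails
print(fermat_test(2**127 - 1))  # a 39-digit Mersenne prime, passes
```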

A few days ago I wrote about the fifth root party trick. If someone raises a two-digit number to the fifth power, you can quickly tell what the number was. Part of what makes the trick work is that in base 10, any number *n* and its fifth power end in the same digit. You can prove this by trying all 10 possible last digits, but if you want to generalize the trick to **other bases**, it helps to use Euler’s theorem. For example, you could use 9th powers in base 15.

Euler’s theorem shows why raising *a* to the power φ(*m*) + 1 in base *m* keeps the last digit the same, but only if *a* is relatively prime to *m*. To extend the fifth root trick to other bases you’ll have a little more work to do.
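For instance, here's a check (my addition) that ninth powers preserve the last digit in base 15. In this case it happens to hold for every *a*, not just those relatively prime to 15, because 15 is squarefree.

```python
# In base 15, a^9 and a end in the same "digit". Euler's theorem
# guarantees this when a is relatively prime to 15; because 15 is
# squarefree it in fact holds for every a.
assert all(pow(a, 9, 15) == a % 15 for a in range(1000))
print("ninth powers preserve the last base-15 digit")
```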

The original [2] RSA public key cryptography algorithm was a clever use of Euler’s theorem.

Search for two enormous prime numbers *p* and *q* [3]. Keep *p* and *q* private, but make *n* = *pq* public. Pick a **private key** *d* and solve for a **public key** *e* such that *de* = 1 (mod φ(*n*)).

Since you know *p* and *q*, you can compute φ(*n*) = (*p* – 1)(*q* – 1), and so you can compute the public key *e*. But someone who doesn’t know *p* and *q*, but only their product *n*, will have a hard time solving for *d* from knowing *e*. Or at least that’s the hope! Being able to factor *n* is **sufficient** to break the encryption scheme, but it’s not logically **necessary**. Maybe recovering private keys is much easier than factoring, though that doesn’t seem to be the case.

So where does Euler come in? Someone who has your public key *e* and wants to send you a message *m* computes

*m*^{e} (mod *n*)

and sends you the result [4]. Now, because you know *d*, you can take the encrypted message *m*^{e} and compute

(*m*^{e})^{d} = *m*^{ed} = *m*^{1 + kφ(n)} = *m* (mod *n*)

by Euler’s theorem.
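Here's a toy sketch of the scheme (my addition) with tiny primes; real keys use enormous ones and padding schemes. Note that Python 3.8+ computes modular inverses with three-argument `pow`.

```python
# Toy RSA with tiny primes, for illustration only
p, q = 61, 53
n = p * q                  # 3233, made public
phi = (p - 1) * (q - 1)    # 3120, kept secret
d = 2753                   # private key, relatively prime to phi
e = pow(d, -1, phi)        # public key: d*e = 1 (mod phi)
m = 1234                   # message, must be less than n
c = pow(m, e, n)           # encrypted message m^e (mod n)
assert pow(c, d, n) == m   # decryption recovers the message
print(e, c)
```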

This is the basic idea of RSA encryption, but there are many **practical details** to implement the RSA algorithm well. For example, you don’t want *p* and *q* to be primes that make *pq* easier than usual to factor, so you want to use safe primes.

[1] Saying that a number is “probably prime” makes sense the first time you see it. But then after a while it might bother you. This post examines and resolves the difficulties in saying that a number is “probably prime.”

[2] The original RSA paper used Euler’s totient function φ(*n*) = (*p* – 1)(*q* – 1), but current implementations use **Carmichael’s totient function** λ(*n*) = lcm(*p* – 1, *q* – 1). Yes, same Carmichael as Carmichael numbers mentioned above, Robert Daniel Carmichael (1879–1967).

[3] How long does it take to find big primes? See here. One of the steps in the process is to weed out composite numbers that fail the primality test above based on Fermat’s little theorem.

[4] This assumes the message has been broken down into segments shorter than *n*. In practice, RSA encryption is used to send **keys** for non-public key (symmetric) encryption methods because these methods are more computationally efficient.

Quanta Magazine has a story today saying that two mathematicians have concluded that Shinichi Mochizuki’s proof of the ABC conjecture is flawed beyond repair. The story rightly refers to a “clash of Titans” because Shinichi Mochizuki and his two critics Peter Scholze and Jakob Stix are all three highly respected.

I first wrote about the *abc* conjecture when it came out in 2012. I find the proposed proof fascinating, not because I understand it, but because *nobody* understands it. The proof is 500 pages of dense, sui generis mathematics. Top mathematicians have been trying to digest the proof for six years now. Scholze and Stix believe they’ve found an irreparable hole in the proof, but Mochizuki has not conceded defeat.

Sometimes when a flaw is found in a proof, the flaw can later be fixed, as was the case with Andrew Wiles’ proof of Fermat’s last theorem. Other times the proof cannot be salvaged entirely, but interesting work comes out of it, as was the case with the failed attempts to prove FLT before Wiles. What will happen with Mochizuki’s proof of the *abc* conjecture if it cannot be repaired? I can imagine two outcomes.

- Because the proof is so far out of the mainstream, there may be useful pieces of it that can be salvaged.
- Because the proof is so far out of the mainstream, nobody will build on the work and it will be a dead end.

This morning I heard rumors that Michael Atiyah claims to have proven the Riemann hypothesis. The Heidelberg Laureate Forum twitter account confirmed that Atiyah is scheduled to announce his work at the forum on Monday.

I was a fan of Atiyah’s work when I was a grad student, and I got to talk to him at the 2013 and 2014 Heidelberg Laureate Forums. I hope that he really has a proof, but all the reaction I’ve seen has been highly skeptical. He claims to have a relatively simple proof, and long-standing conjectures rarely have simple proofs.

It would be great for the Riemann hypothesis to be settled, and it would be great for Atiyah to be the one who settles it.

Whereas Mochizuki’s proof is radically *outside* the mainstream, Atiyah’s proof appears to be radically *inside* the mainstream. He says he has a “radically new approach … based on work of von Neumann (1936), Hirzebruch (1954) and Dirac (1928).” It doesn’t seem likely that someone could prove the Riemann hypothesis using such classical mathematics. But maybe he found a new approach by using approaches that are not fashionable.

I hope that if Atiyah’s proof doesn’t hold up, at least something new comes out of it.

**Update **(9/23/2018): I’ve heard a rumor that Atiyah has posted a preprint to arXiv, but I don’t see it there. I did find a paper online that appeared to be his that someone posted to Dropbox. It gives a very short proof of RH by contradiction, based on a construction using the **Todd function**. Apparently all the real work is in his paper *The Fine Structure Constant* which has been submitted to Proceedings of the Royal Society. I have not been able to find a preprint of that article.

Atiyah gave a presentation of his proof in Heidelberg this morning. From what I gather the presentation didn’t remove much mystery because what he did was show that RH follows as a corollary of his work on the Todd function that hasn’t been made public yet.

Michael #Atiyah at the #HLF18: “Solve the #RiemannHypothesis and you’ll become famous. But if you’re already famous, you run the risk of becoming infamous.” (QuinteScience, @QuinteScience, September 24, 2018)

**Update**: More on Atiyah’s proof here.

**The abc conjecture** claims a deep relationship between addition and multiplication of integers. Specifically, for every ε > 0, there are only finitely many coprime triples (*a*, *b*, *c*) with *a* + *b* = *c* and *c* > rad(*abc*)^{1+ε}.

Here coprime means *a*, *b*, and *c* share no common divisor larger than 1. Also, rad(*abc*) is the product of the distinct prime factors of *a*, *b*, and *c*.

This is a very technical theorem (conjecture?) and would have widespread consequences in number theory if true. This is sort of the opposite of Fermat’s Last Theorem: the method of proof had widespread application, but the result itself did not. With the *abc* conjecture, it remains to be seen whether the method of (attempted?) proof has application.

**The Riemann hypothesis** concerns the Riemann zeta function, a function ζ of complex values that encodes information about primes. The so-called trivial zeros of ζ are at negative even integers. The Riemann hypothesis says that the rest of the zeros are all lined up vertically with real part 1/2 and varying imaginary parts.

Many theorems have been proved conditional on the Riemann hypothesis. If it is proven, a lot of other theorems would immediately follow.

Here’s the trick in a nutshell. For any number *n*, *n*^{5} ends in the same last digit as *n*. You could prove that by brute force or by Euler’s theorem. So when someone tells you *n*^{5}, you immediately know the last digit. Now you need to find the first digit, and you can do that by learning, approximately, the powers (10*k*)^{5} for *k* = 1, 2, 3, …, 9. Then you can determine the first digit by the range.

Here’s where the video is a little vague. It says that you don’t need to know the powers of 10*k* very accurately. This is true, but just how accurately *do* you need to know the ranges?

If the two-digit number is a multiple of 10, you’ll recognize the zeros at the end, and the last non-zero digit is the first digit of *n*. For example, if *n*^{5} = 777,600,000 then you know *n* is a multiple of 10, and since the last non-zero digit is 6, *n* = 60.

So you need to know the fifth powers of multiples of 10 well enough to distinguish (10*k* – 1)^{5} from (10*k* + 1)^{5}. The following table shows what these numbers are.

```
|---+---------------+---------------|
| k |  (10k - 1)^5  |  (10k + 1)^5  |
|---+---------------+---------------|
| 1 |        59,049 |       161,051 |
| 2 |     2,476,099 |     4,084,101 |
| 3 |    20,511,149 |    28,629,151 |
| 4 |    90,224,199 |   115,856,201 |
| 5 |   282,475,249 |   345,025,251 |
| 6 |   714,924,299 |   844,596,301 |
| 7 | 1,564,031,349 | 1,804,229,351 |
| 8 | 3,077,056,399 | 3,486,784,401 |
| 9 | 5,584,059,449 | 6,240,321,451 |
|---+---------------+---------------|
```

So if the fifth power is less than 3 million, the first digit of *n* is 1. If it’s between 3 million and 25 million, the first digit is 2. Etc.

You could choose the following boundaries, if you like. The first digit of *n* is *k* when *n*^{5} falls between boundary *k* and boundary *k* + 1; anything below boundary 2 has first digit 1, and anything above boundary 9 has first digit 9.

|---+----------------|
| k | boundary       |
|---+----------------|
| 1 |      1,000,000 |
| 2 |      3,000,000 |
| 3 |     25,000,000 |
| 4 |    100,000,000 |
| 5 |    300,000,000 |
| 6 |    800,000,000 |
| 7 |  1,700,000,000 |
| 8 |  3,200,000,000 |
| 9 |  6,000,000,000 |
|---+----------------|

The Numberphile video says you should have someone say the number aloud, in words. So as soon as you hear “six billion …”, you know the first digit of *n* is 9. If you hear “five billion” or “four billion” you know the first digit is 8. If you hear “three billion” then you know to pay attention to the next number, to decide whether the first digit is 7 or 8. Once you hear the first few syllables of the number, you can stop paying attention until you hear the last syllable or two.

Let’s start with a list of primes, say the first 100 primes. The 100th prime is *p* = 541. If an even number less than *p* is the sum of two primes, it’s the sum of two primes less than *p*. So by looking at the sums of pairs of primes less than *p*, we’ll know whether the Goldbach conjecture is true for numbers less than *p*. And while we’re at it, we could keep track not just of *whether* a number is the sum of two primes, but also *how many ways* it is a sum of two primes.

from sympy import prime
from numpy import zeros

N = 100
p = prime(N)
primes = [prime(i) for i in range(1, N+1)]
sums = zeros(p, int)

for i in range(N):
    # j >= i so we don't double count
    for j in range(i, N):
        s = primes[i] + primes[j]
        if s >= p:
            break
        sums[s] += 1

# Take the even slots starting with 4
evens = sums[4::2]
print( min(evens), max(evens) )

This prints 1 and 32. The former means that every even number from 4 up to *p* was hit at least once, i.e. that every number under consideration was the sum of two primes. The latter means that at least one number less than *p* can be written as a sum of two primes in 32 different ways.

According to the Wikipedia article on the Goldbach conjecture, Nils Pipping manually verified the Goldbach conjecture for even numbers up to 100,000 in 1938, an amazing feat.

There are 9,592 primes less than 100,000, and so we would need to take *N* = 9592 in our program to reproduce Pipping’s result. This took about seven minutes.

**Update**: As suggested in the comments, nearly all of the time is being spent generating the list of primes. When I changed the line

primes = [prime(i) for i in range(1, N+1)]

to

primes = [x for x in primerange(1, p)]

the runtime dropped from 7 minutes to 18 seconds. (The function `primerange` also comes from SymPy.)
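For reference, here’s the full script with the faster prime generation folded in. (N = 100 for a quick run; setting N = 9592 reproduces Pipping’s range.)

```python
from sympy import prime, primerange
from numpy import zeros

N = 100   # use N = 9592 to check even numbers up to ~100,000
p = prime(N)
# generate all primes up to p in one pass instead of calling prime(i) per index
primes = list(primerange(1, p + 1))

sums = zeros(p, int)
for i in range(N):
    for j in range(i, N):   # j >= i so we don't double count
        s = primes[i] + primes[j]
        if s >= p:
            break
        sums[s] += 1

evens = sums[4::2]
print(min(evens), max(evens))  # prints 1 32 for N = 100
```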

]]>It’s known that the number of groups of order *p*^{n} for prime *p* is *p*^{(2/27)n^3 + O(n^{8/3})}. It might be fun to compare this to what Mathematica says.

Here goes. First let’s let *p* = 2. Mathematica’s `FiniteGroupCount` function tops out at *n* = 10. And for the range of values Mathematica does support, 2^{2n^3/27} grossly overestimates the number of groups.

Table[{FiniteGroupCount[2^n], 2^((2./27) n^3)}, {n, 1, 11}]

{{1, 1.05269}, {2, 1.50795}, {5, 4.}, {14, 26.7365}, {51, 612.794},
 {267, 65536.}, {2328, 4.45033*10^7}, {56092, 2.61121*10^11},
 {10494213, 1.80144*10^16}, {49487365422, 1.98847*10^22},
 {FiniteGroupCount[2048], 4.7789*10^29}}

OEIS has entries for the number of groups of various orders, and the entry for powers of 2 confirms the asymptotic formula above. Maybe it’s not a good approximation until *n* is very large. (Update: Here’s a reference for the asymptotic expression for all *p*.)
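To see just how far off the approximation is in this range, here’s a quick comparison in Python; the group counts are copied from the Mathematica output above.

```python
# number of groups of order 2^n for n = 1..10, from the Mathematica output above
counts = [1, 2, 5, 14, 51, 267, 2328, 56092, 10494213, 49487365422]

for n, c in enumerate(counts, start=1):
    approx = 2 ** ((2 / 27) * n ** 3)
    print(f"n = {n:2d}  actual {c:>12d}  approx/actual {approx / c:10.3g}")
```

By *n* = 10 the approximation is off by more than eleven orders of magnitude, which makes the “very large *n*” caveat plausible.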

The OEIS entry for number of groups of order powers of 5 has an interesting result:

For a prime p >= 5, the number of groups of order p^n begins 1, 1, 2, 5, 15,

61 + 2*p + 2*gcd(p - 1, 3) + gcd(p - 1, 4),

3*p^2 + 39*p + 344 + 24*gcd(p - 1, 3) + 11*gcd(p - 1, 4) + 2*gcd(p - 1, 5), …

We can duplicate this with Mathematica.

Table[FiniteGroupCount[5^n], {n, 0, 6}]

{1, 1, 2, 5, 15, 77, 684}

and the last two numbers match the calculations given in OEIS.
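The OEIS expressions are easy to check in Python too. The helper functions below are my own transcription of the formulas quoted above.

```python
from math import gcd

def groups_p5(p):
    """Number of groups of order p**5, for a prime p >= 5 (OEIS formula)."""
    return 61 + 2*p + 2*gcd(p - 1, 3) + gcd(p - 1, 4)

def groups_p6(p):
    """Number of groups of order p**6, for a prime p >= 5 (OEIS formula)."""
    return (3*p**2 + 39*p + 344 + 24*gcd(p - 1, 3)
            + 11*gcd(p - 1, 4) + 2*gcd(p - 1, 5))

print(groups_p5(5), groups_p6(5))  # prints 77 684, matching Mathematica
```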

There’s something interesting going on with Mathematica. It doesn’t seem to know, or agree with, the formula above for groups of order *p*^{4}. For example,

Table[FiniteGroupCount[7^n], {n, 0, 6}]

{1, 1, 2, 5, FiniteGroupCount[2401], 83, 860}

I get similar results when I use larger primes: it can’t handle the fourth power.

Table[FiniteGroupCount[389^n], {n, 0, 6}]

{1, 1, 2, 5, FiniteGroupCount[22898045041], 845, 469548}

The results for *n* = 5 and 6 agree with OEIS.

Is OEIS wrong about the number of groups of order *p*^{4}, or should Mathematica simply return 15 and there’s a gap in the software?

Also, does anybody know why the agreement with the asymptotic formula above is so bad? It’s just as bad or worse for other primes that I tried.

Some people post worthwhile original material, but they retweet things that are offensive or just not interesting. You can fix that by turning off retweets from that person. Then you’ll just see tweets they compose.

But until yesterday, there was no way to turn off “likes.” You’d randomly see things someone “liked” even if you turned off their retweets. Now there’s a way to simply see the content you’ve subscribed to. Not only that, you’ll see it in order! Numerous times I’ve tried to go back and find something but couldn’t, because Twitter saw fit to edit and rearrange my stream since the last time I looked at it.

The way to simply see your Twitter stream in order isn’t obvious. You have to go to

Settings and privacy -> Account

and uncheck the box that says “Show the best Tweets first.”

Who wouldn’t want to see the best tweets first? Sounds good to me. But by unchecking the box you’re effectively saying “Let me decide what’s best by who I choose to follow.”

I’m pleased by this new feature (actually, new ability to turn off a feature). I’ve tried to maintain a decent signal to noise ratio in my Twitter stream and Twitter has continually tried to erode it, until now.

]]>`FiniteGroupData` and related functions in Mathematica would have made some of my earlier posts easier to write, had I used them instead of writing my own code.
Here’s something I find interesting. For each *n*, look at the groups of order at most *n* and count how many are Abelian versus non-Abelian. At first there are more Abelian groups, but the non-Abelian groups soon become more numerous. Also, the number of Abelian groups grows smoothly, while the number of non-Abelian groups has big jumps, particularly at powers of 2.
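The smooth growth of the Abelian count is no accident: by the classification of finite Abelian groups, the number of Abelian groups of order *n* is a product of partition numbers, one factor per prime in the factorization of *n*. Here’s a sketch in Python of what `FiniteAbelianGroupCount` computes:

```python
from sympy import factorint, npartitions

def abelian_count(n):
    """Number of Abelian groups of order n.

    If n = p1**e1 * p2**e2 * ..., the count is the product of the
    integer partition numbers p(e1) * p(e2) * ...
    """
    result = 1
    for e in factorint(n).values():
        result *= npartitions(e)
    return result

print(abelian_count(16))  # prints 5, one group per partition of 4
```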

Here’s the Mathematica code:

fgc = FoldList[Plus, 0, Table[FiniteGroupCount[n], {n, 1, 300}]]
fga = FoldList[Plus, 0, Table[FiniteAbelianGroupCount[n], {n, 1, 300}]]
ListLogPlot[{fgc - fga, fga},
  PlotLegends -> {"Non-Abelian", "Abelian"},
  Joined -> True,
  AxesLabel -> {"order", "count"}]

I see the plot legend on my screen, but when saving the plot to a file the legend wasn’t included. I don’t know why. (Update: see footnote [1].) The jagged blue curve is the number of non-Abelian groups of order up to *n*. The smooth gold curve is the corresponding curve for Abelian groups.

Here’s the same plot carried out further to show the jumps at 512 and 1024.

[1] Someone from Wolfram Research saw this post and sent me a fix:

pl = ListLogPlot[...]
Export["~/Desktop/img.png", pl]

]]>

The **permutation symbol** ε_{ijk} is similar. It has some branching logic built into its definition, which keeps branching out of your calculation, letting you handle things more uniformly. In other words, the symbol encapsulates some complexity, keeping it out of your calculation. This is analogous to how you might reduce the complexity of a computer program. [1]

The permutation symbol, sometimes called the **Levi-Civita symbol**, can have any number of subscripts. If any two of the subscripts are equal, the symbol evaluates to 0. Otherwise, the symbol evaluates to 1 or -1. If you can order the indices with an even number of swaps, the sign of the permutation is 1. If it takes an odd number of swaps, the sign is -1. You could think of putting the indices into a **bubble sort** algorithm and counting whether the algorithm does an even or odd number of swaps.

(There’s an implicit theorem here saying that the definition above makes sense. You could change one order of indices to another by different series of swaps. Two different ways of getting from one arrangement to another may use a different number of swaps, but the number of swaps in both approaches will have the same parity.)

Incidentally, I mentioned even and odd permutations a few days ago in the context of finite simple groups. One of the families of finite simple groups is the alternating groups, the groups of even permutations on a set with at least five elements, i.e. permutations whose permutation symbol is 1.

For example, ε_{213} = -1 because it takes one adjacent swap, exchanging the 2 and the 1, to put the indices in order. ε_{312} = 1 because you can put the indices in order with two adjacent swaps: 3 <-> 1, then 3 <-> 2. The symbol ε_{122} is 0 because the last two indices are equal.
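The bubble sort description above translates directly into code. Here’s a sketch in Python (Mathematica’s `Signature` and SymPy’s `LeviCivita`, discussed below, do this for you):

```python
def epsilon(indices):
    """Permutation symbol: count bubble sort swaps and check their parity."""
    a = list(indices)
    n = len(a)
    if len(set(a)) < n:        # a repeated index makes the symbol 0
        return 0
    swaps = 0
    for i in range(n):         # plain bubble sort, counting swaps
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return 1 if swaps % 2 == 0 else -1

print(epsilon((2, 1, 3)), epsilon((3, 1, 2)), epsilon((1, 2, 2)))  # -1 1 0
```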

You can compute permutation symbols in Mathematica with the function `Signature`. For example,

Signature[{3, 1, 2}]

returns 1. The function works with more indices as well. For example,

Signature[{3, 1, 2, 5, 4}]

returns -1.

SymPy has a function `LeviCivita` for computing the permutation symbol, and `Eijk` as an alias for `LeviCivita`. Both take a variable number of integers as arguments, not a list of integers as Mathematica does. If you do have a list of integers, you can use the `*` operator to unpack the list into separate arguments.

from sympy import Eijk, LeviCivita
from numpy.random import permutation

print( LeviCivita(3, 1, 2) )
print( Eijk(3, 1, 2, 5, 4) )

p = permutation(5)
assert Eijk(*p) == LeviCivita(*p)

When all indices are distinct, the permutation symbol can be computed from a product. For two indices,

ε_{ij} = (j - i) / |j - i|

For three indices,

ε_{ijk} = (j - i)(k - i)(k - j) / ( |j - i| |k - i| |k - j| )

and in general

ε_{i_1 i_2 … i_n} = ∏_{p < q} (i_q - i_p) / |i_q - i_p|

An example use of the permutation symbol is cross products. The *i*th component of the **cross product** *a* = *b* × *c* is

*a*^{i} = ε^{ijk} *b*_{j} *c*_{k}

Here we’re using tensor notation where components are indicated by superscripts rather than subscripts, and there’s an implied summation over repeated indices. So here we’re summing over *j* and *k*, each running from 1 to 3.
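Here’s a numerical illustration using SymPy’s `LeviCivita` (described earlier), writing the implied sums explicitly and checking against NumPy’s `cross`:

```python
import numpy as np
from sympy import LeviCivita

b = np.array([1, 2, 3])
c = np.array([4, 5, 6])

# a^i = eps(i,j,k) b_j c_k, with the implied sums over j and k written out
a = [sum(int(LeviCivita(i, j, k)) * b[j] * c[k]
         for j in range(3) for k in range(3))
     for i in range(3)]

print(a)               # [-3, 6, -3]
print(np.cross(b, c))  # same components
```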

Similarly, the **triple product** of vectors *a*, *b* and *c* is

*a* · (*b* × *c*) = ε_{ijk} *a*^{i} *b*^{j} *c*^{k}

This is also the determinant of the matrix whose rows are the vectors *a*, *b*, and *c*. Determinants of larger matrices work the same way.

This post started out by talking about the more familiar Kronecker delta as an introduction to the permutation symbol. There is a nice relation between the two:

ε_{ijk} ε_{rst} = δ_{ir}(δ_{js} δ_{kt} - δ_{jt} δ_{ks}) - δ_{is}(δ_{jr} δ_{kt} - δ_{jt} δ_{kr}) + δ_{it}(δ_{jr} δ_{ks} - δ_{js} δ_{kr})

If we set *r* = *i* (and sum over the repeated index) we get the special case

ε_{ijk} ε_{ist} = δ_{js} δ_{kt} - δ_{jt} δ_{ks}

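Both relations are easy to verify numerically, say with SymPy. The following loop is a brute-force check of the contracted identity ε_{ijk} ε_{ist} = δ_{js} δ_{kt} - δ_{jt} δ_{ks}:

```python
from itertools import product
from sympy import LeviCivita, KroneckerDelta

# brute-force check over all free index values, summing the repeated index i
for j, k, s, t in product(range(3), repeat=4):
    lhs = sum(LeviCivita(i, j, k) * LeviCivita(i, s, t) for i in range(3))
    rhs = (KroneckerDelta(j, s) * KroneckerDelta(k, t)
           - KroneckerDelta(j, t) * KroneckerDelta(k, s))
    assert lhs == rhs

print("identity holds for all indices")
```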
[1] One way of measuring the complexity of a computer program is the maximum number of logic branches in any function. If you have a moderately complex function, and you replace an if-then statement with a call to a small function that has an if-then statement, you’ve reduced the overall complexity. This is sort of what the delta and permutation functions do.

]]>Suppose *u* is a real-valued function of *n* variables and *v* is a vector-valued function from *n* dimensions to *n* dimensions. The product rule for the derivative of the product *uv* is

D(*uv*) = *v* D*u* + *u* D*v.*

It looks strange that the first term on the right isn’t D*u* *v*.

The function *uv* is a function from *n* dimensions to *n* dimensions, so its derivative must be an *n* by *n* matrix. So the two terms on the right must be *n* by *n* matrices, and they are. But D*u* *v* is a 1 by 1 matrix, so it would not make sense on the right side.

Here’s why the product rule above looks strange: the multiplication by *u* is a **scalar **product, not a matrix product. Sometimes you can think of real numbers as 1 by 1 matrices and everything works out just fine, but not here. The product *uv* doesn’t make sense if you think of the output of *u* as a 1 by 1 matrix. Neither does the product *u* D*v*.

If you think of *v* as an *n* by 1 matrix and D*u* as a 1 by *n* matrix, everything works. If you think of *v* and D*u* as vectors, then *v* D*u* is the **outer product** of the two vectors. You could think of D*u* as the gradient of *u*, but be sure you think of it horizontally, i.e. as a 1 by *n* matrix. And finally, D(*uv*) and D*v* are **Jacobian matrices**.
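A quick numerical sanity check, with hypothetical example functions u(x) = x · x and v(x) = sin(x) applied componentwise, comparing the product rule against finite differences:

```python
import numpy as np

def u(x):  return x @ x                 # scalar-valued
def Du(x): return 2 * x                 # gradient, viewed as a 1 x n row
def v(x):  return np.sin(x)             # vector-valued
def Dv(x): return np.diag(np.cos(x))    # n x n Jacobian

def uv(x): return u(x) * v(x)

x = np.array([0.3, -1.2, 2.0])

# D(uv) = v Du + u Dv, where v Du is an outer product (n x 1 times 1 x n)
analytic = np.outer(v(x), Du(x)) + u(x) * Dv(x)

# central finite differences, one column of the Jacobian per coordinate
h = 1e-6
numeric = np.column_stack([(uv(x + h*e) - uv(x - h*e)) / (2*h) for e in np.eye(3)])

print(np.max(np.abs(analytic - numeric)))  # tiny: the two Jacobians agree
```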

**Update**: As Harald points out in the comments, the usual product rule applies if you write the scalar-vector product *uv* as the matrix product *vu*, where now we *are* thinking of *u* as a 1 by 1 matrix! Now the product rule looks right

D(*vu*) = D*v* *u* + *v* D*u*

but the product *vu* looks wrong because you always write scalars on the left. But here *u* isn’t acting as a scalar; it’s a 1 by 1 matrix!

Such norms occur frequently in application [1]. Yoshio Koide discovered in 1981 that if you take the masses of the **electron**, **muon**, and **tau** particles, the ratio of the 1 norm to the 1/2 norm is very nearly 2/3. Explicitly,

(*m*_{e} + *m*_{μ} + *m*_{τ}) / (√*m*_{e} + √*m*_{μ} + √*m*_{τ})^{2} = 2/3

to **at least four decimal places**. Since the ratio is tantalizingly close to 2/3, some believe there’s something profound going on here and the value is exactly 2/3, but others believe it’s just a coincidence.

The value of 2/3 is interesting for two reasons. Obviously it’s a small integer ratio. But it’s also exactly the midpoint between the smallest and largest possible value. More on that below.

Is the value 2/3 within the measurement error of the constants? We’ll investigate that with a little Python code.

The masses of particles are available in the `physical_constants`

dictionary in `scipy.constants`

. For each constant, the dictionary contains the best estimate of the value, the units, and the uncertainty (standard deviation) in the measurement [2].

from scipy.constants import physical_constants as pc
from scipy.stats import norm

def pnorm(v, p):
    return sum([x**p for x in v])**(1/p)

def f(v):
    return pnorm(v, 1) / pnorm(v, 0.5)

m_e = pc["electron mass"]
m_m = pc["muon mass"]
m_t = pc["tau mass"]

v0 = [m_e[0], m_m[0], m_t[0]]
print(f(v0))

This says that the ratio of the 1 norm and 1/2 norm is 0.666658, slightly less than 2/3. Could the value be exactly 2/3 within the resolution of the measurements? How could we find out?

The function *f* above is minimized when its arguments are all equal, and maximized when its arguments are maximally unequal. To see this, note that *f*(1, 1, 1) = 1/3 and *f*(1, 0, 0) = 1. You can prove that those are indeed the minimum and maximum values. To see if we can make *f* larger, we want to increase the largest value, the mass of tau, and decrease the others. If we move each value one standard deviation in the desired direction, we get

v1 = [m_e[0] - m_e[2], m_m[0] - m_m[2], m_t[0] + m_t[2]]
print(f(v1))

which returns 0.6666674, just slightly bigger than 2/3. Since the value can be bigger than 2/3, and less than 2/3, the intermediate value theorem says there are values of the constants within one standard deviation of their mean for which we get exactly 2/3.

Now that we’ve shown that it’s *possible* to get a value above 2/3, how likely is it? We can do a simulation, assuming each measurement is normally distributed.

N = 1000
above = 0
for _ in range(N):
    r_e = norm(m_e[0], m_e[2]).rvs()
    r_m = norm(m_m[0], m_m[2]).rvs()
    r_t = norm(m_t[0], m_t[2]).rvs()
    t = f([r_e, r_m, r_t])
    if t > 2/3:
        above += 1
print(above)

When I ran this, I got 168 values above 2/3 and the rest below. So based solely on our calculations here, not taking into account any other information that may be important, it’s plausible that Koide’s ratio is exactly 2/3.

***

[1] Strictly speaking, we should take the absolute values of the vector components. Since we’re talking about masses here, I simplified slightly by assuming the components are non-negative.

Also, what I’m calling the *p* “norm” is only a norm if *p* is at least 1. Values of *p* less than 1 do occur in application, even though the functions they define are not norms. I’m pretty sure I’ve blogged about such an application, but I haven’t found the post.

[2] The SciPy library obtained its values for the constants and their uncertainties from the CODATA recommended values, published in 2014.

]]>