Why is that? Define vol(*n*) to be the volume of the unit sphere in dimension *n*. Then

vol(2*n*) = π^{*n*} / *n*!

and so the sum of the volumes of all even dimensional spheres is

Σ_{*n*=0}^{∞} π^{*n*} / *n*! = exp(π).
But what if you wanted to sum the volumes of all odd dimensional unit spheres? Or all dimensions, even and odd?

The answers to all these questions are easy in terms of the Mittag-Leffler function that I blogged about a while back. This function is defined as

*E*_{α, β}(*x*) = Σ_{*k*=0}^{∞} *x*^{*k*} / Γ(α*k* + β)

and reduces to the exponential function when α = β = 1.

The sum of the volumes of all unit spheres of all dimensions is *E*_{1/2, 1}(√π). And from the post mentioned above,

*E*_{1/2, 1}(*z*) = exp(*z*²) erfc(-*z*)

where erfc is the complementary error function. So the sum of all volumes of spheres is exp(π) erfc(-√π).

Now erfc(-√π) ≈ 1.9878 and so this says the sum of the volumes of spheres of all dimensions is about twice the sum of the even dimensional spheres alone. And so the sum of the odd dimensional unit sphere volumes is almost the same as the sum of the even dimensional ones.

By the way, you could easily answer other questions about sums of sphere volumes in terms of the Mittag-Leffler function. For example, if you want to add up the volumes in all dimensions that are a multiple of 3, you get *E*_{3/2, 1}(π^{3/2}).
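These sums are easy to check numerically. Here's a sketch using only the Python standard library; the series converges rapidly, so a couple hundred terms is plenty:

```python
from math import pi, exp, gamma, erfc, sqrt

def vol(n):
    # volume of the unit sphere in dimension n: pi^(n/2) / Gamma(n/2 + 1)
    return pi**(n/2) / gamma(n/2 + 1)

# sum over all dimensions vs. the closed form E_{1/2,1}(sqrt(pi))
total = sum(vol(n) for n in range(200))
print(total, exp(pi) * erfc(-sqrt(pi)))

# sum over even dimensions alone is e^pi
even = sum(vol(n) for n in range(0, 200, 2))
print(even, exp(pi))
```

The two printed pairs agree to machine precision, and the total is indeed about twice the even-dimensional sum.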

There are 256 = 16 × 16 possible 8-bit numbers, and so the S-box can be represented as a 16 by 16 table mapping inputs to outputs. You can find the tables representing the S-box and its inverse in this text file in org-mode format.

This post will look in detail at how the entries of that table are filled. A high-level description of the design is as follows. For an 8-bit number *x*,

1. Invert in GF(2^{8})
2. Multiply by a matrix *L*
3. Add a constant *c*.

Next we dive into what each of these steps mean. And at the end we’ll work an example in detail.

My source is The Block Cipher Companion by Knudsen and Robshaw.

Step 1 takes the inverse of a number as a member of GF(2^{8}), the finite field with 2^{8} elements.

The number of elements in a finite field determines the field, up to isomorphism. That is, in a sense there is only one field with 2^{8} = 256 elements. But there are 30 different representations of this field (see this post for where that number came from). The representations are all isomorphic, but when we’re doing actual calculations, rather than abstract theory, we need to specify which representation we’re using.

Finite fields can be specified as polynomials modulo an irreducible polynomial. To carry out our calculations we need to specify a particular irreducible 8th degree polynomial with binary coefficients. The one that AES uses is

*p*(*x*) = *x*^{8} + *x*^{4} + *x*^{3} + *x* + 1.

So by taking the inverse in GF(2^{8}) we mean to take an 8-bit number *y*, interpret it as a polynomial with binary coefficients, and find another 8-bit number *x*^{-1} such that when we multiply them as polynomials, and take the remainder after dividing by *p*(*x*) we get the polynomial 1.

There’s one wrinkle in this procedure: only 255 of the 256 elements of GF(2^{8}) have an inverse. There is no inverse for 0, but for our purposes we’ll take the inverse of 0 to be 0.

The matrix *L* we need here is the 8 by 8 binary matrix whose entries are

10001111
11000111
11100011
11110001
11111000
01111100
00111110
00011111

When we say to multiply *x*^{-1} by *L* we mean to think of *x*^{-1} as a vector over the field of two elements, and carry out matrix multiplication in that context.

The constant that we add is 0x63. The reason an additive constant was chosen was so that a zero input would not map to a zero output. Note that “addition” here is vector addition, and is carried out over the field of two elements, just as the multiplication above. This amounts to XOR of the bits.

To make everything above more concrete, we’ll calculate one entry of the table by hand.

Let’s start with input *y* = 0x11 = 0b10001. We represent *y* as the polynomial *f*(*x*) = *x*^{4} + 1 and look for a polynomial *g*(*x*) such that the remainder when *f*(*x*) *g*(*x*) is divided by the polynomial *p*(*x*) defined above is 1.

The process of finding *g* is complicated—maybe I’ll blog about it in the future—but I claim

*g*(*x*) = *x*^{7} + *x*^{5} + *x*^{4} + *x*^{2}

which means the inverse of *y* in GF(2^{8}), represented in binary, is 0b10110100 = 0xB4. You can find a table of inverses here.

Next we multiply the matrix *L* by the vector made from the bits of *y*^{-1}. However, there is a gotcha here. When Knudsen and Robshaw say to multiply the bits of *y*^{-1} by *L*, they assume the bits are *arranged from least significant to most significant*. Since the bits of *y*^{-1} are 10110100, we multiply *L* by the vector

[0, 0, 1, 0, 1, 1, 0, 1].

This multiplication gives us the vector

[1, 0, 0, 0, 0, 1, 1, 1].

Next we add the vector formed from the bits of 0x63, again from least significant to most significant. That means we lay out 0x63 = 0b01100011 as

[1, 1, 0, 0, 0, 1, 1, 0].

This gives us the result

[0, 1, 0, 0, 0, 0, 0, 1].

Reading the bits from least significant to most, this gives 0x82.

In sum, we’ve verified that the AES S-box takes 0x11 to 0x82, as stated in the table.
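The whole construction can be checked with a short script. This is a sketch rather than a serious implementation: it finds inverses by brute force and follows the bit conventions described above (matrix rows written left to right, input bits arranged least significant first).

```python
# Constants from the text: the AES modulus p(x), the matrix L, and c = 0x63
P = 0x11B  # x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    # multiply two elements of GF(2^8), reducing by p(x) as we go
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= P
    return r

def gf_inv(y):
    # brute-force inverse; by convention the inverse of 0 is 0
    if y == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(x, y) == 1)

L = [0b10001111, 0b11000111, 0b11100011, 0b11110001,
     0b11111000, 0b01111100, 0b00111110, 0b00011111]

def sbox(y):
    v = gf_inv(y)
    out = 0
    for i in range(8):
        # output bit i is the dot product (mod 2) of row i of L with
        # the bits of v arranged from least to most significant
        dot = 0
        for j in range(8):
            row_entry = (L[i] >> (7 - j)) & 1  # row written left to right
            dot ^= row_entry & ((v >> j) & 1)
        out |= dot << i
    return out ^ 0x63  # add the constant c

print(hex(sbox(0x11)))  # 0x82, matching the worked example
```

As a sanity check, `sbox(0)` comes out to 0x63, the constant itself, since the inverse of 0 is taken to be 0.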

[1] The Rijndael block cipher operates on blocks whose size is a multiple of 32 bits. The AES standard adopted Rijndael with a block size of 128 bits and key sizes of 128, 192, and 256 bits.

[2] “Rijndael” is a play on the designers of the cipher, Vincent Rijmen and Joan Daemen.

I learned about `shuf` from browsing The Art of Command Line. This could be useful for random sampling. Given just a file name, `shuf` randomly permutes the lines of the file.

With the option `-n` you can specify how many lines to return. So it’s doing sampling without replacement. For example,

    shuf -n 10 foo.txt

would select 10 lines from `foo.txt`.

Actually, it would select *at most* 10 lines. You can’t select 10 lines without replacement from a file with fewer than 10 lines. If you ask for an impossible number of lines, the `-n` option is ignored.

You can also sample *with* replacement using the `-r` option. In that case you *can* select more lines than are in the file since lines may be reused. For example, you could run

    shuf -r -n 10 foo.txt

to select 10 lines drawn with replacement from `foo.txt`, regardless of how many lines `foo.txt` has. For example, when I ran the command above on a file containing

alpha
beta
gamma

I got the output

beta
gamma
gamma
beta
alpha
alpha
gamma
gamma
beta

I don’t know how `shuf` seeds its random generator. Maybe from the system time. But if you run it twice you will get different results. Probably.
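Whatever the default seeding is, GNU `shuf` lets you supply the randomness explicitly via `--random-source`, which makes runs reproducible. (This is a GNU coreutils option; other implementations may differ. The files below are scratch files for the demo.)

```shell
# create a small input file and a fixed "randomness" file
printf 'alpha\nbeta\ngamma\n' > foo.txt
head -c 1024 /dev/zero > seed.bin

# same randomness source, same shuffle both times
shuf --random-source=seed.bin foo.txt
shuf --random-source=seed.bin foo.txt
```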

The NSA has also made some suggestions for what to do in the meantime [1]. Last year the agency replaced its Suite B cryptography recommendations with the CNSA: Commercial National Security Algorithm Suite.

In a nutshell: use well-established methods for now but with longer keys.

In a little larger nutshell, the recommendations are:

- SHA-384 for secure hashing
- AES-256 for symmetric encryption
- RSA with 3072 bit keys for digital signatures and for key exchange
- Diffie Hellman (DH) with 3072 bit keys for key exchange
- Elliptic curve P-384, both for key exchange (ECDH) and for digital signatures (ECDSA)

Each of these represents a 50% or 100% increase in key length:

- from 128 to 256 for AES
- from 256 to 384 for hashing and ECC
- from 2048 to 3072 for RSA and DH.

If these are just stopgap measures, why not jump straight to quantum-resistant methods? There are quantum-resistant encryption methods available, but most of them haven’t been studied that long. As Koblitz and Menezes put it,

… most quantum-resistant systems that have been proposed are complicated, have criteria for parameter selection that are not completely clear, and in some cases (such as NTRU) have a history of successful attacks on earlier versions.

Some methods do have a long history but have other drawbacks. Robert McEliece’s encryption method, for example, dates back to 1978 and has held up well, but it requires a megabyte key to achieve 128-bit security. There is a variation on McEliece’s method that has radically smaller keys, but it’s only been around for six years. In short, the dust hasn’t settled regarding post-quantum encryption methods.

[1] People are naturally suspicious of algorithm recommendations coming from the NSA. Wouldn’t the agency like for everyone to use encryption methods that it could break? Of course. But the agency also wants US companies and government agencies to use encryption methods that foreign agencies cannot break.

There’s little downside to using established methods with longer keys. However, key length may not be the weakest link. If you’re vulnerable to timing attacks, for example, doubling your key length may create a false sense of security.

Radiolab did an episode on the case of a cosmic bit flip changing the vote tally in a Belgian election in 2003. The error was caught because one candidate got more votes than was logically possible. A recount showed that the person in question got 4096 more votes in the first count than the second count. The difference of exactly 2^{12} votes was a clue that there had been a bit flip. All the other counts remained unchanged when they reran the tally.

It’s interesting that the cosmic ray-induced error was discovered presumably because the software quality was high. All software is subject to cosmic bit flipping, but most of it is so buggy that you couldn’t rule out other sources of error.

Cosmic bit flipping is becoming more common because processors have become smaller and more energy efficient: the less energy it takes for a program to set a bit intentionally, the less energy it takes for radiation to set a bit accidentally.

**Related post**: Six sigma events

[1] Spacecraft are especially susceptible to bit flipping from cosmic rays because they are out from under the radiation shield we enjoy on Earth’s surface.

**In cryptography**, a strong prime is, roughly speaking, a prime whose products are likely to be hard to factor. More specifically, though still not too specific, *p* is a strong prime if

1. *p* is large
2. *p* – 1 has a large prime factor *q*
3. *q* – 1 has a large prime factor *r*
4. *p* + 1 has a large prime factor *s*

The meaning of “large” is not precise, and varies over time. In (1), large means large enough that it is suitable for use in cryptography, such as in RSA encryption. This standard increases over time due to increases in computational power and improvements in factoring technology. The meaning of “large” in (2), (3), and (4) is not precise, but makes sense in relative terms. For example in (2), the smaller the ratio (*p* – 1)/*q* the better.

The Wikipedia article on strong primes makes the following claim without any details:

A computationally large safe prime is likely to be a cryptographically strong prime.

I don’t know whether this has been proven, or even if it’s true, but I’d like to explore it empirically. (**Update**: see the section on safe primes below. I misread “safe” above as “strong.” Just as well: it led to an interesting investigation.)

We’ll need some way to quantify whether a prime is strong in the cryptographic sense. This has probably been done before, but for my purposes I’ll use the sum of the logarithms of *q*, *r*, and *s*. We should look at these relative to the size of *p*, but all the *p*‘s I generate will be roughly the same size.

I’ll generate 100-bit primes just so my script will run quickly. These primes are too small for use in practice, but hopefully the results here will be representative of larger primes.

    from sympy import nextprime, prevprime, factorint, randprime
    import numpy as np

    # largest prime factor
    def lpf(n):
        return max(factorint(n).keys())

    def log2(n):
        return np.log2(float(n))

    num_samples = 100
    data = np.zeros((num_samples, 5))
    bitsize = 100

    for i in range(num_samples):
        p = randprime(2**bitsize, 2**(bitsize+1))
        # strong in the number theory sense: greater than the mean of its neighbors
        data[i, 0] = 2*p > nextprime(p) + prevprime(p)
        q = lpf(p-1)
        r = lpf(q-1)
        s = lpf(p+1)
        data[i, 1] = log2(q)
        data[i, 2] = log2(r)
        data[i, 3] = log2(s)
        data[i, 4] = log2(q*r*s)

The columns of our matrix correspond to whether the prime is strong in the number theory sense, the number of bits in *q*, *r*, and *s*, and the total bits in the three numbers. (Technically the log base 2 rather than the number of bits.)

There were 75 strong primes and 25 non-strong primes. Here were the averages:

|-----+--------+------------|
|     | strong | not strong |
|-----+--------+------------|
| q   |   63.6 |       58.8 |
| r   |   41.2 |       37.0 |
| s   |   66.3 |       64.3 |
| sum |  171.0 |      160.1 |
|-----+--------+------------|

The numbers are consistently higher for strong primes. However, the differences are small relative to the standard deviations of the values. Here are the standard deviations:

|-----+--------+------------|
|     | strong | not strong |
|-----+--------+------------|
| q   |   20.7 |       15.6 |
| r   |   19.8 |       12.3 |
| s   |   18.7 |       19.9 |
| sum |   30.8 |       41.9 |
|-----+--------+------------|

I realized after publishing this post that the Wikipedia quote didn’t say what I thought it did. It said that *safe* primes are likely to be cryptographically strong primes. I misread that as *strong* primes. But the investigation above remains valid. It shows weak evidence that strong primes in the number theoretical sense are also strong primes in the cryptographic sense.

Note that safe does not imply strong; it only implies the second criterion in the definition of strong. Also, strong does not imply safe.

To test empirically whether safe primes are likely to be cryptographically strong, I modified my code to generate safe primes and compute the strength as before, the sum of the logs base 2 of *q*, *r*, and *s*. We should expect the strength to be larger since the largest factor of *p* will always be as large as possible, (*p* – 1)/2. But there’s no obvious reason why *r* or *s* should be large.
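The modified script isn't shown above, but it presumably looked something like the following sketch. Safe primes are generated by searching for primes *q* with 2*q* + 1 also prime; the bit size here is smaller than the 100 bits used above so that factoring stays fast.

```python
from sympy import factorint, isprime, randprime
import numpy as np

# largest prime factor
def lpf(n):
    return max(factorint(n).keys())

def random_safe_prime(bitsize):
    # p is a safe prime when p and q = (p-1)/2 are both prime
    while True:
        q = randprime(2**(bitsize - 1), 2**bitsize)
        p = 2*q + 1
        if isprime(p):
            return p

p = random_safe_prime(64)   # 64-bit here, for speed; the post used 100-bit
q = lpf(p - 1)              # always (p-1)/2 for a safe prime
r = lpf(q - 1)
s = lpf(p + 1)
strength = np.log2(float(q)) + np.log2(float(r)) + np.log2(float(s))
print(strength)
```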

For 100-bit safe primes, I got an average strength of 225.4 with standard deviation 22.8, much larger than in my first experiment, and with less variance.

Unifiers are people whose driving passion is to find general principles which will explain everything. They are happy if they can leave the universe looking a little simpler than they found it.

Diversifiers are people whose passion is to explore details. They are in love with the heterogeneity of nature … They are happy if they leave the universe a little more complicated than they found it.

Presumably these categories correspond to what Dyson elsewhere calls birds and frogs, or what others call hedgehogs and foxes. I imagine everyone takes pleasure in both unification and diversification, though in different proportions. Some are closer to one end of the spectrum than the other.

The scientific heroes presented to children are nearly always unifiers like Newton or Einstein [1]. You don’t see as many books celebrating, for example, a biologist who discovered that what was thought to be one species is really 37 different species. This creates an unrealistic picture of science since not many people discover grand unifying principles, though more find unifying principles on a small scale. I imagine many are discouraged from a career in science because they believe they have to be a unifier / bird / hedgehog, when in fact there are more openings for a diversifier / frog / fox.

Dyson may be taking a subtle swipe at unifiers by saying they want to leave the world *looking* a little simpler than they found it. There may be an unspoken accusation that unifiers create the illusion of unity by papering over diversity. True and significant unifying theories like general relativity are hard to come by. It’s much easier to come up with unifying theories that are incomplete or trivial.

[1] Or at least scientists best known for their unifying work. Newton, for example, wasn’t entirely a unifier, but he’s best known for discovering unifying principles of gravity and motion.

However, sometimes SF authors do indeed try to predict the future. This seems to have been at least somewhat the case with John Brunner and his 1975 novel The Shockwave Rider because he cites futurist Alvin Toffler in his acknowledgement.

…

*The Shockwave Rider* derives in large part from Alvin Toffler’s stimulating study *Future Shock*, and in consequence I’m much obliged to him.

In light of Brunner’s hat tip to Toffler, I think it’s fair to talk about what he got right, or possibly what Toffler got right. Here’s a paragraph from the dust jacket that seemed prescient.

Webbed in a continental data-net that year by year draws tighter as more and still more information is fed to it, most people are apathetic, frightened, resigned to what ultimately will be a total abolishment of individual privacy. A whole new reason has been invented for paranoia: it is beyond doubt — whoever you are! — that someone, somewhere, knows something about you that you wanted to keep a secret … and you stand *no* chance of learning what it is.

I saw this quote at the beginning of a math book when I was a student and it stuck with me. I would think of it when grading exams. Students often assume it is enough to be possible to understand, possible for an infinitely patient and resourceful reader to reverse engineer the thought process behind a stream of consciousness.

The quote is an aphorism, and so not intended to be taken literally, but I’d like to take the last part literally for a moment. I think the quote would be better advice if it said “unlikely to misunderstand.” This ruins the parallelism and the aesthetics of the quote, but it gets to an important point: trying to be impossible to misunderstand leads to bad writing. It’s appropriate when writing for computers, but not when writing for people.

Trying to please too wide and too critical an audience leads to defensive, colorless writing.

You’ll never use an allusion for fear that someone won’t catch it.

You’ll never use hyperbole for fear that some hyper-literalist will object.

You’ll never leave a qualification implicit for fear that someone will pounce on it.

Social media discourages humor, at least subtle humor. If you say something subtle, you may bring a smile to 10% of your audience, and annoy 0.1%. The former are much less likely to send feedback. And if you have a large enough audience, the feedback of the annoyed 0.1% becomes voluminous.

Much has been said about social media driving us to become partisan and vicious, and certainly that happens. But not enough has been said about an opposite effect that also happens, driving us to become timid and humorless.

Differential privacy obscures *query* values by injecting enough noise to keep from revealing information on an individual.

Let’s compare two approaches for de-identifying a person’s age: truncation and differential privacy.

First consider truncating birth date to year. For example, anyone born between January 1, 1955 and December 31, 1955 would be recorded as being born in 1955. This effectively produces a 100% confidence interval that is one year wide.

Next we’ll compare this to a 95% confidence interval using ε-differential privacy.

Differential privacy adds noise in proportion to the sensitivity Δ of a query. Here sensitivity means the maximum impact that one record could have on the result. For example, a query that counts records has sensitivity 1.

Suppose people live to a maximum of 120 years. Then in a database with *n* records [1], one person’s presence in or absence from the database would make a difference of no more than 120/*n* years, the worst case corresponding to the extremely unlikely event of a database of *n* – 1 newborns and one 120-year-old.

The Laplace mechanism implements ε-differential privacy by adding noise with a Laplace(Δ/ε) distribution, which in our example means Laplace(120/*n*ε).

A 95% confidence interval for a Laplace distribution with scale *b* centered at 0 is

[*b* log 0.05, –*b* log 0.05]

which is very nearly

[-3*b*, 3*b*].

In our case *b* = 120/*n*ε, and so a 95% confidence interval for the noise we add would be [-360/*n*ε, 360/*n*ε].

When *n* = 1000 and ε = 1, this means we’re adding noise that’s usually between -0.36 and 0.36, i.e. we know the average age to within about 4 months. But if *n* = 1, our confidence interval is the true age ± 360. Since this is wider than the a priori bounds of [0, 120], we’d truncate our answer to be between 0 and 120. So we could query for the age of an individual, but we’d learn nothing.
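Here's a small sketch of that calculation, using numpy's Laplace sampler. The function name and the example mean age are my own, not from any privacy library.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    # epsilon-differential privacy: add Laplace(sensitivity/epsilon) noise
    return true_value + rng.laplace(scale=sensitivity/epsilon)

n, epsilon = 1000, 1.0
sensitivity = 120/n      # one record moves the average age by at most this

b = sensitivity/epsilon
ci = (b*np.log(0.05), -b*np.log(0.05))
print(ci)                # roughly (-0.36, 0.36), as in the text

rng = np.random.default_rng()
noisy_mean_age = laplace_mechanism(38.2, sensitivity, epsilon, rng)
```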

The width of our confidence interval is 720/ε, and so to get a confidence interval one year wide, as we get with truncation, we would set ε = 720. Ordinarily ε is much smaller than 720 in practice, say between 1 and 10, which means differential privacy reveals far less information than truncation does.

Even if you truncate age to **decade** rather than year, this still reveals more information than differential privacy provided ε < 72.

[1] Ordinarily even the number of records in the database is kept private, but we’ll assume here that for some reason we know the number of rows a priori.

φ² – φ – 1 = 0.

By analogy, **golden ratio primes** are prime numbers of the form

*p* = φ² – φ – 1

where φ is an integer. To put it another way, instead of solving the equation

φ² – φ – 1 = 0

over the real numbers, we’re looking for prime numbers *p* where the equation can be solved in the integers mod *p*. [1]

When φ is a large power of 2, these prime numbers are useful in cryptography because their special form makes modular multiplication more efficient. (See the previous post on Ed448.) We could look for such primes with the following Python code.

    from sympy import isprime

    for n in range(1000):
        phi = 2**n
        q = phi**2 - phi - 1
        if isprime(q):
            print(n)

This prints 19 results, including *n* = 224, corresponding to the golden ratio prime in the previous post. This is the only output where *n* is a multiple of 32, which was useful in the design of Ed448.

Of course you could look for golden ratio primes where φ is not a power of 2. It’s just that powers of 2 are the application where I first encountered them.

A prime number *p* is a golden ratio prime if there exists an integer φ such that

*p* = φ² – φ – 1

which, by the quadratic formula, is equivalent to requiring that *m* = 4*p* + 5 be a square. In that case

φ = (1 + √*m*)/2.

Here’s some code for seeing which primes less than 1000 are golden ratio primes.

    from sympy import primerange

    def issquare(m):
        return int(m**0.5)**2 == m

    for p in primerange(2, 1000):
        m = 4*p + 5
        if issquare(m):
            phi = (int(m**0.5) + 1) // 2
            assert p == phi**2 - phi - 1
            print(p)

By the way, there are faster ways to determine whether an integer is a square. See this post for algorithms.

(Update: Aaron Meurer pointed out in the comments that SymPy has an efficient function `sympy.ntheory.primetest.is_square` for testing whether a number is a square.)

Instead of looping over primes and testing whether it’s possible to solve for φ, we could loop over φ and test whether φ leads to a prime number.

    from sympy import isprime

    for phi in range(1000):
        p = phi**2 - phi - 1
        if isprime(p):
            print(phi, p)

The smallest golden ratio prime is *p* = 5, with φ = 3.

Here’s a cute one: the pi prime 314159 is a golden ratio prime, with φ = 561.

The golden ratio prime that started this rabbit trail was the one with φ = 2^{224}, which Mike Hamburg calls the Goldilocks prime in his design of Ed448.

[1] If *p* = φ² – φ – 1 for some integer φ, then φ² – φ – 1 = 0 (mod *p*). But the congruence can have a solution when *p* is not a golden ratio prime. The following code shows that the smallest example is *p* = 31 and φ = 13.

    from sympy import primerange
    from sympy.ntheory.primetest import is_square

    for p in primerange(2, 100):
        m = 4*p + 5
        if not is_square(m):
            for x in range(p):
                if (x**2 - x - 1) % p == 0:
                    print(p, x)
                    exit()

Mike Hamburg designed an elliptic curve for use in cryptography he calls **Ed448-Goldilocks**. The prefix **Ed** refers to the fact that it’s an Edwards curve. The number **448** refers to the fact that the curve is over a prime field where the prime *p* has size 448 bits. But why **Goldilocks**?

The prime in this case is

*p* = 2^{448} – 2^{224} – 1,

which has the same form as the NIST primes. Hamburg says in his paper

I call this the “Goldilocks” prime because its form defines the golden ratio φ = 2^{224}.

That sentence puzzled me. What does this have to do with the golden ratio? The connection is that Hamburg’s prime is of the form

φ² – φ – 1.

The roots of this polynomial are the golden ratio and its conjugate. But instead of looking for real numbers where the polynomial is zero, we’re looking for integers where the polynomial takes on a prime value. (See the followup post on golden ratio primes.)

The particular prime that Hamburg uses is the “Goldilocks” prime by analogy with the fairy tale: the middle term 2^{224} is just the right size. He explains

Because 224 = 32*7 = 28*8 = 56*4, this prime supports fast arithmetic in radix 2^{28} or 2^{32} (on 32-bit machines) or 2^{56} (on 64-bit machines). With 16 28-bit limbs it works well on vector units such as NEON. Furthermore, radix-2^{64} implementations are possible with greater efficiency than most of the NIST primes.

The title of this post is “Goldilocks and the three multiplications.” Where do the three multiplications come in? It’s an allusion to an algorithm for multi-precision multiplication that lets you get by with **three multiplications** where the most direct approach would require four. The algorithm is called Karatsuba multiplication [1].

Hamburg says “The main advantage of a golden-ratio prime is fast Karatsuba multiplication” and that if we set φ = 2^{224} then

(*a* + *b*φ)(*c* + *d*φ) = *ac* + (*ad* + *bc*)φ + *bd*φ²

= *ac* + (*ad* + *bc*)φ + *bd*(φ + 1) (mod *p*)

= (*ac* + *bd*) + ((*a* + *b*)(*c* + *d*) – *ac*)φ

Note that the first line on the right side involves four multiplications, but the bottom line involves three: *ac*, *bd*, and (*a* + *b*)(*c* + *d*). Since the variables represent 224-bit numbers, removing a multiplication at the expense of an extra addition and subtraction is a net savings [2].

The most important line of the calculation above, and the only one that isn’t routine, is the second. That’s where the special form of *p* comes in. When you eliminate common terms from both sides, the calculation boils down to showing that

φ² = φ + 1 (mod *p*)

which is obviously true since *p* = φ² – φ – 1.
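The three-multiplication trick is easy to demonstrate in Python, with big-integer arithmetic standing in for the limb-level arithmetic a real implementation would use. The function name is mine, for illustration.

```python
import random

phi = 2**224
p = phi**2 - phi - 1    # the Goldilocks prime

def mul_mod_p(x, y):
    # write x = a + b*phi and y = c + d*phi
    a, b = x % phi, x // phi
    c, d = y % phi, y // phi
    t1 = a*c
    t2 = b*d
    t3 = (a + b)*(c + d)    # the third and final multiplication
    # (a + b*phi)(c + d*phi) = (ac + bd) + (ad + bc + bd)*phi (mod p),
    # and ad + bc + bd = t3 - t1
    return ((t1 + t2) + (t3 - t1)*phi) % p

x = random.randrange(p)
y = random.randrange(p)
assert mul_mod_p(x, y) == (x*y) % p
```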

Edwards curves only have one free parameter *d* (besides the choice of field) since they have the form

*x*² + *y*² = 1 + *d* *x*² *y*².

Hamburg chose *d* = -39081 for reasons explained in the paper.

Most elliptic curves used in ECC currently work over prime fields of order 256 bits, providing 128 bits of security. The motivation for developing Ed448 was much the same as for developing P-384. Both work over larger fields and so provide more bits of security, 224 and 192 bits respectively.

Unlike P-384, Ed448 is a safe curve, meaning that it lends itself to a secure practical implementation.

- Addition on Curve1174 (another Edwards curve)
- Elliptic curves over finite fields
- Cryptography posts

[1] Here we’ve just applied the Karatsuba algorithm one time. You could apply it recursively to multiply two *n*-bit numbers in O(*n*^{k}) time, where *k* = log_{2} 3 ≈ 1.585. This algorithm, discovered in 1960, was the first multiplication algorithm faster than O(*n*²).

[2] Addition and subtraction are O(*n*) operations. And what about multiplication? That’s not an easy question. It’s no worse than O(*n*^{1.585}) by virtue of the Karatsuba algorithm. In fact, it’s O(*n* log *n*), but only for impractically large numbers. See the discussion here. But in any case, multiplication is slower than addition for multiprecision numbers.

The NIST curves over prime fields are named after the number of bits in the prime: the name is “P-” followed by the number of bits. The primes themselves are named *p* with a subscript for the number of bits.

The five NIST primes are

*p*_{192} = 2^{192} – 2^{64} – 1

*p*_{224} = 2^{224} – 2^{96} + 1

*p*_{256} = 2^{256} – 2^{224} + 2^{192} + 2^{96} – 1

*p*_{384} = 2^{384} – 2^{128} – 2^{96} + 2^{32} – 1

*p*_{521} = 2^{521} – 1

The largest of these, *p*_{521}, is a Mersenne prime, and the rest are generalized Mersenne primes.
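A quick SymPy check confirms that all five numbers as defined are prime and have the advertised sizes:

```python
from sympy import isprime

# the five NIST primes over prime fields, keyed by bit size
nist_primes = {
    192: 2**192 - 2**64 - 1,
    224: 2**224 - 2**96 + 1,
    256: 2**256 - 2**224 + 2**192 + 2**96 - 1,
    384: 2**384 - 2**128 - 2**96 + 2**32 - 1,
    521: 2**521 - 1,
}

for bits, p in nist_primes.items():
    assert p.bit_length() == bits   # p really is a `bits`-bit number
    assert isprime(p)
```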

Except for *p*_{521}, the exponents of 2 in the definitions of the NIST primes are all multiples of 32 or 64. This leads to efficient tricks for arithmetic modulo these primes carried out with 32-bit or 64-bit integers. You can find pseudocode implementations for these tricks in Mathematical routines for the NIST prime elliptic curves.

The elliptic curve Ed448 “Goldilocks” was not part of the original set of recommended curves from NIST but has been added. It employs a multiplication trick in the same spirit as the routines referenced above, but simpler. Ed448 uses

*p* = 2^{448} – 2^{224} – 1

which has the special form φ² – φ – 1 where φ = 2^{224}. This enables a trick known as Karatsuba multiplication. More on that here.

[1] FIPS PUB 186-4. This publication is dated 2013, but the curve definitions are older. I haven’t found for certain when the curves were defined. I’ve seen one source that says 1997 and another that says 1999.

This post looks at curve **P-384**. What’s special about this curve? It’s the elliptic curve that the NSA recommends everyone use until post-quantum methods have been standardized. It provides 192 bits of security, whereas more commonly used curves provide 128 bits.

Does the NSA recommend this method because they know how to get around it? Possibly, but they also need to recommend methods that they believe foreign governments cannot break.

The equation of the P-384 curve is

*y*² = *x*³ + *ax* + *b*

working over the field of integers modulo a prime *p*. We will go into each of the specific parameters *a*, *b*, and *p*, and discuss how they were chosen.

Consistent with the naming conventions for elliptic curves used in cryptography, the name “P-384” tells you that the curve is over a prime field where the prime is a 384-bit integer. Specifically, the order of the field is

*p* = 2^{384} – 2^{128} – 2^{96} + 2^{32} – 1

For a given number of bits, in this case 384, you want to pick a prime that’s relatively near the maximum size for that number of bits. In our case, our prime *p* is a prime near 2^{384} with a convenient bit pattern. (The special pattern allows implementation tricks that increase efficiency.)

Hasse’s theorem says that the number of points on a curve over a prime field has the same order of magnitude as the prime, so P-384 contains approximately 2^{384} points. In fact, the number of points *n* on the curve is

39402006196394479212279040100143613805079739270465446667946905279627659399113263569398956308152294913554433653942643

or approximately 2^{384} – 2^{190}. The number *n* is a prime, and so it is the order of P-384 as a group.
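We can verify these claims about *n* directly, using `math.isqrt` for an exact integer square root in the Hasse bound:

```python
from math import isqrt
from sympy import isprime

p = 2**384 - 2**128 - 2**96 + 2**32 - 1
n = 39402006196394479212279040100143613805079739270465446667946905279627659399113263569398956308152294913554433653942643

assert isprime(p)
assert isprime(n)                           # n is prime, so it is the group order
assert n.bit_length() == 384                # n is on the order of 2^384
assert abs(n - (p + 1)) <= 2*isqrt(p) + 1   # Hasse's theorem
```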

According to a footnote in the standard defining P-384, FIPS PUB 186-4,

The selection a ≡ -3 for the coefficient of x was made for reasons of efficiency; see IEEE Std 1363-2000.

All the NIST elliptic curves over prime fields use *a* = -3 because this makes it possible to use special algorithms for elliptic curve arithmetic.

The curve P-384 has Weierstrass form

*y*² = *x*³ – 3*x* + *b*

where *b* is

27580193559959705877849011840389048093056905856361568521428707301988689241309860865136260764883745107765439761230575.

The parameter *b* is between 2^{383} and 2^{384} but doesn’t have any particular binary pattern:

101100110011000100101111101001111110001000111110111001111110010010011000100011100000010101101011111000111111100000101101000110010001100000011101100111000110111011111110100000010100000100010010000000110001010000001000100011110101000000010011100001110101101011000110010101100011100110001101100010100010111011010001100111010010101010000101110010001110110111010011111011000010101011101111

The specification says that *b* was chosen at random. How can you convince someone that you chose a parameter at random?

The standard gives a 160-bit seed *s*, and a hash-based algorithm that *s* was run through to create a 384-bit parameter *c*. Then *b* is the solution to

*b*² *c* ≡ -27 (mod *p*).

The algorithm going from *s* to *c* is given in Appendix D.6 and is a sort of key-stretching algorithm. The standard cites ANS X9.62 and IEEE Standard 1363-2000 as the source of the algorithm.

If *b* was designed to have a back door, presumably a tremendous amount of computation had to go into reverse engineering the seed *s*.

Koblitz and Menezes wrote a paper in which they suggest a way that the NSA might have picked seeds that lead to weak elliptic curves, but then argue against it.

It is far-fetched to speculate that the NSA would have deliberately selected weak elliptic curves in 1997 for U.S. government usage … confident that no one else would be able to discover the weakness in these curves in the ensuing decades. Such a risky move by the NSA would have been inconsistent with the Agency’s mission.

The post ended with wondering about functions analogous to sine and cosine, such as Bessel functions. This post will look at that question in more detail. Specifically we’ll look at the functions *J*_{ν} and *Y*_{ν}.

Because these two Bessel functions satisfy the same second order linear homogeneous differential equation, the Sturm separation theorem says that their zeros are interlaced: between each pair of consecutive zeros of *J*_{ν} there is exactly one zero of *Y*_{ν}, and between each pair of consecutive zeros of *Y*_{ν} there is exactly one zero of *J*_{ν}.

In the following Python code, we find zeros of *J*_{ν}, then look in between for places where *J*_{ν} and *Y*_{ν} cross. Next we find the angle the two curves make at each intersection and plot the angles.

```python
from scipy.special import jn_zeros, jv, yv
from scipy.optimize import bisect
from numpy import empty, arccos
import matplotlib.pyplot as plt

n = 3    # Bessel function order
N = 100  # number of zeros

z = jn_zeros(n, N)  # zeros of J_n
crossings = empty(N-1)

f = lambda x: jv(n, x) - yv(n, x)

# Between consecutive zeros of J_n there is a crossing of J_n and Y_n.
for i in range(N-1):
    crossings[i] = bisect(f, z[i], z[i+1])

def angle(n, x):
    # Derivatives of J_nu and Y_nu
    dj = 0.5*(jv(n-1, x) - jv(n+1, x))
    dy = 0.5*(yv(n-1, x) - yv(n+1, x))
    top = 1 + dj*dy
    bottom = ((1 + dj**2)*(1 + dy**2))**0.5
    return arccos(top/bottom)

y = angle(n, crossings)
plt.plot(y)
plt.xlabel("Crossing number")
plt.ylabel("Angle in radians")
plt.show()
```

This shows that the angles steadily decrease, apparently quadratically.

This quadratic behavior is what we should expect from the asymptotics of *J*_{ν} and *Y*_{ν}: for large arguments they act like shifted and rescaled versions of sin(*x*)/√*x*. So if we looked at √*x* *J*_{ν} and √*x* *Y*_{ν} rather than *J*_{ν} and *Y*_{ν}, we’d expect the angles to approach some positive asymptote, and they do, as shown below.

Jim Simons replied with a proof so short it fits in a tweet: The product of the derivatives is -sin(*x*)sec²(*x*) = -tan(*x*)/cos(*x*), which is -1 if cos(*x*)=tan(*x*).

This made me wonder whether sine and cosine are orthogonal in the sense of graphs intersecting at right angles. They are orthogonal in the sense that their product integrates to zero over the interval [0, 2π], a fact that’s important in Fourier analysis, but are they orthogonal in the sense of their graphs intersecting at right angles? A graph makes this look plausible:

But the graph is misleading. I made the plot without specifying the aspect ratio, using the default in Mathematica. This distorts the apparent angle between the graphs. Setting the aspect ratio to 1 shows the true picture. The two curves intersect at π/4 and 5π/4, and they intersect at an angle of arccos(1/3) ≈ 70.5°, not π/2.

The product of the slopes is not -1, but it is negative and constant, so you could multiply each function by some constant to make the product of slopes -1. Said another way, you could make the curves perpendicular by adjusting the aspect ratio.

Can you do this for other functions that are orthogonal in the inner product sense? Not easily. For example, sin(2*x*) and sin(3*x*) are orthogonal in the inner product (integral) sense, but the angles of intersection are different at different places where the curves cross.

What about other functions that are like sine and cosine? I looked at the Bessel functions *J*_{1} and *J*_{2}, but the angles at the intersections vary. Ditto for Chebyshev polynomials. I suppose the difference is that sine and cosine are translations of each other, whereas that’s not the case for Bessel or Chebyshev functions. But it is true for wavelets, so you could find wavelets that are orthogonal in the sense of inner products and in the sense of perpendicular graphs.

**Update**: See more on how Bessel functions cross in the next post.

“… I don’t care …”

Asribalt was horrified. “But how can you not be fascinated by—”

“I *am* fascinated,” I insisted. “That’s the problem. I’m suffering from fascination burnout. Of all the things that are fascinating, I have to choose just one or two.”

This business of zero volume and infinite area may sound unreal, but the steps along the way to the limit are very tangible. Here’s a video by Aaron Benzel that lets you fly through *M*_{3}, and watch what happens if you split *M*_{3} apart.

You can compute the volume and area at each stage to show that

and

From these equations you can see that you can make the volume as small as you’d like, and the area as large as you’d like, by taking *n* big enough. And in case that sounds a little hand-wavey, we can get more concrete. Here’s a little code to find exactly how big a value of *n* is big enough.

```python
from math import log, ceil

def menger_volume(n):
    return (20/27.)**n

def menger_area(n):
    return 2*(20/9.)**n + 4*(8/9.)**n

def volume_below(v):
    if v >= 1:
        return 1
    else:
        n = log(v)/log(20/27.)
        return int(ceil(n))

def area_above(a):
    if a < 2:
        return 1
    else:
        n = (log(a) - log(2))/log(20/9.)
        return int(ceil(n))

n = volume_below(0.001)
print(n, menger_volume(n))

n = area_above(1000)
print(n, menger_area(n))
```

In practice regular expressions may have some false positives or false negatives. The expressions given here have **only false positives**. That is, no valid ICD-9 or ICD-10 codes will go unmatched, but the regular expressions may match things that are not diagnosis codes. The latter is inevitable anyway since a string of characters could coincide with a diagnosis code but not be used as a diagnosis code. For example 1234 is a valid ICD-9 code, but 1234 in a document could refer to other things, such as a street address.

Most ICD-9 diagnosis codes are just numbers, but they may also start with E or V.

Numeric ICD-9 codes are at least three digits. Optionally there may be a decimal followed by one or two more digits.

An E code begins with E and three digits. These may be followed by a decimal and one more digit.

A V code begins with a V followed by two digits. These may be followed by a decimal and one or two more digits.

Sometimes the decimals are left out.

Here are regular expressions that summarize the discussion above.

```python
N = r"\d{3}\.?\d{0,2}"
E = r"E\d{3}\.?\d?"
V = r"V\d{2}\.?\d{0,2}"
icd9_regex = "|".join([N, E, V])
```

Usually E and V are capitalized, but they don’t have to be, so it would be best to do a case-insensitive match.
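To make the above concrete, here’s a quick sketch using Python’s `re` module. The sample strings are illustrative, not taken from the CDC lists.

```python
import re

# Regular expressions from the discussion above.
N = r"\d{3}\.?\d{0,2}"
E = r"E\d{3}\.?\d?"
V = r"V\d{2}\.?\d{0,2}"
icd9_regex = "|".join([N, E, V])

# Case-insensitive, since E and V are not always capitalized.
pattern = re.compile(icd9_regex, re.IGNORECASE)

# Strings shaped like ICD-9 codes (illustrative examples).
for s in ["1234", "123.45", "E850.9", "V42.0", "v42"]:
    assert pattern.fullmatch(s)

# Too few digits to be a numeric code.
assert not pattern.fullmatch("12")
```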

ICD-10 diagnosis codes always begin with a letter (except U) followed by a digit. The third character is usually a digit, but could be an A or B [1]. After the first three characters, there may be a decimal point, and up to four more alphanumeric characters. These alphanumeric characters are never U. Sometimes the decimal is left out.

So the following regular expression would match any ICD-10 diagnosis code.

[A-TV-Z][0-9][0-9AB]\.?[0-9A-TV-Z]{0,4}

As with ICD-9 codes, the letters are usually capitalized, but not always, so it’s best to do a case-insensitive search.
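Here’s a quick check of the pattern against codes mentioned elsewhere in this post:

```python
import re

# ICD-10 regular expression from above, case-insensitive.
icd10_regex = r"[A-TV-Z][0-9][0-9AB]\.?[0-9A-TV-Z]{0,4}"
pattern = re.compile(icd10_regex, re.IGNORECASE)

# Codes that appear in this post or its footnote.
for s in ["V97.33XA", "Y92.146", "Z3A", "C4A"]:
    assert pattern.fullmatch(s)

# ICD-10 diagnosis codes never begin with U.
assert not pattern.fullmatch("U071")
```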

As mentioned at the beginning, the regular expressions here may have false positives. However, they don’t let any valid codes slip by. I downloaded lists of ICD-9 and ICD-10 codes from the CDC and tested to make sure the regular expressions here matched every code.

Character ranges are supported everywhere, such as `[A-TV-Z]` for the letters A through T and V through Z.

Not every regular expression implementation supports `\d` to represent a digit. In Emacs, for example, you would have to use `[0-9]` instead since it doesn’t support `\d`.

I’ve used `\.?` for an optional decimal point. (The . is a special character in regular expressions, so it needs to be escaped to represent a literal period.) Some people would write `[.]?` instead on the grounds that it may be more readable. (Periods are not special characters inside a character class.)

I’ve used `{m}` for a pattern that is repeated exactly *m* times, and `{m,n}` for a pattern that is repeated between *m* and *n* times. This is supported in Perl and Python, for example, but not everywhere. You could write `\d\d\d` instead of `\d{3}` and `\d?\d?` instead of `\d{0,2}`.

[1] The only ICD-10 codes with a non-digit in the third position are those beginning with C4A, C7A, C7B, D3A, M1A, O9A, and Z3A.

But there’s something in popular articles on complexity that bothers me, and it’s the following **logical fallacy**:

Complex systems *can* arise from iterating simple rules, therefore many complex systems *do* arise from iterating simple rules.

This may just be a popular misunderstanding of complexity theory, but I also get the impression that some people working in complexity theory implicitly fall for the same fallacy.

What fractals, cellular automata, and such systems show is that it is **possible** to start with simple rules and create a complex system. It says nothing about how often this happens. It does not follow that a particular complex system does in fact arise from iterating simple rules.

There’s a variation on the fallacy that says that while complex systems may not exactly come from iterating simple rules, it’s possible to **approximate** complex systems this way. Even that is dubious, as I discuss in The other butterfly effect.

At best I think these popular models serve as metaphors and cautionary tales. Strange attractors and such show that systems can exhibit unexpectedly complex behavior, and that forecasts can become useless when looking ahead too far. That’s certainly true more broadly.

But I’m skeptical of saying “You have a complex system. I have a complex system. Let’s see if my complex system models your complex system.” It’s often possible to draw loose analogies between complex systems, and these may be useful. But it’s not realistic to expect quantitative guidance to come out of this, such as using the Mandelbrot set to pick stocks.

Over 22 centuries ago, Eratosthenes had a good idea for finding all primes less than an upper bound *N*.

Make a list of the numbers 2 to *N*. Circle 2, then scratch out all the larger multiples of 2 up to *N*. Then move on to 3. Circle it, and scratch out all the larger multiples of 3.

Every time you start over and find a number that hasn’t been scratched out before, that means it’s not divisible by any numbers smaller than itself, so it’s prime. Circle it and repeat. This algorithm is known as the sieve of Eratosthenes.

You could turn this into an algorithm for factoring every number less than *N* by not just scratching off composite numbers but keeping track of what numbers they were divisible by.

Not bad for an algorithm that predates Hindu-Arabic numerals by nine centuries. But as you might imagine, there have been a few improvements over the last couple millennia.

A paper by Helfgott published last week [1] gives a refined version of the sieve that takes less time and less space. Helfgott is not the first to improve on the sieve of Eratosthenes, but his is the latest improvement.

His paper shows that it is possible to find all primes less than *N* in time

O(*N* log *N*)

and space

O(*N*^{1/3} (log *N*)^{2/3}).

Furthermore, it is possible to factor all integers less than *N* in time

O(*N* log *N*)

and space

O(*N*^{1/3} (log *N*)^{5/3}).

He also addresses finding all primes and factoring all integers in an interval [*N* – Δ, *N* + Δ] provided

*N*^{1/3} (log *N*)^{2/3} ≤ Δ ≤ *N*.

In the case of such an interval, one can find all the primes in the interval in time O(Δ log *N*) and space O(Δ). And one can factor all integers in the interval in time and space O(Δ log *N*).

Helfgott’s paper gives detailed pseudocode for all the algorithms.

- Odd Goldbach conjecture (proved by Helfgott)
- Big O tilde notation

[1] Harald Andrés Helfgott. An improved sieve of Eratosthenes, Mathematics of Computation, https://doi.org/10.1090/mcom/3438. Published online April 23, 2019.

**Differential equations** are directly and commonly applied. Ask yourself what laws govern the motion of some system, write down these laws as differential equations, then solve them. Statistical models are similarly direct: propose a model and feed it data.

**Linear algebra** is extremely useful in application, but it’s not often applied so directly. Rarely would you look at a system and immediately see a linear system. Instead you might describe a system in terms of differential equations or statistical models, then use linear algebra as a key part of the solution process.

**Numerical analysis** is useful because, for example, it helps you practically solve large linear systems (which help you solve differential equations, which model physical systems).

Many areas of math are useful in application, but some are applied more directly than others. It’s a mistake to say something is not applied just because it is not applied directly.

The following quote from Colin McLarty describes how some of the most abstract math, including **cohomology** and **category theory**, is applied.

Cohomology does not itself solve hard problems in topology or algebra. It clears away tangled multitudes of individually trivial problems. It puts the hard problems in clear relief and makes their solution possible. The same holds for category theory in general.

While McLarty speaks of applications to other areas of math, the same applies to applications to other areas such as software engineering.

I suspect many supposed applications of category theory are *post hoc* glosses, dressing up ideas in categorical language that were discovered in a more tangible setting. At the same time, the applications of category theory may be understated because the theory works behind the scenes. As I discuss here, category theory may lead to insights that are best communicated in the language of the original problem, not in the language of categories.

Some of the ICD-10 codes are awfully specific, and bizarre.

For example,

- V95.4: Unspecified spacecraft accident injuring occupant
- V97.33XA: Sucked into jet engine, initial encounter
- V97.33XD: Sucked into jet engine, subsequent encounter

As I understand it, V97.33XD refers to a subsequent encounter with a health care professional, not a subsequent encounter with a jet engine. But you have to wonder how many people who have been sucked into a jet engine survive to have one, much less two, medical visits.

There is a specific code, Y92.146, for injuries in a prison swimming pool. It seems strange to combine a medical diagnosis and its location into a single code. Is a swimming injury in a prison pool medically different than a swimming injury in a YMCA pool?

I understand that the circumstance of a diagnosis is not recorded strictly for medical reasons. But while 70,000 is an unwieldy number of codes, it’s kinda small when it has to account for both malady and circumstance. Surely there are 70,000 circumstances alone that are more common than being in a spacecraft, for instance.

Is there a code for being at the opera? Why yes there is: Y92.253. However, there are no codes that are unique to being at a Costco, Walmart, or Jiffy Lube.

A Massachusetts court ruled this week that obtaining real-time cell phone location data requires a warrant.

Utah has passed a law, going into effect next month, that goes further. Police in Utah will need a warrant to obtain location data or to search someone’s electronic files. (Surely electronic files are the contemporary equivalent of one’s “papers” under the Fourth Amendment.)

Vermont passed the nation’s first data broker law. It requires data brokers to register with the state and to implement security measures, but as far as I have read it doesn’t put much restriction on what they can do.

Texas law expands HIPAA’s notion of a “covered entity” so that it applies to basically anyone handling PHI (protected health information).

California’s CCPA law goes into effect on January 1. In some ways it’s analogous to GDPR. It will be interesting to see what the law ultimately means in practice. It’s expected that the state legislature will amend the law, and we’ll have to wait on precedents to find out in detail what the law prohibits and allows.

A metaphorical quantum leap is a sudden, large change [1].

I can’t think of a good metaphor for a small but discrete change. I was reaching for such a metaphor recently and my first thought was “quantum leap,” though that would imply something much bigger than I had in mind.

Sometimes progress comes in small discrete jumps, and only in such jumps. Or at least that’s how it feels.

There’s a mathematical model for this called the single big jump principle. If you make a series of jumps according to a fat tailed probability distribution, most of your progress will come from your largest jump alone.

Your distribution can be continuous, and yet there’s something subjectively discrete about it. If you have a Lévy distribution, for example, your jumps can be any size, and so they are continuous in that sense. But when the lion’s share of your progress comes from one jump, it feels discrete, as if the big jump counted and the little ones didn’t.
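A short simulation illustrates the principle. This is only a sketch: it uses a Pareto distribution with shape 1/2 (infinite mean) as a stand-in for a Lévy-type fat-tailed distribution, and compares it with light-tailed exponential jumps.

```python
import numpy as np

rng = np.random.default_rng(20190426)

# Heavy-tailed jumps: Pareto with shape 1/2, a stand-in here
# for a Levy-type fat-tailed distribution.
heavy = rng.pareto(0.5, 1000)

# Light-tailed jumps for comparison.
light = rng.exponential(1.0, 1000)

# Fraction of total progress due to the single largest jump.
heavy_share = heavy.max() / heavy.sum()
light_share = light.max() / light.sum()

print(f"heavy-tailed largest-jump share: {heavy_share:.2f}")
print(f"light-tailed largest-jump share: {light_share:.2f}")
```

With fat tails the largest jump typically accounts for a large fraction of the total, while with exponential jumps the largest of 1000 jumps is a negligible fraction.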

- The single big jump principle
- What happens when you add a teller?
- Abelian consulting and Lévy consulting
- A little simplicity goes a long way

[1] A literal quantum leap, such an electron moving from one energy level to another in a hydrogen atom, is on the order of a billionth of a billionth of a joule. A joule is roughly the amount of energy needed to bring a hamburger to your mouth.

One of the differences between amateur and professional software development is whether you’re writing software for yourself or for someone else. It’s like the difference between keeping a journal and being a journalist.

This morning I saw where someone pulled that quote and I thought about how I’m now somewhere that doesn’t fall into either category. **I’m a professional, and a software developer, but not a professional software developer**.

People pay me for work that may require writing software, but they usually don’t want my software per se. They want reports that may require me to write software to generate. My code is directly for my own use but indirectly for someone else. Occasionally I will have a client ask me for software, but not often.

In the last few years I’ve been asked to **read** software more often than I’ve been asked to **write** it. For example, I was an expert witness on an intellectual property case and had to read a lot of source code. But even that project fit into the pattern above: I wrote some code to help me do my review, but my client didn’t care about that code *per se*, only what I found with it.

This was the exponential sum for Easter last year, April 1, 2018:

and this was the sum for this Easter, today, April 21, 2019:

A partial explanation is that Easter usually falls in April, the fourth month, and the sums for April often have four-fold symmetry. That said, many of the images in April look nothing like a cross. And in fact the image for next Easter, April 12, 2020, will be an asymmetric mess.

This is related to another potentially misleading term, algebraic groups, mentioned in the previous post on isogenies. An algebraic group is a “group object” in the category of algebraic varieties. Note the mysterious use of the word “in.”

You may have heard the statement “A monad is just a monoid in the category of endofunctors.” While true, the statement is meant to be a joke because it is abstracted so far from what most people want to know when they ask what a monad is in the context of functional programming. But notice this follows the pattern of an *X* in the category of *Y*‘s, even though here *X* stands for monoid rather than group. The meaning of “in the category of” is the same.

If you want to go down this rabbit hole, you could start with the nLab article on group objects. A group object is a lot like a group, but everything is happening “in” a category.

Take a look at the list of examples. The first says that a group object in the category of sets is a group. That’s reassuring. The second example says that a group object in the category of topological spaces is a topological group. At this point you may get the idea that an *X* in the category of *Y*‘s simply adds an *X* structure to a *Y* thing. But further down you’ll see that a group object in the category of groups is an *Abelian* group, which is an indication that something more subtle is going on.

It’s difficult to look up what an isogeny is. You’ll find several definitions, and they seem incomplete or incompatible.

If you go to Wikipedia, you’ll read “an isogeny is a morphism of algebraic groups that is surjective and has a finite kernel.” The possibly misleading term here is “algebraic group.” It may sound like they’re throwing in the term “algebraic” for clarification as if to say “a group like the kind you’d talk about in math, as opposed to the kind of group you might talk about somewhere else like in sociology.” An algebraic group is indeed a group, but one with additional structure. A “morphism” is a structure-preserving map, and so here “morphism” means a function that preserves the algebraic and topological structure of an algebraic group.

The algebraic groups we care about are elliptic curves, so let’s specialize to elliptic curves. For the rest of this post all definitions and theorems will come from Handbook of Elliptic and Hyperelliptic Curve Cryptography by Cohen and Frey. This book defines *isogeny* implicitly by defining what it means for two curves to be *isogenous*.

Two curves

*E*/*K* and *E*′/*K* are isogenous over *K* if there exists a morphism φ : *E* → *E*′ with coefficients in *K* mapping the neutral element of *E* to the neutral element of *E*′.

Here *K* is the field the curves are defined over. The neutral element is the identity element of the curve as a group, the point at infinity.

Something strange is going on here. The definition talks about two curves being isogenous. That sounds like a symmetric relationship, but the definition is not symmetric. It specifies the existence of a morphism in one direction but not in the other. But the book goes on to explain that if an isogeny exists in one direction, there necessarily exists a unique isogeny in the opposite direction, the dual isogeny, though it’s not obvious why this should be.

Another puzzling thing is that it doesn’t seem to take much for a function to be an isogeny, just map the group identity to the group identity. But there are other requirements implicit in the statement that φ is a morphism in the context of algebraic groups. Cohen and Frey do not require φ to be a homomorphism, as some authors do, but they point out that in fact φ will turn out to be a group homomorphism. They say “for more background on isogenies, we refer to section 4.3.4.” OK, let’s go there.

In that section the authors say that a morphism between Abelian varieties (a special case of algebraic groups which includes elliptic curves) is an isogeny if and only if it is surjective and has a finite kernel. That seems like a much stronger requirement than the definition above. However, in this context simply requiring a morphism to map the group identity to the group identity implies that the morphism will be surjective and have a finite kernel, and vice versa.

An isogeny is a group homomorphism, but not an isomorphism, not in the modern usage of the term. Historically, however, the term “isomorphism” was used for what we now call an isogeny.

We said above that the existence of an isogeny in one direction implied the existence of a dual isogeny in the opposite direction. These functions are not inverses of each other: their composition is not the identity. However, their composition does have a simple form: it is multiplication by an integer *n*, called the degree of the isogeny. (Multiplication here means repeated group addition, i.e. the composition takes an element *x* to the sum of *n* copies of *x* using the group law on the elliptic curve.)

Cohen and Frey give the example

of an isogeny of degree 2 between the elliptic curves

*y*² = *x*³ + 1132*x* + 278

and

*y*² = *x*³ + 500*x* + 105

over the field with 2003 elements. The curves are not isomorphic because the group structure of the former is a cyclic group and the group structure of the latter is not.

Incidentally, there is a theorem that says two elliptic curves over a finite field are isogenous if and only if they have the same number of points. So a brute-force approach to showing that the curves above are isogenous would be to show that they both have the same number of points. And indeed, both have 1956 points.

One class of quantum-resistant encryption methods is isogeny-based encryption. This class stands out for at least a couple of reasons:

- it uses the shortest keys, and
- it uses the most sophisticated math.

Most post-quantum encryption schemes require much longer keys to maintain current levels of protection, two or three orders of magnitude longer. Isogeny-based encryption uses the shortest keys of any proposed post-quantum encryption methods, requiring keys roughly the same size as are currently in use.

The mathematics behind isogeny-based cryptography is deep. Even a high-level description requires quite a bit of background. I’ll take a shot at exploring the prerequisites starting with this blog post.

Elliptic curve cryptography is widely used today, and partly for one of the reasons listed above: short keys. To achieve a level of security comparable to 128-bit AES, you need a 256-bit key using elliptic curve cryptography, but a 3072-bit key using RSA.

Quantum computers could solve the elliptic curve discrete logarithm problem efficiently, and so elliptic curve cryptography as currently practiced is not quantum resistant. Isogeny-based encryption is based on elliptic curves, but not as directly as current ECC methods. While current ECC methods perform computations on elliptic curves, isogeny methods are based on networks of functions between elliptic curves.

NIST is sponsoring a competition for post-quantum encryption methods, and only one of the contestants is related to elliptic curves, and that’s SIKE. The name stands for Supersingular Isogeny Key Encapsulation. “Supersingular” describes a class of elliptic curves, and SIKE is based on isogenies between these curves.

This post raises a lot of questions. First and foremost, what is an isogeny? That’s the next post. And what are “supersingular” elliptic curves? I hope to go over that in a future post. Then after exploring the building blocks, where does encryption come in?

I’ve written several related blog posts leading up to this topic from two directions: post-quantum encryption and elliptic curves.