Traditionally there was general agreement regarding what is pure math and what is applied. Number theory and topology, for example, are pure, while differential equations and numerical analysis are applied.

But then public key cryptography and topological data analysis brought number theory and topology over into the applied column, at least for some people. And there are people working in differential equations and numerical analysis who aren’t actually interested in applications. It would be more accurate to say that some areas of math are more directly and more commonly applied than others. Also, some areas of math are predominantly filled with people interested in applications and some are not.

The US Army is interested in applying some areas of math that you would normally think of as very pure, including homotopy type theory (HoTT).

From an Army announcement:

Modeling frameworks are desired that are able to eschew the usual computational simplification assumptions and realistically capture … complexities of real world environments and phenomena, while still maintaining some degree of computational tractability. Of specific interest are causal and predictive modeling frameworks, hybrid model frameworks that capture both causal and predictive features, statistical modeling frameworks, and abstract categorical models (cf. Homotopy Type Theory).

And later in the same announcement:

Homotopy Type Theory and its applications are such an area that is of significant interest in military applications.

HoTT isn’t the only area of math the Army announcement mentions. There are the usual suspects, such as (stochastic) PDEs, but also more ostensibly pure areas of math such as topology; the word “topological” appears 23 times in the document.

This would be fascinating. It can be interesting when a statistical model works well in application, but it’s no surprise: that’s what statistics was developed for. It’s more interesting when something finds an unexpected application, such as when number theory entered cryptography. The applications the Army has in mind are even more interesting because the math involved is more abstract and, one would have thought, less likely to be applied.

***

One famous example of this is the so-called Freshman’s Dream theorem:

(*a* + *b*)^{p} = *a*^{p} + *b*^{p}

This is not true over the real numbers, but it is true, for example, when working with integers mod *p*.

(More generally, the Freshman’s Dream is true in any ring of characteristic *p*. This is more than an amusing result; it’s useful in applications of finite fields.)
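Here's a quick brute-force check of the claim, a minimal sketch that simply tries every pair of residues. The function name `freshmans_dream_holds` is made up for illustration.

```python
def freshmans_dream_holds(p):
    """Check (a + b)^p == a^p + b^p (mod p) for every pair of residues a, b."""
    return all(
        pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
        for a in range(p)
        for b in range(p)
    )

print(freshmans_dream_holds(7))  # prime modulus
print(freshmans_dream_holds(6))  # composite modulus
```

The identity holds for the prime modulus and fails for the composite one; with modulus 6, for example, a = b = 1 gives 2⁶ = 64 ≡ 4 on the left but 2 on the right.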

***

A common misunderstanding in calculus is that a series converges if its terms converge to zero. The canonical counterexample is the harmonic series. Its terms converge to zero, but the sum diverges.

But this can’t happen in the *p*-adic numbers. There, if the terms of a series converge to zero, the series converges (though maybe not absolutely).

***

Here’s something sorta along these lines. It looks wrong, and someone might arrive at it via a wrong understanding, but it’s actually correct.

sin(*x* – *y*) sin(*x* + *y*) = (sin(*x*) – sin(*y*)) (sin(*x*) + sin(*y*))
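The identity holds because both sides reduce to sin²(*x*) – sin²(*y*). If you'd rather see numerical evidence than algebra, here is a small sketch that samples random arguments; the helper name `identity_gap` and the sampling range are arbitrary choices.

```python
import math
import random

def identity_gap(x, y):
    """Absolute difference between the two sides of the identity."""
    lhs = math.sin(x - y) * math.sin(x + y)
    rhs = (math.sin(x) - math.sin(y)) * (math.sin(x) + math.sin(y))
    return abs(lhs - rhs)

random.seed(0)
worst = max(
    identity_gap(random.uniform(-10, 10), random.uniform(-10, 10))
    for _ in range(1000)
)
print(worst)  # should be on the order of floating point rounding error
```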

***

Odd integers end in odd digits, but that might not be true if you’re not working in base 10. See Odd numbers in odd bases.

***

You can misunderstand how percentages work, but still get a useful result. See Sales tax included.

***

When probabilities are small, you can often get by with adding them together even when strictly speaking they don’t add. See Probability mistake can make a good approximation.

Karen Uhlenbeck has just received the Abel Prize. Many say that the Fields Medal is the analog of the Nobel Prize for mathematics, but others say that the Abel Prize is a better analog. The Abel Prize is a recognition of achievement over a career whereas the Fields Medal is only awarded for work done before age 40.

I had a course from Karen Uhlenbeck in graduate school. She was obviously brilliant, but what I remember most from the class was her candor about things she didn’t understand. She was already famous at the time, having won a MacArthur genius award and other honors, so she didn’t have to prove herself.

When she presented the definition of a manifold, she made an offhand comment that it took her a month to really understand that definition when she was a student. She obviously understands manifolds now, having spent her career working with them.

I found her comment extremely encouraging. It shows it’s possible to become an expert in something you don’t immediately grasp, even if it takes you weeks to grok its most fundamental concept.

Uhlenbeck wasn’t just candid about things she found difficult in the past. She was also candid about things she found difficult at the time. She would grumble in the middle of a lecture things like “I can never remember this.” She was not a polished lecturer—far from it—but she was inspiring.

(The connection between Karen Uhlenbeck, Ted Odell, and John Tate is that they were all University of Texas math faculty.)

Photo of Karen Uhlenbeck in 1982 by George Bergman [GFDL], via Wikimedia Commons

I keep running into NIST’s eclectic collection of useful information. Three examples:

- My post on Koide’s coincidence references their list of physical constants.
- My post on naming elliptic curves mentions NIST and their FIPS (Federal Information Processing Standards) publications.
- My post on testing the PCG random number generator made use of the NIST Statistical Test Suite.

I wonder what’s going to take me back to NIST next.

The RSA encryption algorithm depends on the fact that computers can easily multiply enormous numbers, but they cannot efficiently factor the product of two enormous primes. Whenever you have something that’s easy to do but hard to undo, you might be able to make an encryption algorithm out of it.

The unbalanced oil and vinegar (UOV) digital signature algorithm is analogous to RSA in that it also depends on the difficulty of factoring. But UOV is based on the difficulty of factoring the composition of a linear and a nonlinear operator, not the product of two primes. One advantage of UOV over RSA is that UOV is quantum-resistant. That is, if large quantum computers become practical, UOV signatures will remain hard to forge (or so it is currently believed) whereas RSA signatures would be easy to forge.

Solving large systems of multivariate polynomial equations over finite fields is hard, provably NP-hard, unless there’s some special structure that makes things easier. Several proposed post-quantum digital signature algorithms are based on this, such as the LUOV variant on UOV.

The idea behind UOV is to create systems of equations that have a special structure, with some “oil” variables and some “vinegar” variables, so named because they do not mix, or rather mix in a very simple, convenient way. This special structure is kept secret, and is obscured by composition with an invertible linear operator. This operator acts like a blender, thoroughly mixing the oil and vinegar. The term “unbalanced” refers to the fact that the scheme is more secure if you do not have equal numbers of “oil” and “vinegar” variables.

Someone wanting to sign a file with the UOV algorithm knows the oil-and-vinegar structure and produces a vector that is mapped to a specified value, inverting the composition of the linear operator and the polynomials. They can do this because they know the factorization into this special structure. Someone wanting to verify a UOV signature only knows the (apparently unstructured) composition. They just see a large system of multivariate polynomial equations. They can stick a signature in and verify that the output is what it’s supposed to be, but they couldn’t produce a signature because they can’t invert the system. [1]

How large do these systems of polynomials need to be? On the order of a hundred equations and variables, though with more variables than polynomials. That’s not large compared to linear systems, where one can efficiently solve systems with millions of equations and variables. And the polynomials are only quadratic. So in one sense the systems are small. But it takes hundreds of kilobytes [2] to describe such systems, which makes the public keys for UOV large relative to currently popular digital signature algorithms such as ECDSA. The signatures produced by UOV are small, but the public keys are large.

[1] The system is not invertible in the sense of being one-to-one because it’s underdetermined. By inverting the system we mean producing any input that maps to the desired output. This solution is not generally unique.

[2] Representing *m* quadratic polynomials in *n* variables over a field of size *b* bits requires *bmn*²/2 bits. So 80 quadratic polynomials in 120 variables over GF(2^{8}) would require 8 × 80 × 120²/2 = 4,608,000 bits = 576 kilobytes. The LUOV variation on UOV mentioned above reduces the key sizes quite a bit, but it still requires larger public keys than ECDSA.
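The arithmetic in footnote [2] is easy to reproduce. This is just the *bmn*²/2 estimate from the footnote wrapped in a function; the name `uov_public_key_bits` is hypothetical.

```python
def uov_public_key_bits(m, n, b):
    """Approximate size in bits of m quadratic polynomials in n variables
    with b-bit coefficients, using the b*m*n^2/2 estimate."""
    return b * m * n**2 // 2

bits = uov_public_key_bits(m=80, n=120, b=8)
print(bits, "bits =", bits // 8 // 1000, "kilobytes")
```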

For example, the AES encryption algorithm uses the finite field GF(2^{8}), i.e. the finite field with 2^{8} = 256 elements. Except we need to be a little careful about saying “the” field. Since we’re doing concrete calculations, the choice of irreducible polynomial matters, and AES dictates the polynomial

*x*^{8} + *x*^{4} + *x*^{3} + *x* + 1.

Another example from cryptography is Galois Counter Mode (GCM) which uses the finite field GF(2^{128}), specifying the irreducible polynomial

*x*^{128} + *x*^{7} + *x*^{2} + *x* + 1.

How many other irreducible polynomials are there over GF(2^{8}) or any other field for that matter? We’ll assume the leading coefficient is 1, i.e. we’ll count monic polynomials, because otherwise we can just divide by the leading coefficient.

The number of monic irreducible polynomials of degree *n* over a field with *q* elements is given by

*I*_{q}(*n*) = (1/*n*) ∑_{d|n} μ(*d*) *q*^{n/d}

where μ is the Möbius function and the sum is over all positive integers *d* that divide *n*. We can implement this function succinctly in Python.

    from sympy import mobius, divisors

    def I_q(n, q):
        # n // d keeps the exponent, and hence the whole sum, in exact integers
        terms = [mobius(d) * q**(n // d) for d in divisors(n)]
        return sum(terms) // n

We can compute `I_q(8, 2)` to find out there are 30 monic irreducible polynomials of degree 8 with coefficients in GF(2), i.e. with one-bit coefficients. There are 256 monic polynomials of degree 8 (the coefficient of *x*^{k} can be either 0 or 1 for *k* = 0 … 7) but only 30 of these are irreducible. Similarly, there are 2^{128} monic polynomials of degree 128 with binary coefficients, and roughly 2^{121} of them are irreducible. Clearly it’s convenient in applications like GCM to use a polynomial of low weight, i.e. one with few non-zero coefficients.

Note that in the paragraph above we count the number of monic irreducible polynomials with coefficients in GF(2) that we could use in constructing GF(2^{8}). We haven’t considered how many monic irreducible polynomials there are in GF(2^{8}), i.e. with coefficients not just in GF(2) but in GF(2^{8}). That would be a *much* larger number. If we call `I_q(8, 256)` we get 2,305,843,008,676,823,040.
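As a sanity check on the count of 30, here is a brute-force sketch that needs no libraries. It stores a polynomial over GF(2) as an integer bitmask (bit *k* is the coefficient of *x*^{k}) and tests irreducibility by trial division. The helper names are made up for this example, and the approach only makes sense for small degrees.

```python
def gf2_mod(a, b):
    """Remainder of a divided by b, polynomials over GF(2) stored as bitmasks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a

def is_irreducible(f):
    """Trial division by every polynomial of degree 1 through deg(f)//2."""
    deg = f.bit_length() - 1
    return all(gf2_mod(f, g) != 0 for g in range(2, 1 << (deg // 2 + 1)))

# Monic polynomials of degree 8 are the bitmasks from 2^8 to 2^9 - 1.
count = sum(is_irreducible(f) for f in range(1 << 8, 1 << 9))
aes_poly = 0b100011011  # x^8 + x^4 + x^3 + x + 1

print(count)                     # 30
print(is_irreducible(aes_poly))  # True
```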

If your data project is smaller than the US Census, you can probably make differential privacy work.

The planet whose *orbit* is closest to the *orbit* of Earth is clearly Venus. But what *planet* is closest? That changes over time. If Venus is between the Earth and the sun, Venus is the closest planet to Earth. But if Mercury is between the Earth and the sun, and Venus is on the opposite side of the sun, then Mercury is the closest planet to Earth.

On average, Mercury is the closest planet to the Earth, closer than Venus! In fact, Mercury is the closest planet to *every* planet, on average. A new article in Physics Today gives a detailed explanation.

The article gives two explanations, one based on probability, and one based on simulated orbits. The former assumes planets are located at random points along their orbits. The latter models the actual movement of planets over the last 10,000 years. The results agree to within 1%.

It’s interesting that the two approaches agree. Obviously planet positions are not random. But over time the relative positions of the planets are distributed much as they would be if they were random. They’re ergodic.

My first response would be to model this as if the positions were indeed random. But my second thought is that maybe the actual motion of the planets might have resonances that keep the distances from being ergodic. Apparently not, or at least the deviation from being ergodic is small.
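The random-position model is simple enough to sketch in a few lines. The code below assumes circular, coplanar orbits and a uniformly distributed angle between the two planets. These are my simplifying assumptions, not necessarily the article's exact setup, and the orbital radii are approximate values in astronomical units.

```python
import math

def mean_distance(r1, r2, steps=100_000):
    """Average distance between points on concentric circles of radius r1 and r2
    when the angle between them is uniformly distributed (midpoint rule)."""
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * (k + 0.5) / steps
        total += math.sqrt(r1**2 + r2**2 - 2 * r1 * r2 * math.cos(theta))
    return total / steps

earth, venus, mercury = 1.0, 0.723, 0.387  # approximate orbital radii in AU

print("Earth-Mercury:", mean_distance(earth, mercury))
print("Earth-Venus:  ", mean_distance(earth, venus))
```

Under these assumptions the average Earth-Mercury distance comes out smaller than the average Earth-Venus distance, even though Venus comes closer at its nearest approach.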

*y*² = *x*³ + *ax* + *b*.

where 4*a*³ + 27*b*² ≠ 0. Then later they say “except over fields of characteristic 2 or 3.”

What does characteristic 2 or 3 mean? The order of a finite field is the number of elements it has. The order is always a prime or a prime power. The characteristic is that prime. So another way to phrase the exception above is to say “except over fields of order 2^{n} or 3^{n}.”

If we’re looking at fields not just of *characteristic* 2 or 3, but *order* 2 or 3, there can’t be that many of them. Why not just list them? That’s what I plan to do here.

All elliptic curves over a finite field have the form

*y*² + *a*_{1}*xy* + *a*_{3}*y* = *x*³ + *a*_{2}*x*² + *a*_{4}*x* + *a*_{6},

even over fields of characteristic 2 or 3.

When the characteristic of the field is not 2, this can be simplified to

*y*² = 4*x*³ + *b*_{2}*x*² + 2*b*_{4}*x* + *b*_{6}

where

*b*_{2} = *a*_{1}² + 4*a*_{2},

*b*_{4} = 2*a*_{4} + *a*_{1}*a*_{3}, and

*b*_{6} = *a*_{3}² + 4*a*_{6}.

When the characteristic is at least 5, the form can be simplified further to the one at the top with just two parameters.

The discriminant of an elliptic curve is something like the discriminant of a quadratic equation. You have an elliptic curve if and only if it is not zero. For fields of characteristic at least five, the condition is 4*a*³ + 27*b*² ≠ 0, but it’s more complicated for characteristic 2 and 3. To define the discriminant, we’ll need to use *b*_{2}, *b*_{4}, and *b*_{6} from above, and also

*b*_{8} = *a*_{1}²*a*_{6} + 4*a*_{2}*a*_{6} – *a*_{1}*a*_{3}*a*_{4} + *a*_{2}*a*_{3}² – *a*_{4}².

Now we can define the discriminant Δ in terms of all the *b*’s.

Δ = –*b*_{2}²*b*_{8} – 8*b*_{4}³ – 27*b*_{6}² + 9*b*_{2}*b*_{4}*b*_{6}.

See Handbook of Finite Fields page 423.

Now we can enumerate which parameter combinations yield elliptic curves with the following Python code.

    from itertools import product

    def discriminant(a1, a2, a3, a4, a6):
        b2 = a1**2 + 4*a2
        b4 = 2*a4 + a1*a3
        b6 = a3**2 + 4*a6
        b8 = a1**2*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3**2 - a4**2
        delta = -b2**2*b8 - 8*b4**3 - 27*b6**2 + 9*b2*b4*b6
        return delta

    p = 2
    r = range(p)
    for (a1, a2, a3, a4, a6) in product(r,r,r,r,r):
        if discriminant(a1, a2, a3, a4, a6)%p != 0:
            print(a1, a2, a3, a4, a6)

The code above does return the values of the *a*’s that yield an elliptic curve, but in some sense it returns too many. For example, there are 32 possible combinations of the *a*’s when working over GF(2), the field with two elements, and 16 of these lead to elliptic curves. But some of these must lead to the same set of points because there are only 4 possible (*x*, *y*) affine points on the curve, plus the point at infinity.

Now we get into a subtle question: when are two elliptic curves the same? Can two elliptic curves have the same set of points and yet be algebraically different? Sometimes, but not usually. Lenstra and Pila [1] proved that two elliptic curves can be equal as sets but not equal as groups if and only if the curve has 5 points and the field has characteristic 2. [2]

Lenstra and Pila give the example of the two equations

*y*² + *y* = *x*³ + *x*²

and

*y*² + *y* = *x*³ + *x*

over GF(2). Both determine the same set of points, but the two curves are algebraically different because (0,0) + (0,0) equals (1,1) on the first curve and (1,0) on the second.
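The doubling claim can be checked with the standard tangent-line formulas for a general Weierstrass equation. The sketch below is my own illustration rather than anything from Lenstra and Pila's paper; `double_point` is a hypothetical helper, and it relies on `pow(x, -1, p)` for modular inverses, which requires Python 3.8 or later.

```python
def double_point(P, a1, a2, a3, a4, a6, p):
    """Double an affine point on y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6
    over GF(p). Returns None when the result is the point at infinity."""
    x, y = P
    denom = (2*y + a1*x + a3) % p
    if denom == 0:
        return None  # vertical tangent
    lam = (3*x*x + 2*a2*x + a4 - a1*y) * pow(denom, -1, p) % p  # tangent slope
    nu = (y - lam*x) % p                                        # tangent intercept
    x3 = (lam*lam + a1*lam - a2 - 2*x) % p
    y3 = (-(lam + a1)*x3 - nu - a3) % p
    return (x3, y3)

# y^2 + y = x^3 + x^2  has (a1, a2, a3, a4, a6) = (0, 1, 1, 0, 0)
print(double_point((0, 0), 0, 1, 1, 0, 0, 2))  # (1, 1)
# y^2 + y = x^3 + x    has (a1, a2, a3, a4, a6) = (0, 0, 1, 1, 0)
print(double_point((0, 0), 0, 0, 1, 1, 0, 2))  # (1, 0)
```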

The following Python code will enumerate the set of points on a given curve.

    def on_curve(x, y, a1, a2, a3, a4, a6, p):
        left = y**2 + a1*x*y + a3*y
        right = x**3 + a2*x**2 + a4*x + a6
        return (left - right)%p == 0

    def affine_points(a1, a2, a3, a4, a6, p):
        pts = set()
        for x in range(p):
            for y in range(p):
                if on_curve(x, y, a1, a2, a3, a4, a6, p):
                    pts.add((x,y))
        return pts

We can use this code, along with Lenstra and Pila’s result, to enumerate all elliptic curves of small order.

Now we can list all the elliptic curves over the field with two elements.

The two curves in the example of Lenstra and Pila are the only ones over GF(2) with five points. So the two curves of order 5 over GF(2) are

*y*² + *y* = *x*³ + *x*²

*y*² + *y* = *x*³ + *x*

They determine the same set of points but are algebraically different.

There are four curves of order 4. They contain different sets of points, i.e. each omits a different one of the four possible affine points.

*y*² + *xy* = *x*³ + 1

*y*² + *xy* = *x*³ + *x*

*y*² + *xy* + *y* = *x*³ + *x*²

*y*² + *xy* + *y* = *x*³ + *x*² + *x*

There are two distinct curves of order 3, each determined by two equations.

The first curve is determined by either of

*y*² + *y* = *x*³

*y*² + *y* = *x*³ + *x*² + *x*

and the second by either of

*y*² + *y* = *x*³ + 1

*y*² + *y* = *x*³ + *x*² + *x* + 1

There are 4 curves of order two; each contains a different affine point.

*y*² + *xy* + *y* = *x*³ + 1

*y*² + *xy* + *y* = *x*³ + *x* + 1

*y*² + *xy* = *x*³ + *x*² + 1

*y*² + *xy* = *x*³ + *x*² + *x*

There are two curves of order 1, i.e. curves containing only the point at infinity.

*y*² + *y* = *x*³ + *x* + 1

*y*² + *y* = *x*³ + *x*² + 1

There are no affine points because the left side is always 0 and the right side is always 1 for *x* and *y* in {0, 1}.
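The whole GF(2) census can be double-checked with a short script that tallies the nonsingular equations by number of points. This is a self-contained sketch: it uses the standard Weierstrass quantities, with *b*_{2} = *a*_{1}² + 4*a*_{2}, and the helper name `order` is my own.

```python
from collections import Counter
from itertools import product

def discriminant(a1, a2, a3, a4, a6):
    b2 = a1**2 + 4*a2
    b4 = 2*a4 + a1*a3
    b6 = a3**2 + 4*a6
    b8 = a1**2*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3**2 - a4**2
    return -b2**2*b8 - 8*b4**3 - 27*b6**2 + 9*b2*b4*b6

def order(a1, a2, a3, a4, a6, p):
    """Number of points on the curve: affine solutions plus the point at infinity."""
    affine = sum(
        (y*y + a1*x*y + a3*y - (x**3 + a2*x*x + a4*x + a6)) % p == 0
        for x in range(p)
        for y in range(p)
    )
    return affine + 1

p = 2
tally = Counter(
    order(*a, p)
    for a in product(range(p), repeat=5)
    if discriminant(*a) % p != 0
)
print(sorted(tally.items()))  # [(1, 2), (2, 4), (3, 4), (4, 4), (5, 2)]
```

The tally counts equations, not distinct curves: the four equations of order 3 collapse to two curves because pairs of them share the same point set.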

There are too many elliptic curves over GF(3) to explore as thoroughly as we did with GF(2) above, but I can report the following results that are obtainable using the Python code above.

An elliptic curve over GF(3) contains between 1 and 7 points. Here are the number of parameter combinations that lead to each number of points.

1: 9 2: 22 3: 26 4: 15 5: 26 6: 22 7: 9

Obviously there’s only one curve with one point, the point at infinity, so the nine coefficient combinations that lead to a curve of order 1 determine the same curve.

There are 9 distinct curves of order 2 and 12 distinct curves of order 3. All the curves of orders 4, 5, 6, and 7 are distinct.

[1] H. W. Lenstra, Jr and J. Pila. Does the set of points of an elliptic curve determine the group? Computational Algebra and Number Theory, 111-118.

[2] We are not considering isomorphism classes here. If two curves have a different set of points, or the same set of points but different group properties, we’re considering them different.

John Abowd, chief scientist for the US Census Bureau, gave a talk a few days ago (March 4, 2019) in which he discussed the need for differential privacy and how the bureau is implementing differential privacy for the 2020 census.

Absolutely the hardest lesson in modern data science is the constraint on publication that the fundamental law of information recovery imposes. I usually call it the death knell for traditional method of publication, and not just in statistical agencies.

*y*² = *x*³ + 486662*x*² + *x*

over the prime field with order *p* = 2^{255} – 19. The curve is a popular choice in elliptic curve cryptography because its design choices are transparently justified [1] and because cryptography over the curve can be implemented very efficiently. This post will concentrate on one of the tricks that makes ECC over Curve25519 so efficient.

Curve25519 was designed for fast and secure cryptography. One of the things that make it fast is the clever way Bernstein carries out arithmetic mod 2^{255} – 19 which he describes here.

Bernstein represents numbers mod 2^{255} – 19 by polynomials whose value at 1 gives the number. That alone is not remarkable, but his choice of representation seems odd until you learn why it was chosen. Each number is represented as a polynomial of the form

∑ *u*_{i} *x*^{i}

where each *u*_{i} is an integer multiple *k*_{i} of 2^{⌈25.5i⌉}, and each *k*_{i} is an integer between -2^{25} and 2^{25} inclusive.

Why this limitation on the *k*’s? Pentium cache optimization. In Bernstein’s words:

Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or eight 32-bit pieces? Answer: The coefficients of a polynomial product do not fit into the Pentium M’s fp registers if pieces are too large. The cost of handling larger coefficients outweighs the savings of handling fewer coefficients.

And why unevenly spaced powers of 2: 1, 2^{26}, 2^{51}, 2^{77}, …, 2^{230}? Some consecutive exponents differ by 25 and some by 26. This looks sorta like a base 2^{25} or base 2^{26} representation, but is a mixture of both.

Bernstein answers this question as well.

Given that there are 10 pieces, why use radix 2^{25.5} rather than, e.g., radix 2^{25} or radix 2^{26}? Answer: My ring R contains 2^{255}*x*^{10} − 19, which represents 0 in Z/(2^{255} − 19). I will reduce polynomial products modulo 2^{255}*x*^{10} − 19 to eliminate the coefficients of *x*^{10}, *x*^{11}, etc. With radix 2^{25}, the coefficient of *x*^{10} could not be eliminated. With radix 2^{26}, coefficients would have to be multiplied by 2^{5} · 19 rather than just 19, and the results would not fit into an fp register.

There are a few things to unpack here.

Remember that we’re turning polynomials into numbers by evaluating them at 1. So when *x* = 1, 2^{255}*x*^{10} – 19 = *p* = 2^{255} – 19, which is the zero in the integers mod 2^{255} – 19.

If we were using base (radix) 2^{25}, the largest number we could represent with a 9th degree polynomial with the restrictions above would be 2^{250}, so we’d need a 10th degree polynomial; we couldn’t eliminate terms containing *x*^{10}.
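To make the mixed radix concrete, here is a sketch that computes the exponents ⌈25.5*i*⌉ and splits a number into ten pieces accordingly. It is a simplification: it uses non-negative integer pieces and plain Python integers, whereas Bernstein's implementation allows signed coefficients and works in floating point registers. The names `to_limbs` and `from_limbs` are made up for this example.

```python
from math import ceil

# Exponent of 2 attached to each of the ten pieces: ceil(25.5 * i)
exponents = [ceil(25.5 * i) for i in range(10)]
# Width in bits of each piece; the last piece ends at bit 255
widths = [ceil(25.5 * (i + 1)) - e for i, e in enumerate(exponents)]

def to_limbs(n):
    """Split 0 <= n < 2^255 into pieces k_i with n = sum of k_i * 2^exponents[i]."""
    return [(n >> e) & ((1 << w) - 1) for e, w in zip(exponents, widths)]

def from_limbs(limbs):
    """Evaluate the representation, i.e. the polynomial at x = 1."""
    return sum(k << e for k, e in zip(limbs, exponents))

print(exponents)  # [0, 26, 51, 77, 102, 128, 153, 179, 204, 230]
print(widths)     # [26, 25, 26, 25, 26, 25, 26, 25, 26, 25]
```

The widths alternate between 26 and 25 bits and add up to exactly 255, which is why ten pieces suffice with radix 2^{25.5} but not with radix 2^{25}.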

I don’t yet see why working with radix 2^{26} would overflow an fp register. If you do see why, please leave an explanation in the comments.

[1] When a cryptographic method has an unjustified parameter, it invites suspicion that the parameter was chosen to create an undocumented back door. This is not the case with Curve25519. For example, why does it use *p* = 2^{255} – 19? It’s efficient to use a prime close to a large power of 2, and this *p* is the closest prime to 2^{255}. The coefficient 486662 is not immediately obvious, but Bernstein explains in his paper how it was the smallest integer that met his design criteria.

I first thought about this when I was working for MD Anderson Cancer Center, maybe around 2002. Our research in adaptive clinical trial methods required bursts of CPU time. We might need hundreds of hours of CPU time for a simulation, then nothing while we figure out what to do next, then another hundred hours to run a modification.

We were always looking for CPU resources, and we installed Condor to take advantage of idle PCs, something like the SETI at Home or GIMPS projects. Then we had CPU power to spare, sometimes. What could we do between simulations that was worthwhile but not urgent? We didn’t come up with anything.

Fast forward to 2019. You can rent CPU time from Amazon for about 2.5 cents per hour. To put it another way, it’s about 300 times cheaper per hour to rent a CPU than to hire a minimum wage employee in the US. Surely it should be possible to think of something for a computer to do that produces more than 2.5 cents per CPU hour of value. But is it?

Well, there’s cryptocurrency mining. How profitable is that? The answer depends on many factors: which currency you’re mining and its value at the moment, what equipment you’re using, what you’re paying for electricity, etc. I did a quick search, and one person said he sees a 30 to 50% return on investment. I suspect that’s high, but we’ll suppose for the sake of argument there’s a 50% ROI [1]. That means you can make a profit of 30 cents per CPU day.

Can we not think of anything for a CPU to do for a day that returns more than 30 cents profit?! That’s mind boggling for someone who can remember when access to CPU power was a bottleneck.

Sometimes computer time is very valuable. But the value of **surplus** computer time is negligible. I suppose it all has to do with bottlenecks. As soon as CPU time isn’t the bottleneck, its value plummets.

**Update**: According to the latest episode of the Security Now podcast, it has become unprofitable for hackers to steal CPU cycles in your browser for crypto mining, primarily because of a change in Monero. Even free cycles aren’t worth using for mining! Mining is only profitable on custom hardware.

***

[1] I imagine this person isn’t renting time from Amazon. He probably has his own hardware that he can run less expensively. But that means his profit margins are so thin that it would not be profitable to rent CPUs at 2.5 cents an hour.

and these chaotic-looking values for your *y* coordinates

you get this image that looks more ordered.

The image above is today’s exponential sum.

Suppose the same message *m* is sent to three recipients and all three use exponent *e* = 3. Each recipient has a different modulus *N*_{i}, and each will receive a different encrypted message

*c*_{i} = *m*³ mod *N*_{i}.

Someone with access to *c*_{1}, *c*_{2}, and *c*_{3} can recover the message *m* as follows. We can assume each modulus *N*_{i} is relatively prime to the others, otherwise we can recover the private keys using the method described here. Since the moduli are relatively prime, we can solve the three equations for *m*³ using the Chinese Remainder Theorem. There is a unique *x* < *N*_{1} *N*_{2} *N*_{3} such that

*x* = *c*_{1} mod *N*_{1}

*x* = *c*_{2} mod *N*_{2}

*x* = *c*_{3} mod *N*_{3}

and *m* is simply the cube root of *x*. What makes this possible is knowing *m* is a positive integer less than each of the *N*s, so that *m*³ < *N*_{1} *N*_{2} *N*_{3} and therefore *x* = *m*³ exactly. It follows that we can simply take the cube root *in the integers* and not the cube root in modular arithmetic.

This is an attack on “textbook” RSA because the weakness in this post could be avoided by real-world precautions such as adding random padding to each message so that no two recipients are sent the exact same message.

By the way, a similar trick works even if you only have access to one encrypted message. Suppose you’re using a 2048-bit modulus *N* and exchanging a 256-bit key. If your message *m* is simply the key without padding, then *m*³ is less than *N*, and so you can simply take the cube root of the encrypted message in the integers.
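The single-message case needs nothing beyond an integer cube root, so it can be sketched with the standard library alone. Two caveats: the `N` below is just a random 2048-bit number standing in for a modulus, not a genuine product of two primes, and `icbrt` is a hypothetical helper written for this example.

```python
from secrets import randbits

def icbrt(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

N = randbits(2048) | (1 << 2047)  # stand-in 2048-bit modulus (assumption, see above)
m = randbits(256)                 # unpadded 256-bit key
c = pow(m, 3, N)                  # textbook RSA encryption with e = 3

recovered = icbrt(c)
print(recovered == m)  # True: m**3 < N, so the reduction mod N did nothing
```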

Here we’ll work out a specific example using realistic RSA moduli.

    from secrets import randbits, randbelow
    from sympy import nextprime, integer_nthroot
    from sympy.ntheory.modular import crt

    def modulus():
        p = nextprime(randbits(2048))
        q = nextprime(randbits(2048))
        return p*q

    N = [modulus() for _ in range(3)]
    m = randbelow(min(N))
    c = [pow(m, 3, N[i]) for i in range(3)]
    x = crt(N, c)[0]

    # integer cube root; the second element says whether the root is exact
    assert integer_nthroot(int(x), 3) == (m, True)

Note that `crt` is the Chinese Remainder Theorem. It returns a pair of numbers, the first being the solution we’re after, hence the `[0]` after the call.

The script takes a few seconds to run. Nearly all the time goes to finding the 2048-bit (617-digit) primes that go into the moduli. Encrypting and decrypting *m* takes less than a second.

[1] I don’t know who first discovered this line of attack, but you can find it written up here. At least in the first edition; the link is to the 2nd edition which I don’t have.

One thing that makes GM interesting is that it allows a form of computing on encrypted data that we’ll describe below.

To create a public key, find two large primes *p* and *q* and publish *N* = *pq*. (There’s one more piece we’ll get to shortly.) You keep *p* and *q* private, but publish *N*, much like with RSA.

Someone can send you a message, one bit at a time, by sending you numbers that either do or do not have a square root mod *N*.

If someone wants to send you a 0, they send you a number that has a square root mod *N*. This is easy to do: they select a number between 1 and *N* at random, square it mod *N*, and send you the result.

Determining whether a random number is a square mod *N* is easy if and only if you know how to factor *N*. [1]

When you receive the number, you can quickly tell that it is a square because you know how to factor *N*. The sender knows that it’s a square because he got it by squaring something. You can *produce* a square without knowing how to factor *N*, but it’s computationally infeasible to start with a given number and tell whether it’s a square mod *N*, unless you know the factorization of *N*.

Sending a 1 bit is a little more involved. How can someone who cannot factor *N* produce a number that’s *not* a square? That’s actually not feasible without some extra information. The public key is not just *N*. It’s also a number *z* that is not a square mod *N*. So the full public key is two numbers, *N* and *z*.

To generate a non-square, you first generate a square then multiply it by *z*.

Suppose you choose *p* = 314159 and *q* = 2718281. (Yes, *p* is a prime. See the post on pi primes. And *q* comes from the first few digits of *e*.) In practice you’d choose *p* and *q* to be very large, hundreds of digits, and you wouldn’t pick them to have a cute pattern like we did here. You publish *N* = *pq* = 853972440679 and imagine it’s too large for anyone to factor (which may be true for someone armed with only pencil and paper).

Next you need to find a number *z* that is not a square mod *N*. You do that by trying numbers at random until you find one that is not a square mod *p* and not a square mod *q*. You can do that by using Legendre symbols. It turns out *z* = 400005 will work.

So you tell the world your public key is (853972440679, 400005).

Someone wanting to send you a 0 bit chooses a number between 1 and *N* = 853972440679, say 731976377724. Then they square it and take the remainder by *N* to get 592552305778, and so they send you 592552305778. You can tell, using Legendre symbols, that this is a square mod *p* and mod *q*, so it’s a square mod *N*.

If they had wanted to send you a 1, they could have sent you 592552305778 * 400005 mod *N* = 41827250972, which you could tell isn’t a square mod *N*.

Homomorphic encryption lets you compute things on encrypted data without having to first decrypt it. The GM encryption algorithm is homomorphic in the sense that you can compute an encrypted form of the XOR of two bits from an encrypted form of each bit. Specifically, if *c*_{1} and *c*_{2} are encrypted forms of bits *b*_{1} and *b*_{2}, then *c*_{1} *c*_{2} is an encrypted form of *b*_{1} ⊕ *b*_{2}. Let’s see why this is, and where there’s a small wrinkle.

Suppose our two bits are both 0s. Then *c*_{1} and *c*_{2} are squares mod *N*, and *c*_{1} *c*_{2} is a square mod *N*.

Now suppose one bit is a 0 and the other is a 1. Then either *c*_{1} is a square mod *N* and *c*_{2} isn’t or vice versa, but in either case their product is not a square mod *N*.

Finally suppose both our bits are 1s. Since 1⊕1 = 0, we’d like to say that *c*_{1} *c*_{2} is a square mod *N*. Is it?

The product of two non-squares is not necessarily a square. For example, 2 and 3 are not squares mod 35, and neither is their product 6 [2]. But if we followed the recipe above, and calculated *c*_{1} and *c*_{2} both by multiplying a square by the *z* in the public key, then we’re OK. That is, if *c*_{1} = *x*²*z* and *c*_{2} = *y*²*z*, then *c*_{1}*c*_{2} = *x*²*y*²*z*², which is a square. So if you generate non-squares as prescribed, you get the homomorphic property. If you somehow find your own non-squares, they might not work.
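Here is a toy sketch of the whole scheme using the small primes from the example. It tests for squares with Euler's criterion, which is one standard way to evaluate a Legendre symbol, and it searches for its own *z* at random without relying on any particular value. All function names are invented for illustration, and the key size is far too small to be secure.

```python
from math import gcd
from secrets import randbelow

p, q = 314159, 2718281  # toy primes from the example; real keys are far larger
N = p * q

def is_square_mod_prime(a, r):
    """Euler's criterion: a is a nonzero square mod the odd prime r
    exactly when a^((r-1)/2) is 1 mod r."""
    return pow(a, (r - 1) // 2, r) == 1

def find_z():
    """Search at random for z that is a non-square mod p and mod q."""
    while True:
        z = randbelow(N - 2) + 2
        if (gcd(z, N) == 1
                and not is_square_mod_prime(z, p)
                and not is_square_mod_prime(z, q)):
            return z

def encrypt(bit, z):
    """Send a random square for 0, or a random square times z for 1."""
    while True:
        r = randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            return r * r * z**bit % N

def decrypt(c):
    """Needs the private factor p: ciphertexts built by encrypt are squares
    mod p exactly when the bit is 0."""
    return 0 if is_square_mod_prime(c, p) else 1

z = find_z()
c1, c2 = encrypt(1, z), encrypt(1, z)
print(decrypt(c1), decrypt(c2), decrypt(c1 * c2 % N))  # 1 1 0
```

Multiplying the two ciphertexts gives an encryption of 1 ⊕ 1 = 0 because both non-squares were produced from the same *z*.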

[1] As far as we know. There may be an efficient way to tell whether *x* is a square mod *N* without factoring *N*, but no such method has been published. The problem of actually *finding* modular square roots is equivalent to factoring, but simply telling whether modular square roots exist, without having to produce the roots, may be easier.

If quantum computing becomes practical, then factoring will be efficient and so telling whether numbers are squares modulo a composite number will be efficient.

[2] You could find all the squares mod 35 by hand, or you could let Python do it for you:

    >>> set([x*x % 35 for x in range(35)])
    {0, 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, 30}

I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used `primerange` from SymPy to generate the list of primes up to 200, and `cumprod` from NumPy to generate the list of partial products.

    cumprod( [(p*p+1)/(p*p-1) for p in primerange(1,200)] )

Apparently the product converges to 5/2, and a plot suggests that it converges very quickly.
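One way to see where 5/2 comes from, offered as a sketch rather than as the intended solution, is to rewrite each factor so that Euler products for the Riemann zeta function appear. This uses the standard values ζ(2) = π²/6 and ζ(4) = π⁴/90.

```latex
\prod_p \frac{p^2+1}{p^2-1}
  = \prod_p \frac{1 - p^{-4}}{\left(1 - p^{-2}\right)^2}
  = \frac{\zeta(2)^2}{\zeta(4)}
  = \frac{(\pi^2/6)^2}{\pi^4/90}
  = \frac{90}{36}
  = \frac{5}{2}
```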

Here’s another plot to look more closely at the rate of convergence. Here we look at the difference between 5/2 and the partial products, on a log scale, for primes less than 2000.


Like Base64, the goal of Base85 encoding is to encode binary data as printable ASCII characters. But it uses a larger set of characters, and so it can be a little more efficient. Specifically, it can encode 4 bytes (32 bits) in 5 characters.

There are 95 printable ASCII characters, and

log_{95}(2^{32}) = 4.87

and so it would take 5 characters to encode 4 bytes if you use all possible printable ASCII characters. Given that you have to use 5 characters, what’s the smallest base that will still work? It’s 85 because

log_{85}(2^{32}) = 4.993

and

log_{84}(2^{32}) = 5.006.

(If you’re not comfortable with logarithms, see an alternate explanation in the footnote [1].)

Now Base85 is different from the other bases I’ve written about because it **only works on 4 bytes at a time**. That is, if you have a number larger than 4 bytes, you break it into words of 4 bytes and convert each word to Base 85.

The 95 printable ASCII characters are 32 through 126. Base 85 uses characters 33 (“!”) through 117 (‘u’). ASCII character 32 is a space, so it makes sense you’d want to avoid that one. Since Base85 uses a consecutive range of characters, you can first convert a number to a pure mathematical radix 85 form, then add 33 to each number to find its Base85 character.

Suppose we start with the word 0x89255d9, equal to 143807961 in decimal.

143807961 = 2×85^{4} + 64×85^{3} + 14×85^{2} + 18×85 + 31

and so the radix 85 representation is (2, 64, 14, 18, 31). Adding 33 to each we find that the ASCII values of the characters in the Base85 representation are (35, 97, 47, 51, 64), or (‘#’, ‘a’, ‘/’, ‘3’, ‘@’) and so `#a/3@` is the Base85 encoding of 0x89255d9.
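The calculation above can be reproduced with a few lines of Python. The helper names `radix85_digits` and `base85_word` are made up for this sketch. The last line compares against `a85encode` from the standard library's `base64` module, which should agree for a single 4-byte word.

```python
import base64

def radix85_digits(word):
    # Five radix-85 digits of a 32-bit word, most significant first.
    digits = []
    for _ in range(5):
        word, r = divmod(word, 85)
        digits.append(r)
    return digits[::-1]

def base85_word(word):
    # Shift each digit by 33 to land in the range '!' through 'u'.
    return "".join(chr(d + 33) for d in radix85_digits(word))

word = 0x89255d9
print(radix85_digits(word))
print(base85_word(word))
print(base64.a85encode(word.to_bytes(4, "big")))
```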

The Z85 encoding method is also based on a radix 85 representation, but it chose to use a different subset of the 95 printable characters. Compared to Base85, Z85 adds seven characters

` v w x y z { }`

and removes seven characters

` ` \ " ' _ , ;`

to make the encoding work more easily with programming languages. For example, you can quote Z85 strings with single or double quotes because neither kind of quote is a valid Z85 character. And you don’t have to worry about escape sequences since the backslash character is not part of a Z85 representation.

There are a couple things that could trip someone up with Base85. First of all, Base 85 only works on 32-bit words, as noted above. For larger numbers it’s not a base conversion in the usual mathematical sense.

Second, the letter z can be used to denote a word consisting of all zeros. Since such words come up disproportionately often, this is a handy shortcut, though it means you can’t just divide characters into groups of 5 when converting back to binary.

[1] 95^{4} = 81450625 < 2^{32} = 4294967296, so four characters from an alphabet of 95 elements is not enough to represent 2^{32} possibilities. So we need at least five characters.

85^{5} = 4437053125 > 2^{32}, so five characters is enough, and in fact it’s enough for them to come from an alphabet of size 85. But 84^{5} = 4182119424 < 2^{32}, so an alphabet of 84 characters isn’t enough to represent 32 bits with five characters.

All three methods have the goal of compactly representing large numbers while maintaining readability. Douglas Crockford’s base32 encoding is the most conservative: it’s case-insensitive and it does not use the letters I, L, O, or U. The first three letters are omitted because of visual similarity to digits, and the last to avoid “accidental obscenities.”

Base 64 is not concerned with avoiding visual similarities, and uses the full upper and lower case alphabet, plus two more symbols, + and /.

Base58 is nearly as efficient as base64, but more concerned about confusing letters and numbers. The number 1, the lower case letter l, and the upper case letter I all look similar, so base58 retains the digit 1 and does not use the lower case letter l or the capital letter I.

The number 0 looks like the lower case letter o and the upper case letter O. Here base58 makes an unusual choice: it keeps the lower case letter o, but does not use the digit 0 or the capital letter O. This is odd because every other encoding that I can think of keeps the 10 digits and differs over what letters to use.

Bases like 32 and 64 have the advantage of being trivial to convert back and forth with binary. To convert a binary number to base 2^{n}, you start at the least significant end and convert groups of *n* bits. Since 58 is not a power of 2, converting to base 58 is more involved.
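To make "more involved" concrete, here is a minimal sketch of converting a non-negative integer to base 58 by repeated division. The alphabet is the one Bitcoin uses. The function name `to_base58` is my own, and this handles only plain integers, not the leading-zero-byte convention that Bitcoin address encoding adds on top.

```python
# Digits 1-9, then upper and lower case letters without 0, O, I, or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def to_base58(n):
    # Repeated division by 58; there is no way to read digits
    # directly off groups of bits as with base 32 or base 64.
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, r = divmod(n, 58)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

print(to_base58(58))
print(len(to_base58(2**200 - 1)))
```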

Bitcoin addresses are written in base58, and in fact base58 was developed for Bitcoin.

A Bitcoin address is a 25 byte (200 bit) number. Now

log_{58}2^{200} = 34.14

and so it may take up to 35 characters to represent a Bitcoin address in base58. Using base64 would have taken up to 34 characters, so base58 pays a very small price for preventing a class of errors relative to base64. Base32 would require 40 characters.

As noted above, converting between binary and base58 is more complicated than converting between binary and either base32 or base64. However, converting to base58 is trivial compared to everything else that goes into forming a Bitcoin address. The steps, documented here, involve taking an ECDSA public key, applying a secure hash function three times, and appending a checksum.

First of all, the only reason to implement ChaCha in pure Python is to play with it. It would be more natural and more efficient to implement ChaCha in C.

RFC 8439 gives detailed, language-neutral directions for how to implement ChaCha, including test cases for intermediate results. At its core is the function that does a “quarter round” operation on four unsigned integers. This function depends on three operations:

- addition mod 2^{32}, denoted `+`,
- bitwise XOR, denoted `^`, and
- bit rotation, denoted `<<<=n`.

In C, the `+=` operator on unsigned integers would do what the RFC denotes by +=, but in Python working with (signed) integers we need to explicitly take remainders mod 2^{32}. The Python bitwise XOR operator `^` can be used directly. We’ll write a function `roll` that corresponds to `<<<=`.

So the following line of pseudocode from the RFC

a += b; d ^= a; d <<<= 16;

becomes

a = (a+b) % 2**32; d = roll(d^a, 16)

in Python. One way to implement `roll` would be to use the `bitstring` library:

    from bitstring import Bits

    def roll(x, n):
        # Rotate the 32-bit unsigned integer x left by n bits.
        bits = Bits(uint=x, length=32)
        return (bits[n:] + bits[:n]).uint

Another approach, a little harder to understand but not needing an external library, would be

    def roll2(x, n):
        # (2 << 31) equals 2**32, so the first term keeps the low 32 bits.
        return (x << n) % (2 << 31) + (x >> (32-n))

So here’s an implementation of the ChaCha quarter round:

    def quarter_round(a, b, c, d):
        a = (a+b) % 2**32; d = roll(d^a, 16)
        c = (c+d) % 2**32; b = roll(b^c, 12)
        a = (a+b) % 2**32; d = roll(d^a, 8)
        c = (c+d) % 2**32; b = roll(b^c, 7)
        return a, b, c, d
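As a sanity check, here is a self-contained version of the quarter round run against the quarter-round test vector in section 2.1.1 of RFC 8439. The helper `rotl32` is my own stand-in for `roll`, using a mask instead of a remainder. Treat this as a sketch for experimenting, not a vetted implementation.

```python
MASK = 0xFFFFFFFF  # keep results in 32 bits

def rotl32(x, n):
    # Rotate the 32-bit value x left by n bits (0 < n < 32).
    return ((x << n) & MASK) | (x >> (32 - n))

def quarter_round(a, b, c, d):
    a = (a + b) & MASK; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK; b = rotl32(b ^ c, 7)
    return a, b, c, d

# Input values from the RFC's quarter-round test vector.
result = quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
print([hex(v) for v in result])
```

The RFC lists the expected outputs as 0xea2a92f4, 0xcb1cf8ce, 0x4581472e, and 0x5881c4bb.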

ChaCha has a state consisting of 16 unsigned integers. A “round” of ChaCha consists of four quarter rounds, operating on four of these integers at a time. All the details are in the RFC.

Incidentally, the inner workings of the BLAKE2 secure hash function are similar to those of ChaCha.

The name of the project comes from a genus of fern. More on that below as well.

The **one-time pad** is a provably unbreakable way to encrypt things. You create a sheet of random bits and give your counterpart an exact copy. Then when it comes time for you to send an encrypted message, you convert your message to a stream of bits, XOR your message with the random bits you exchanged previously, and send the result. The recipient then takes the XOR of the received message with the pad of random bits, and recovers the original message.

This is called a one-time pad because it’s a pad of bits that you can only use one time. If you reuse a pad, it’s no longer unbreakable.
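The mechanics fit in a few lines of Python. This is only a sketch: `xor_bytes` is a hypothetical helper, and `secrets.token_bytes` stands in for the sheet of random bits, which in a real one-time pad would have to be truly random and exchanged in advance.

```python
import secrets

def xor_bytes(data, pad):
    # XOR each message byte with the corresponding pad byte.
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))

message = b"attack at dawn"
pad = secrets.token_bytes(len(message))  # both parties hold this copy

ciphertext = xor_bytes(message, pad)     # sender
recovered = xor_bytes(ciphertext, pad)   # recipient applies the same XOR
print(recovered)
```

The same function encrypts and decrypts because XORing twice with the same bits gives back the original.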

One-time pads are **impractical** for a couple reasons. First, it’s hard to generate truly random bits, especially in bulk. Second, exchanging the pads is almost as difficult as exchanging messages.

So here’s a bright idea: we’ll get around both of the problems with one-time pads by using pseudorandom bits rather than random bits! Then both parties can generate their own pseudorandom bits.

Many people have had this idea, and it’s not necessarily a bad one. It’s called a **stream cipher**. The problem is that most pseudorandom number generators are not up to the task. You need a cryptographically secure RNG, and most RNGs are far from secure. The ChaCha RNG, however, appears to be good enough to use in a stream cipher, given enough rounds of scrambling [1], and Google is using it for full disk encryption in Android devices.

If you forget your password to your computer, *you* may not be able to access your data, but a thief still could by removing the hard drive and accessing it from another computer. That is, unless the disk is encrypted.

Full disk encryption on a laptop, such as BitLocker on Windows or FileVault on OSX, is usually implemented via AES encryption with hardware acceleration. If you don’t have special hardware for encryption, AES can be too slow.

On low-end devices, ChaCha encryption can be around 5x faster than AES. So Google is using ChaCha for Android devices, using what it calls Adiantum.

You can read the technical details in [2], and you can read more about the ChaCha random number generator in [3].

So where does the name Adiantum come from? It’s a Victorian name for a genus of maidenhair ferns, symbolic of sincerity and discretion.

[1] Adiantum uses ChaCha with 12 rounds. TLS 1.3 uses ChaCha with 20 rounds.

[2] Adiantum: length-preserving encryption for entry-level processors by Google employees Paul Crowley and Eric Biggers.


Representative Katie Porter: My question for you is whether you would be willing to share today your social security, your birth date, and your address at this public hearing.

Equifax CEO Mark Begor: I would be a bit uncomfortable doing that, Congresswoman. If you’d so oblige me, I’d prefer not to.

KP: Could I ask you why you’re unwilling?

MB: Well that’s sensitive information. I think it’s sensitive information that I like to protect, and I think consumers should protect theirs.

KP: My question is then, if you agree that exposing this kind of information, information like that you have in your credit reports, creates harm, therefore you’re unwilling to share it, why are your lawyers arguing in federal court that there was no injury and no harm created by your data breach?

Adi Shamir came up with the idea of using polynomials to share secrets as follows. First, encode the secret you want to share as an integer *a*_{0}. Next, generate *m* = *k*-1 other random integers *a*_{1} through *a*_{m} and use these as coefficients of a polynomial *f* of degree *m*:

*f*(*x*) = *a*_{0} + *a*_{1}*x* + *a*_{2}*x*² + … + *a*_{m}*x*^{m}

A trusted party generates *n* random integer values of *x* and gives each person an *x* and the corresponding value of *f*(*x*). Since *m*+1 points completely determine an *m*th degree polynomial, if *k* = *m*+1 people share their data, they can recover *f*, and thus recover the secret number *a*_{0}. This can be done efficiently, for example, by using Lagrange interpolation. But with fewer than *k* data points, the polynomial remains undetermined.

In practice we’d work over the integers modulo a large prime *p*. While fewer than *k* data points will not let someone completely determine the polynomial *f*, it will narrow down the possible coefficients if we’re working over the integers. Working modulo a large prime instead reveals less information.
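Here is a minimal sketch of the scheme in Python, working modulo a prime. Several things are my own assumptions for illustration: the modulus 2^127 - 1, the function names, and the use of *x* = 1, 2, ..., *n* rather than random *x* values. Recovery evaluates the Lagrange interpolating polynomial at 0, which gives *a*_{0}. The call `pow(den, -1, p)` needs Python 3.8 or later.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, chosen here only for illustration

def make_shares(secret, k, n, p=P):
    # Coefficients a_0 = secret, a_1 .. a_{k-1} random mod p.
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule
            acc = (acc * x + a) % p
        return acc
    # Simplification: x = 1..n instead of random x values.
    return [(x, f(x)) for x in range(1, n + 1)]

def recover_secret(shares, p=P):
    # Lagrange interpolation evaluated at x = 0.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = make_shares(secret=12345, k=3, n=5)
print(recover_secret(shares[:3]))
```

Any three of the five shares should give back the secret, while two shares are consistent with every possible secret.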

There’s a possible problem with Shamir’s method. Maybe the trusted party made a mistake. Or maybe the trusted party was dishonest and shouldn’t have been trusted. How can the parties verify that they have been given valid data without unlocking the secret? Seems we’re at a logical impasse since you’d have to recover the polynomial to know if your points are on the polynomial.

Paul Feldman came up with a way to assure the participants that the secret can be unlocked without giving them the information to unlock it. The trick is to give everyone data that *in principle* would let them determine the polynomial, but in *practice* would not.

We choose a large prime *p* such that *p*-1 has a large prime factor *q* [1]. Then the multiplicative group of non-zero integers mod *p* has a subgroup of order *q*. Let *g* be a generator of that subgroup. The idea is to let everyone verify that

*y*_{i} = *f*(*x*_{i})

for their given (*x*_{i}, *y*_{i}) by letting them verify that

*g*^{y_i} = *g*^{f(x_i)}

where all calculations are carried out mod *p*. Our trusted party does this by computing

*A*_{i} = *g*^{a_i}

for each coefficient *a*_{i} and letting everyone know *g* and each of the *A*_{i}’s.

In principle, anyone could solve for *a*_{0} if they know *A*_{0}. But in practice, provided *q* is large enough, this would not be possible because doing so would require solving the **discrete logarithm problem**, which is computationally difficult. It’s possible to compute discrete logarithms for small *q*, but the difficulty goes up quickly as *q* gets larger.

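How do the *A*_{i}’s let everyone verify that their (*x*_{i}, *y*_{i}) data is correct?

Each person can verify that

*g*^{y_i} = *A*_{0} *A*_{1}^{x_i} *A*_{2}^{x_i²} ⋯ *A*_{m}^{x_i^m}

using the public data and their personal data, and so they can verify that

*g*^{y_i} = *g*^{f(x_i)}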

[1] Conceptually you pick *p*’s until you find one so that *p*-1 has a large prime factor *q*. In practice, you’d do it the other way around: search for large primes *q* until you find one such that, say, 2*q* + 1 is also prime.
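The verification step can be sketched with toy numbers. Everything specific here is an assumption for illustration: *q* = 1019 and *p* = 2*q* + 1 = 2039 are far too small to be secure, *g* = 4 is one generator of the order-*q* subgroup, the function names are hypothetical, and the polynomial is evaluated mod *q* so that exponents of *g* behave consistently.

```python
import secrets

Q = 1019        # toy prime q
P = 2 * Q + 1   # 2039, also prime
G = 4           # a square mod P, so it has order Q
assert pow(G, Q, P) == 1 and G != 1

def deal(secret, k, n):
    # Shares (x, f(x) mod Q) plus public commitments A_j = g^(a_j) mod P.
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(k - 1)]
    shares = [(x, sum(a * pow(x, j, Q) for j, a in enumerate(coeffs)) % Q)
              for x in range(1, n + 1)]
    commitments = [pow(G, a, P) for a in coeffs]
    return shares, commitments

def verify(share, commitments):
    # Check g^y == product of A_j^(x^j), all mod P.
    x, y = share
    rhs = 1
    for j, A in enumerate(commitments):
        rhs = rhs * pow(A, pow(x, j, Q), P) % P
    return pow(G, y, P) == rhs

shares, commitments = deal(secret=321, k=3, n=5)
print(all(verify(s, commitments) for s in shares))
```

A share that has been altered, say by adding 1 to its *y* value, fails the check, even though nobody had to reveal the coefficients themselves.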

Image editing software is complicated, and I don’t use it often enough to remember how to do much. I like Paint.NET on Windows because it is in a sort of sweet spot for me, more powerful than Paint and much less complicated than Photoshop.

I found out there’s a program Pinta for Linux that was inspired by Paint.NET. (Pinta runs on Windows, Mac, and BSD as well.)

I have a page that draws a different image every day, based on putting the month, day, and the last two digits of the year into an exponential sum. This year’s images have been more intricate than last year’s because 19 is prime.

I liked today’s image.

The page has a link to details explaining the equation behind the image, and an animate link to let you see the sequence in which the points are traversed.

Rebecca Herold posted a new episode of her podcast yesterday in which she asks me questions about privacy and artificial intelligence.

I updated my blog post on solving for probability from entropy because Sjoerd Visscher pointed out that a crude approximation I used could be made much more accurate with a minor tweak.

As a bonus, the new error plot looks cool.

Newsletter

My monthly newsletter comes out tomorrow. This newsletter highlights the most popular blog posts of the month.

I used to say something each month about what I’m up to. Then I stopped because it got to be repetitive. Tomorrow I include a few words about projects I have coming up.

I was helping my daughter with physics homework last night and she said “Why do they use *s* for arc length?!” I said that I don’t know, but that it is conventional.

Yes, it appears ‘Spatium’

Euler referenced this— Cole Shackelford (@_cole_s) February 27, 2019

By the way, this section heading is a reference to Donald Knuth’s essay The Letter S where he writes in delightful Knuthian detail about the design of the letter *S* in TeX. You can find the essay in his book Literate Programming.

Felsing apparently is able to remember the syntax of scores of tools and programming languages. I cannot. Part of the reason is practice. I cannot remember the syntax of any software I don’t use regularly. It’s tempting to say that’s the end of the story: use it or lose it. Everybody has their set of things they use regularly and remember.

But I don’t think that’s all. I remember bits of math that I haven’t used in 30 years. Math fits in my head and sticks. Presumably software syntax sticks in the heads of people who use a lot of software tools.

There is some software syntax I can remember, however, and that’s software closely related to math. As I commented here, it was easy to come back to Mathematica and LaTeX after not using them for a few years.

Imprinting has something to do with this too: it’s easier to remember what we learn when we’re young. Felsing says he started using Linux in 2006, and his site says he graduated college in 2012, so presumably he was a high school or college student when he learned Linux.

When I was a student, my software world consisted primarily of Unix, Emacs, LaTeX, and Mathematica. These are all tools that I quit using for a few years, later came back to, and use today. I probably remember LaTeX and Mathematica syntax in part because I used it when I was a student. (I also think Mathematica in particular has an internal consistency that makes its syntax easier to remember.)

I see the value in Felsing’s choice of tools. For example, the xmonad window manager. I’ve tried it, and I could imagine that it would make you more productive if you mastered it. But I don’t see myself mastering it.

I’ve learned a few tools with lots of arbitrary syntax, e.g. Emacs. But since I don’t have a prodigious memory for such things, I have to limit the number of tools I try to keep loaded in memory. Other things I load as needed, such as a language a client wants me to use that I haven’t used in a while.

Revisiting a piece of math doesn’t feel to me like revisiting a programming language. Brushing up on something from differential equations, for example, feels like pulling a book off a mental shelf. Brushing up on C# feels like driving to a storage unit, bringing back an old couch, and struggling to cram it in the door.

There are things you use so often that you remember their syntax without trying. And there are things you may never use again, and it’s not worth memorizing their syntax just in case. Then there are things in the middle, things you don’t use often enough to naturally remember, but often enough that you’d like to deliberately remember them. Some of these are what I call bicycle skills, things that you can’t learn just-in-time. For things in this middle ground, you might try something like Anki, a flashcard program with spaced repetition.

However, this middle ground should be very narrow, at least in my experience/opinion. For the most part, if you don’t use something often enough to keep it loaded in memory, I’d say either let it go or practice using it regularly.

The Miller-Rabin test is actually a sequence of tests, one for each prime number. First you run the test associated with 2, then the test associated with 3, then the one associated with 5, etc. If we knew the smallest numbers for which these tests fail, then for smaller numbers we know for certain that they’re prime if they pass. In other words, we can turn the Miller-Rabin test for *probable* primes into a test for *provable* primes.

A recent result by Yupeng Jiang and Yingpu Deng finds the smallest number for which the Miller-Rabin test fails for the first nine primes. This number is

*N* = 3,825,123,056,546,413,051

or more than 3.8 quintillion. So if a number passes the first nine Miller-Rabin tests, and it’s less than *N*, then it’s prime. Not just a probable prime, but definitely prime. For a number *n* < *N*, this will be more efficient than running previously known deterministic primality tests on *n*.

Let’s play with this in Python. The SymPy library implements the Miller-Rabin test in a function `mr`.

The following shows that *N* is composite, and that it is a false positive for the first nine Miller-Rabin tests.

    from sympy.ntheory.primetest import mr

    N = 3825123056546413051
    assert(N == 149491*747451*34233211)
    ps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
    print( mr(N, ps) )

This doesn’t prove that *N* is the *smallest* number with these properties; we need the proof of Jiang and Deng for that. But assuming their result is right, here’s an efficient deterministic primality test that works for all *n* less than *N*.

    def is_prime(n):
        N = 3825123056546413051
        assert(n < N)
        ps = [2, 3, 5, 7, 11, 13, 17, 19, 23]
        return mr(n, ps)

Jiang and Deng assert that *N* is also the smallest composite number to slip by the first 10 and 11 Miller-Rabin tests. We can show that *N* is indeed a strong pseudoprime for the 10th and 11th primes, but not for the 12th prime.

    print( mr(N, [29, 31]) )
    print( mr(N, [37]) )

This code prints `True` for the first test and `False` for the second. That is, *N* is a strong pseudoprime for bases 29 and 31, but not for 37.

Fermat’s little theorem says that if *n* is prime, then

*a*^{n-1} = 1 mod *n*

for all 0 < *a* < *n*. This gives a necessary but not sufficient test for primality. A (Fermat) pseudoprime for base *a* is a composite number *n* such that the above holds, an example of where the test is not sufficient.

The Miller-Rabin test refines Fermat’s test by looking at additional necessary conditions for a number being prime. Often a composite number will fail one of these conditions, but not always. The composite numbers that slip by are called *strong* pseudoprimes or sometimes Miller-Rabin pseudoprimes.

Miller and Rabin’s extra testing starts by factoring *n*-1 into 2^{s}*d* where *d* is odd. If *n* is prime, then for all 0 < *a* < *n* either

*a*^{d} = 1 mod *n*

or

*a*^{2^k d} = -1 mod *n*

for some *k* satisfying 0 ≤ *k* < *s*. If one of these two conditions holds for a particular *a*, then *n* passes the Miller-Rabin test for the base *a*.

It wouldn’t be hard to write your own implementation of the Miller-Rabin test. You’d need a way to work with large integers and to compute modular exponents, both of which are included in Python without having to use SymPy.
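For example, here is a minimal sketch of the test for a single base, using only Python's built-in integers and three-argument `pow`. The function name is my own, and the code assumes *n* is odd and greater than 2. It isn't meant to replace SymPy's `mr`.

```python
def is_strong_probable_prime(n, a):
    # Miller-Rabin test of odd n > 2 for the single base a.
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    # Check a^(2^k d) for k = 1, ..., s-1 by repeated squaring.
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

print(is_strong_probable_prime(561, 2))   # Carmichael number
print(is_strong_probable_prime(2047, 2))  # 2047 = 23 * 89
```

The number 2047 is composite but passes for base 2, so a single base is not enough on its own.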

561 is a pseudoprime for base 2. In fact, 561 is a pseudoprime for every base relatively prime to 561, i.e. it’s a Carmichael number. But it is not a strong pseudoprime for 2 because 560 = 16*35, so *d* = 35 and

2^{35} = 263 mod 561,

which is not congruent to 1 or to -1. Squaring repeatedly gives 2^{70} = 166, 2^{140} = 67, and 2^{280} = 1 mod 561, none of which is congruent to -1 either. In Python,

    >>> pow(2, 560, 561)
    1
    >>> pow(2, 35, 561)
    263

*y*² = *x*³ + *ax* + *b*.

There are a few things missing from this definition, as indicated before, one being the mysterious “point at infinity.” I gave a hand-waving explanation that you could get rid of this exceptional point by adding an additional coordinate. Here I’ll describe that in more detail.

You could add another coordinate *z* that’s a sort of silent partner to *x* and *y* most of the time. Instead of pairs of points (*x*, *y*), we consider equivalence classes of points (*x*, *y*, *z*) where two points are equivalent if each is a non-zero multiple of the other [1]. It’s conventional to use the notation (*x* : *y* : *z*) to denote the equivalence class of (*x*, *y*, *z*).

In this construction, the equation of an elliptic curve is

*y*²*z* = *x*³ + *axz*² + *bz*³.

Since triples are in the same equivalence class if each is a multiple of the other, we can usually set *z* equal to 1 and identify the pair (*x*, *y*) with (*x *: *y* : 1). The “point at infinity” corresponds to the equivalence class (0 : 1 : 0).

From a programming perspective, you could think of *z* as a finiteness flag, a bit that is set to indicate that the other two coordinates can be taken at face value.

This three-coordinate version is called projective coordinates. Textbooks usually start out by defining projective space and then say that an elliptic curve is a set of points in this space. But if you’re focused on the elliptic curve itself, you can often avoid thinking of the projective space it sits in.

One way to think of projective space is that we add a dimension, the extra coordinate, then subtract a dimension by taking equivalence classes. By doing so we almost end up back where we started, but not quite. We have a slightly larger space that includes some “points at infinity,” one of which will be on our curve.

It’s inconvenient to carry around an extra coordinate that mostly does nothing. But it’s also inconvenient to have a mysterious extra point. So which is better? Much of the time you can ignore both the point at infinity and the extra coordinate. When you can’t, you have a choice which way you’d rather think of things. The point at infinity may be easier to think about conceptually, and projective coordinates may be better for doing proofs.

Let’s get concrete. We’ll look at the curve

*y*² = *x*³ + *x* + 1

over the integers mod 5. There are nine points on this curve: (0, ±1), (2, ±1), (3, ±1), (4, ±2), and ∞. (You could replace -1 with 4 and -2 with 3 if you’d like since we’re working mod 5.)

In the three-coordinate version, the points are (0 : ±1 : 1), (2 : ±1 : 1), (3 : ±1 : 1), (4 : ±2 : 1), and (0 : 1 : 0).
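A brute-force search in Python reproduces this list. The function names are hypothetical. The second function checks the three-coordinate form of the equation, including the point (0 : 1 : 0).

```python
def affine_points(a, b, p):
    # All (x, y) with y^2 = x^3 + a x + b mod p, found by brute force.
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x**3 + a * x + b)) % p == 0]

def on_projective_curve(x, y, z, a, b, p):
    # Homogeneous form: y^2 z = x^3 + a x z^2 + b z^3 mod p.
    return (y * y * z - (x**3 + a * x * z * z + b * z**3)) % p == 0

points = affine_points(1, 1, 5)
print(points)
print(on_projective_curve(0, 1, 0, 1, 1, 5))  # the point at infinity
```

Here -1 shows up as 4 and -2 as 3, since the search uses representatives 0 through 4. Multiplying all three coordinates of a point by the same non-zero number leaves it on the curve, which is why the equivalence classes make sense.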

[1] We leave out (0, 0, 0). It doesn’t exist in the world we’re constructing, i.e. projective space.

Would you like more students to major in STEM subjects? OK, what subjects would you like fewer students to major in? English, perhaps? Administrators are applauded when they say they’d like to see more STEM majors, but they know better than to say which majors they’d like to see fewer of.

We have a hard time with constraints.

I’m all for win-win, make-the-pie-bigger solutions when they’re possible. And often they are. But sometimes they’re not.

Suppose you have a linear regression with a couple predictors and no intercept term:

β_{1}*x*_{1} + β_{2}*x*_{2} = *y* + ε

where the *x*’s are inputs, the β’s are fixed but unknown, *y* is the output, and ε is random error.

Given *n* observations (*x*_{1}, *x*_{2}, *y* + ε), linear regression estimates the parameters β_{1} and β_{2}.

I haven’t said, but I implicitly assumed all the above numbers are real. Of course they’re real. It would be strange if they weren’t!

Well, we’re about to do something strange. We’re going to pick a prime number *p* and do our calculations modulo *p* except for the addition of the error ε. Our inputs (*x*_{1}, *x*_{2}) are going to be pairs of integers. Someone is going to compute

*r* = β_{1}*x*_{1} + β_{2}*x*_{2} mod *p*

where β_{1} and β_{2} are secret integers. Then they’re going to tell us

*r*/*p* + ε

where ε is a random variable on the interval [0, 1]. We give them *n* pairs (*x*_{1}, *x*_{2}) and they give back *n* values of *r*/*p* with noise added. Our job is to infer the βs.

This problem is called **learning with errors** or **LWE**. It’s like linear regression, but much harder when the problem size is bigger. Instead of just two inputs, we could have *m* inputs with *m* secret coefficients where *m* is large. Depending on the number of variables *m*, the number of equations *n*, the modulus *p*, and the probability distribution on ε, the problem may be possible to solve but computationally very difficult.

Why is it so difficult? Working mod *p* is discontinuous. A little bit of error might completely change our estimation of the solution. If *n* is large enough, we could recover the coefficients anyway, using something like least squares. But how would we carry that out? If *m* and *p* are small we can just try all *p*^{m} possibilities, but that’s not going to be practical if *m* and *p* are large.
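Here is a toy version of the brute-force approach for *m* = 2 and *p* = 17. The specifics are my own assumptions for illustration: the function names, the secret (3, 11), and noise drawn uniformly from a small interval around zero rather than any distribution a real scheme would use. Errors are measured around the circle, since *r*/*p* wraps around at 1.

```python
import itertools
import random

def lwe_samples(beta, p, n, noise, rng):
    # Each sample is (x, r/p + error) with r = beta . x mod p.
    samples = []
    for _ in range(n):
        x = [rng.randrange(p) for _ in beta]
        r = sum(b * xi for b, xi in zip(beta, x)) % p
        samples.append((x, r / p + rng.uniform(-noise, noise)))
    return samples

def circle_distance(u, v):
    # Distance between u and v when values wrap around mod 1.
    d = abs(u - v) % 1.0
    return min(d, 1.0 - d)

def brute_force(samples, p, m):
    # Try all p^m candidate secrets and keep the best fit.
    def total_error(beta):
        return sum(
            circle_distance(sum(b * xi for b, xi in zip(beta, x)) % p / p, y)
            for x, y in samples)
    return min(itertools.product(range(p), repeat=m), key=total_error)

rng = random.Random(0)
secret = (3, 11)
samples = lwe_samples(secret, p=17, n=40, noise=0.02, rng=rng)
recovered = brute_force(samples, p=17, m=2)
print(recovered)
```

With 17² = 289 candidates this is instant. The search space grows like *p*^{m}, which is what makes the same approach hopeless at cryptographic sizes.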

In linear regression, we assume there is some (approximately) linear process out in the real world that we’re allowed to observe with limited accuracy. Nobody’s playing a game with us, that’s just how data come to us. But with LWE, we are playing a game that someone has designed to be hard. Why? For cryptography. In particular, quantum-resistant cryptography.

Variations on LWE are the basis for several proposed encryption algorithms that are believed to be secure even if an adversary has access to a quantum computer.

The public key encryption systems in common use today would all be breakable if quantum computing becomes practical. They depend on mathematical problems like factoring and discrete logarithms being computationally difficult, which they appear to be with traditional computing resources. But we know that these problems could be solved in polynomial time on a quantum computer with Shor’s algorithm. But LWE is a hard problem, even on a quantum computer. Or so we suspect.

The US government’s National Institute of Standards and Technology (NIST) is holding a competition to identify quantum-resistant encryption algorithms. Last month they announced 26 algorithms that made it to the second round. Many of these algorithms depend on LWE or variations.

One variation is **LWR** (**learning with rounding**) which uses rounding rather than adding random noise. There are also ring-based counterparts **RLWE** and **RLWR** which add random errors and use rounding respectively. And there are polynomial variations such as **poly-LWE** which uses a polynomial-based learning with errors problem. The general category for these methods is **lattice methods**.

Of the public-key algorithms that made it to the second round of the NIST competition, 9 out of 17 use lattice-based cryptography:

- CRYSTALS-KYBER
- FrodoKEM
- LAC
- NewHope
- NTRU
- NTRU Prime
- Round5
- SABER
- Three Bears

Also, two of the nine digital signature algorithms are based on lattice problems:

- CRYSTALS-DILITHIUM
- FALCON

Based purely on the names, and not on the merits of the algorithms, I hope the winner is one of the methods with a science fiction allusion in the name.

Elliptic curves have been studied for many years by pure mathematicians with no intention to apply the results to anything outside math itself. And yet elliptic curves have become a critical part of applied cryptography.

Elliptic curves are very concrete. There are some subtleties in the definition (more on that in a moment), but they’re essentially the set of points satisfying a simple equation. And yet a lot of extremely abstract mathematics has been developed out of necessity to study these simple objects. And while the objects are in some sense simple, the questions that people naturally ask about them are far from simple.

A preliminary definition of an elliptic curve is the set of points satisfying

*y*² = *x*³ + *ax* + *b*.

This is a theorem, not a definition, and it requires some qualifications. The values *x*, *y*, *a*, and *b* come from some field, and that field is an important part of the definition of an elliptic curve. If that field is the real numbers, then all elliptic curves do have the form above, known as the Weierstrass form. For fields of characteristic 2 or 3, the Weierstrass form isn’t general enough. Also, we require that

4*a*³ + 27*b*² ≠ 0.
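Over the integers mod a prime *p* larger than 3, this condition is easy to check by reducing mod *p*. A small sketch, with a hypothetical function name:

```python
def is_smooth_weierstrass(a, b, p):
    # True if 4a^3 + 27b^2 is non-zero mod p (p a prime larger than 3).
    return (4 * a**3 + 27 * b**2) % p != 0

print(is_smooth_weierstrass(1, 1, 5))   # y^2 = x^3 + x + 1 over integers mod 5
print(is_smooth_weierstrass(-3, 2, 5))  # x^3 - 3x + 2 = (x-1)^2 (x+2)
```

The second curve fails the condition because its cubic has a repeated root.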

The other day I wrote about Curve1174, a particular elliptic curve used in cryptography. The points on this curve satisfy

*x*² + *y*² = 1 – 1174 *x*² *y*²

This equation does *not* specify an elliptic curve if we’re working over real numbers. But Curve1174 is defined over the integers modulo *p* = 2^{251} – 9. There it *is* an elliptic curve. It is equivalent to a curve in Weierstrass form, though that’s not true when working over the reals. So whether an equation defines an elliptic curve depends on the field the constituents come from.

An elliptic curve is not an ellipse, and it may not be a curve in the usual sense.

There is a connection between elliptic curves and ellipses, but it’s indirect. Elliptic curves are related to the integrals you would write down to find the length of a portion of an ellipse.

Working over the real numbers, an elliptic curve is a curve in the geometric sense. Working over a finite field, an elliptic curve is a finite set of points, not a continuum. Working over the complex numbers, an elliptic curve is a two-dimensional surface. The name “curve” is extended by analogy to elliptic curves over general fields.

In this section we’ll give the full definition of an elliptic curve, though we’ll be deliberately vague about some of the details.

The definition of an elliptic curve is not in terms of equations of a particular form. It says an elliptic curve is a

- smooth,
- projective,
- algebraic curve,
- of genus one,
- having a specified point *O*.

Working over real numbers, **smoothness** can be specified in terms of derivatives. But what does smoothness mean working over a finite field? You take the derivative equations from the real case and extend them by analogy to other fields. You can “differentiate” polynomials in settings where you can’t take limits by defining derivatives algebraically. (The condition 4*a*³ + 27*b*² ≠ 0 above is to guarantee smoothness.)
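Here is a sketch of why that condition shows up, assuming the field has characteristic other than 2 or 3. Write the curve as *F*(*x*, *y*) = 0 and take formal partial derivatives. A singular point is a point on the curve where both partials vanish.

```latex
\begin{align*}
F(x, y) &= y^2 - x^3 - ax - b \\
\frac{\partial F}{\partial y} = 2y = 0 &\implies y = 0 \\
\frac{\partial F}{\partial x} = -(3x^2 + a) = 0 &\implies a = -3x^2 \\
F(x, 0) = -(x^3 + ax + b) = 0 &\implies b = -x^3 - ax = 2x^3 \\
4a^3 + 27b^2 &= 4(-27x^6) + 27(4x^6) = 0
\end{align*}
```

So a singular point forces 4*a*³ + 27*b*² = 0, which means that requiring this quantity to be nonzero rules out singular points.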

Informally, **projective** means we add “points at infinity” as necessary to make things more consistent. Formally, we’re not actually working with pairs of coordinates (*x*, *y*) but equivalence classes of triples of coordinates (*x*, *y*, *z*). You can usually think in terms of pairs of values, but the extra value is there when you need it to deal with points at infinity. More on that here.
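As a rough illustration of what "equivalence classes of triples" means, the following Python sketch uses the homogenized Weierstrass equation *Y*²*Z* = *X*³ + *aXZ*² + *bZ*³ over the integers mod a small prime. The prime 97, the coefficients 2 and 3, and the function names are illustrative assumptions, not taken from any particular library.

```python
def on_projective_curve(X, Y, Z, a, b, p):
    """Homogenized Weierstrass equation: Y^2 Z = X^3 + a X Z^2 + b Z^3 (mod p)."""
    return (Y * Y * Z - (X**3 + a * X * Z * Z + b * Z**3)) % p == 0

def same_point(P, Q, p):
    """Two nonzero triples name the same projective point when they are
    proportional, i.e. when all 2x2 cross products vanish mod p."""
    (X1, Y1, Z1), (X2, Y2, Z2) = P, Q
    return ((X1 * Y2 - X2 * Y1) % p == 0 and
            (X1 * Z2 - X2 * Z1) % p == 0 and
            (Y1 * Z2 - Y2 * Z1) % p == 0)

p, a, b = 97, 2, 3            # toy parameters for illustration

affine = (0, 10, 1)           # the affine point (0, 10) with Z = 1
scaled = (0, 50, 5)           # the same point, every coordinate times 5
infinity = (0, 1, 0)          # the point at infinity has Z = 0

print(on_projective_curve(*affine, a, b, p), on_projective_curve(*scaled, a, b, p))
print(same_point(affine, scaled, p))
print(on_projective_curve(*infinity, a, b, p))
```

Triples with *Z* ≠ 0 correspond to ordinary pairs (*X*/*Z*, *Y*/*Z*). The triple with *Z* = 0 has no such pair, and that is the point at infinity.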

An **algebraic curve** is the set of points satisfying a polynomial equation.

The **genus** of an algebraic curve is roughly the number of holes it has. Over the complex numbers, the genus of an algebraic curve really is the number of holes. As with so many ideas in algebra, a theorem from a familiar context is taken as a definition in a more general context.

The **specified point O**, often the point at infinity, is the location of the identity element for the group addition. In the post on Curve1174, we go into the addition in detail, and the zero point is (0, 1).
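Here is a minimal Python sketch of addition on Curve1174 showing (0, 1) acting as the identity. It uses the standard Edwards addition law with *d* = –1174, which I'm assuming matches the addition described in the Curve1174 post. The helper `find_point` is hypothetical: it searches small *x* values for a point on the curve, and its square root shortcut relies on *p* being congruent to 3 mod 4. It needs Python 3.8 or later for modular inverses via `pow`.

```python
p = 2**251 - 9
d = -1174   # curve: x^2 + y^2 = 1 + d x^2 y^2 (mod p)

def on_curve(P):
    x, y = P
    return (x * x + y * y - 1 - d * x * x * y * y) % p == 0

def add(P, Q):
    """Edwards addition law; denominators are inverted mod p."""
    x1, y1 = P
    x2, y2 = Q
    t = d * x1 * x2 * y1 * y2
    x3 = (x1 * y2 + x2 * y1) * pow((1 + t) % p, -1, p) % p
    y3 = (y1 * y2 - x1 * x2) * pow((1 - t) % p, -1, p) % p
    return (x3, y3)

def find_point():
    """Search small x for which y^2 = (1 - x^2) / (1 + 1174 x^2) has a root."""
    for x in range(2, 1000):
        rhs = (1 - x * x) * pow((1 + 1174 * x * x) % p, -1, p) % p
        y = pow(rhs, (p + 1) // 4, p)   # candidate square root since p = 3 mod 4
        if y * y % p == rhs:
            return (x, y)
    raise ValueError("no point found in search range")

O = (0, 1)                # identity element
P = find_point()
negP = ((-P[0]) % p, P[1])  # the inverse of (x, y) is (-x, y)

print(add(P, O) == P)
print(add(P, negP) == O)
print(on_curve(add(P, P)))
```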

In elliptic curve cryptography, it’s necessary to specify another point, a **base point**, which is the generator for a subgroup. This post gives an example, specifying the base point on secp256k1, a curve used in the implementation of Bitcoin.

This is a good move, but unnecessary. Here’s what I mean by that. The update was likely unnecessary for reasons I’ll explain below, but it was easy to do, and it increased consistency across Microsoft’s product line. It’s also good PR.

Let’s back up a bit. SHA-1 and SHA-2 are secure hash functions [1]. They take a file, in this case a Microsoft software update, and return a relatively small number, small relative to the original file size. In the case of SHA-1, the result is 160 bits (20 bytes). They’re designed so that if a file is changed, the function value is nearly certain to change. That is, it’s extremely unlikely that a change to the file would not result in a change to the hash value.
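For a concrete look, here is a small sketch using Python's standard `hashlib` module. The byte strings are hypothetical stand-ins for an update file and a copy with one character changed.

```python
import hashlib

# Hypothetical stand-ins for a software update and a tampered copy.
original = b"contents of a software update"
tampered = b"contents of a software updatE"

sha1_original = hashlib.sha1(original).digest()
sha1_tampered = hashlib.sha1(tampered).digest()
sha256_original = hashlib.sha256(original).digest()

print(len(sha1_original) * 8)           # SHA-1 output size in bits
print(len(sha256_original) * 8)         # SHA-256 output size in bits
print(sha1_original != sha1_tampered)   # did the one-character change alter the hash?
```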

The concern isn’t accidental changes. The probability of accidentally producing two files with the same hash function value is tiny as I show here.
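To get a feel for how tiny, here is a sketch based on the standard birthday approximation, which is not necessarily the calculation in the linked post. With *n* files hashed into *N* possible values, the chance of at least one collision is roughly 1 – exp(–*n*(*n* – 1)/2*N*). The function name and the file count below are illustrative assumptions.

```python
import math

def collision_probability(num_items, num_values):
    """Birthday approximation: 1 - exp(-n(n-1) / (2N))."""
    exponent = -num_items * (num_items - 1) / (2 * num_values)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)

# Sanity check against the classic birthday problem.
print(collision_probability(23, 365))

# About a trillion files hashed with a 160-bit function like SHA-1.
print(collision_probability(2**40, 2**160))
```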

The concern is a clever attacker who could modify the software update in such a way that the hash function remains unchanged, bypassing the hash as a security measure. That would be harder to do with SHA-2 than with SHA-1, hence Microsoft’s decision years ago to move to SHA-2 for new versions of the operating system, and its recent decision to make the change retroactive.

By a collision we mean two files that hash to the same value. It’s obvious from the pigeon hole principle [2] that collisions are possible, but how hard are they to produce deliberately?

Google demonstrated two years ago that it could produce two PDF files with the same SHA-1 hash value. But doing so required over 6,500 years of CPU time running in parallel [3]. Also, Google started with a file designed to make collisions possible. According to their announcement,

We started by creating a PDF prefix specifically crafted to allow us to generate two documents with arbitrary distinct visual contents, but that would hash to the same SHA-1 digest.

It would be harder to start with a specified input, such as a software update file and generate a collision. It would be harder still to generate a collision that had some desired behavior.

According to this page, it’s known how to tamper with *two* files simultaneously so that they will have the same SHA-1 hash value. This is what Google did, at the cost of thousands of CPU years. But so far, nobody has been able to start with a given file and create another file with the same SHA-1 value.

As I said at the beginning, it made sense for Microsoft to decide to move from SHA-1 to SHA-2 because the cost of doing so was small. But the use of SHA-1 hash codes is probably not the biggest security risk in Windows 7.

- Putting Google’s SHA-1 collision in perspective
- Probability of secure hash function collisions
- The hash function menagerie

[1] SHA-1 is a hash function, but SHA-2 is actually a family of hash functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. All are believed to provide at least 112 bits of security, while SHA-1 provides less than 63.

The SHA-*x* functions output *x* bits. The SHA-*x*/*y* functions use *x* bits of internal state and output *y* bits. To be consistent with this naming convention, SHA-1 should be called SHA-160.

[2] The pigeon hole principle says that if you put more than *n* things into *n* boxes, one of the boxes has to have more than one thing. If you hash files of more than *n* bits to *n*-bit numbers, at least two files have to go to the same value.
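The pigeon hole argument can be seen in miniature by shrinking the hash. The sketch below truncates SHA-256 to 8 bits, a toy construction of my own for illustration, and hashes 257 distinct inputs into 256 possible values, so a collision is guaranteed.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the leading `bits` bits of the SHA-256 digest."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int):
    """Hash 2^bits + 1 distinct inputs; two of them must share a value."""
    seen = {}
    for i in range(2**bits + 1):
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    raise AssertionError("unreachable by the pigeon hole principle")

first, second = find_collision(8)
print(first, second, truncated_hash(first, 8) == truncated_hash(second, 8))
```

In practice the loop stops long before input 257. The principle only says a collision must exist; it says nothing about how hard one is to find at full size.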

[3] If you were to rent this much CPU time in the cloud at 5 cents per CPU hour, it would cost about $2,800,000. If the only obstacle were the cost of computing resources, someone might be willing to pay that to tamper with a Microsoft update.
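The arithmetic behind that estimate, taking 6,500 CPU years as a round figure and a year as 365.25 days of 24 hours:

```latex
\begin{align*}
6500 \text{ CPU years} \times 8766 \ \tfrac{\text{hours}}{\text{year}} &\approx 5.7 \times 10^{7} \text{ CPU hours} \\
5.7 \times 10^{7} \text{ CPU hours} \times \$0.05 \text{ per hour} &\approx \$2.85 \times 10^{6}
\end{align*}
```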

]]>