There can only be a finite number of cases, because *n*! grows faster than 10^{n} for *n* > 10, and it’s reasonable to guess that 23 might be the largest case. Turns out it’s not, but it’s close. The only cases where *n*! has *n* digits are 1, 22, 23, and 24. Once you’ve found these by brute force, it’s not hard to show that they must be the only ones because of the growth rate of *n*!.

Is there a convenient way to find the number of digits in *n*! without having to compute *n*! itself? Sure. For starters, the number of digits in the base 10 representation of a number *x* is

⌊ log_{10} *x* ⌋ + 1,

where ⌊ *z* ⌋ is the floor of *z*, the largest integer less than or equal to *z*. The log of the factorial function is easier to compute than the factorial itself because it won’t overflow. You’re more likely to find a function to compute the log of the gamma function than the log of factorial, and more likely to find software that uses natural logs than logs base 10. So in Python, for example, you could compute the number of digits with this:

```python
from scipy.special import gammaln
from math import log, floor

def digits_in_factorial(n):
    return floor(gammaln(n + 1) / log(10.0)) + 1
```

What about a more elementary formula, one that doesn’t use the gamma function? If you use Stirling’s approximation for factorial and take log of that you should at least get a good approximation. Here it is again in Python:

```python
from math import log, floor, pi

def stirling(n):
    return floor(((n + 0.5)*log(n) - n + 0.5*log(2*pi)) / log(10)) + 1
```

The code above is exact for every *n* > 2 that I’ve tested, up to *n* = 1,000,000. (Note that one million factorial is an extremely large number. It has 5,565,709 digits. And yet we can easily say something about this number, namely how many digits it has!)
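You can run a check like this yourself without SciPy. Here’s a sketch that uses the standard library’s `lgamma`, which computes the same log-gamma function as `gammaln`, and compares against exact integer factorials:

```python
from math import factorial, lgamma, log, floor

def digits_in_factorial(n):
    # lgamma(n + 1) equals log(n!), computed without overflow
    return floor(lgamma(n + 1) / log(10.0)) + 1

# Compare against the exact digit count from the integer factorial.
for n in range(1, 300):
    assert digits_in_factorial(n) == len(str(factorial(n)))

print(digits_in_factorial(1_000_000))  # 5565709, as noted above
```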

The code may break down somewhere because of the error in Stirling’s approximation or the limitations of floating point arithmetic. Stirling’s approximation gets more accurate as *n* increases, but it’s conceivable that a factorial value could be so close to a power of 10 that the approximation error pushes it from one side of the power of 10 to the other. Maybe that’s not possible; perhaps someone could prove that it’s not.

You could extend the code above to optionally take another base besides 10.

```python
def digits_in_factorial(n, b=10):
    return floor(gammaln(n + 1) / log(b)) + 1

def stirling(n, b=10):
    return floor(((n + 0.5)*log(n) - n + 0.5*log(2*pi)) / log(b)) + 1
```

The code using Stirling’s approximation still works for all *n* > 2, even for *b* as small as 2. This is slightly surprising since the number of bits in a number is more detailed information than the number of decimal digits.
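The base 2 claim can be checked against Python’s exact integers, since `bit_length` gives the exact number of binary digits. A sketch, using only the standard library:

```python
from math import factorial, log, floor, pi

def stirling(n, b=10):
    return floor(((n + 0.5)*log(n) - n + 0.5*log(2*pi)) / log(b)) + 1

# Number of bits in n! computed exactly vs. via Stirling's approximation
for n in range(3, 300):
    assert stirling(n, 2) == factorial(n).bit_length()

print("Stirling matches the exact bit counts")
```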

My title speaks of “data analysis” not “statistics”, and of “computation” not “computing science”; it does not speak of “mathematics”, but only last. Why? …

My brother-in-squared-law, Francis J. Anscombe, has commented on my use of “data analysis” in the following words:

Whereas the content of Tukey’s remarks is always worth pondering, some of his terminology is hard to take. He seems to identify “statistics” with the grotesque phenomenon generally known as “mathematical statistics”, and finds it necessary to replace “statistical analysis” with “data analysis.”

(Tukey calls Anscombe his “brother-in-squared-law” because Anscombe was a fellow statistician as well as his brother-in-law. At first I thought Tukey had said “brother-in-law-squared”, which could mean his brother-in-law’s brother-in-law, but I suppose it was a pun on the role of least-square methods in statistics.)

Tukey later says

I … shall stick to this attitude today, and shall continue to use the words “data analysis”, in part to indicate that we can take probability seriously, or leave it alone, as may from time to time be appropriate or necessary.

It seems Tukey was reserving the term “statistics” for that portion of data analysis which is rigorously based on probability.

As with financial arbitrage, the hard part is spotting opportunities, not necessarily acting on them. If you want to be a hero with regular expressions, as in the xkcd cartoon below, the key isn’t knowing regular expressions well. The key is knowing *about* regular expressions in a context where no one else does.

To spot a technical arbitrage opportunity, you need to know what that technology can and cannot (easily) do. You also need to recognize situations where the technology can help. You don’t need to be able to stand up to a quiz show style oral exam.

**Related post**: You can be a hero with a simple idea

The life he had chosen was a cocoon. Surrounded by a web of old manuscripts and scholarly papers, he would achieve tenure, publish frequently, teach a group of carefully selected graduate students, be treated like a celebrity by the handful of people who had the faintest idea who he was, and go to his grave deluded into thinking he had achieved greatness when in fact he stayed in school all his life. Where was the plunge into the unknown?

I don’t believe the author meant to imply that an academic career is necessarily so insular. In the context of the quote, Ivan says his father, also a scholar, “hadn’t stayed in the cocoon” but had taken great risks. But there is certainly the danger of living in a tiny world and believing that you’re living in a much larger world. Others may face the same danger, though it seems particularly acute for academics.

It’s interesting that Ivan criticizes scholars for avoiding the unknown. Scholars would say they’re all about pursuing the unknown. But scholars pursue known unknowns, questions that can be precisely formulated within the narrow bounds of their discipline. The “plunge into the unknown” here refers to unknown unknowns, messier situations where not only are the outcomes unknown, even the risks themselves are unknown.

There’s only one way a function on the real line can be periodic. But if you think of functions of a complex variable, it makes sense to look at functions that are periodic in two different directions. Sine and cosine are periodic as you move horizontally across the complex plane, but not if you move in any other direction. But you could imagine a function that’s periodic vertically as well as horizontally.

A doubly periodic function satisfies *f*(*x*) = *f*(*x* + ω_{1}) and *f*(*x*) = *f*(*x* + ω_{2}) for all *x* and for two different fixed complex periods, ω_{1} and ω_{2}, with different angular components, i.e. the two periods are not real multiples of each other. For example, the two periods could be 1 and *i*.

How many doubly periodic functions are there? The answer depends on how much regularity you require. If you ask that the functions be differentiable everywhere as functions of a complex variable (i.e. entire), the only doubly periodic functions are constant functions [1]. But if you relax your requirements to allow functions to have singularities, there’s a wide variety of functions that are doubly periodic. These are the *elliptic* functions. They’re periodic in two independent directions, and meromorphic (i.e. analytic except at isolated poles). [2]

What about triply periodic functions? If you require them to be meromorphic, then the only triply periodic functions are constant functions. To put it another way, if a meromorphic function is periodic in three independent directions, it’s periodic in every direction for every period, i.e. constant. The reason is that three independent periods generate a non-discrete set of periods: you can construct a sequence of periods with a limit point, so the function repeats its values on a set with a limit point, and a meromorphic function that does so must be constant everywhere.

* * *

[1] Another way to put this is to say that elliptic functions must have at least one pole inside the parallelogram determined by the lines from the origin to ω_{1} and ω_{2}. A doubly periodic function’s values everywhere are repeats of its values on this parallelogram. If the function were continuous over this parallelogram (i.e. with no poles) then it would be bounded over the parallelogram and hence bounded everywhere. But Liouville’s theorem says a bounded entire function must be constant.

[2] We don’t consider arbitrary singularities, only isolated poles. There are doubly periodic functions with essential singularities, but these are outside the definition of elliptic functions.

Bob Martin published a dialog yesterday about the origin of structured programming, the idea that programs should not be written with `goto` statements but should use less powerful, more specialized ways to transfer control. Edsger Dijkstra championed this idea, most famously in his letter Go-to statement considered harmful. Since then there have been countless “considered harmful” articles that humorously allude to Dijkstra’s letter.

Toward the end of the dialog, Uncle Bob’s interlocutor says “Hurray for Dijkstra” for inventing the new technology of structured programming. Uncle Bob corrects him:

New Technology? No, no, you misunderstand. … He didn’t invent anything. What he did was to identify something we shouldn’t do. That’s not a technology. That’s a discipline.

Huh? I thought Structured Programming made things better.

Oh, it did. But not by giving us some new tools or technologies. It made things better by taking away a damaging tool.

The money quote is the last line above: **It made things better by taking away a damaging tool**.

Creating new tools gets far more attention than removing old tools. How might we be better off by letting go of a tool? When our first impulse is that we need a new technology, might we need a new discipline instead?

Few people have ever been able to convince an entire profession to let go of a tool they assumed was essential. If we’re to have any impact, most of us will need to aim much, much lower. It’s enough to improve our personal productivity and possibly that of a few peers. Maybe you personally would be better off without something that is beneficial to most people.

What are some technologies you’ve found that you’re better off not using?


Here are a few scattered notes on Julia, especially on how it differs from Python.

- Array indices in Julia start from 1, like Fortran and R, and unlike any recent language that I know of.
- Like Python and many other scripting languages, Julia uses `#` for one-line comments. It also adds `#=` and `=#` for multi-line comments, like `/*` and `*/` in C.
- By convention, names of functions that modify their first argument end in `!`. This is not enforced.
- Blocks are indented as in Python, but there is no colon at the end of the first line, and there must be an `end` statement to close the block.
- Julia uses `elseif` as in Perl, not `elif` as in Python [1].
- Julia uses square brackets to declare a dictionary. Keys and values are separated with `=>`, as in Perl, rather than with colons, as in Python.
- Julia, like Python 3, returns 2.5 when given `5/2`. Julia has a `//` division operator, but it returns a rational number rather than an integer.
- The number 3 + 4*i* would be written `3 + 4im` in Julia and `3 + 4j` in Python.
- Strings are contained in double quotes and characters in single quotes, as in C. Python does not distinguish between characters and strings, and uses single and double quotes interchangeably.
- Julia uses `function` to define a function, similar to JavaScript and R, where Python uses `def`.
- You can access the last element of an array with `end`, not with -1 as in Perl and Python.

* * *

[1] Actually, Perl uses `elsif`, as pointed out in the comments below. I can’t remember when to use `else if`, `elseif`, `elsif`, and `elif`.

Color is a fascinating subject. Important early contributions to our understanding of it came from physicists and mathematicians such as Newton, Young, Grassmann, Maxwell, and Helmholtz. Today, the science of color measurement and description is well established and we rely on it in our daily lives, from when we view images on a computer screen to when we order paint, wallpaper, or a car of a specified color.

For practical purposes color space, as perceived by humans, is three-dimensional, because our retinas have three different types of cones, which have peak sensitivities at wavelengths corresponding roughly to red, green, and blue. It’s therefore possible to use linear algebra in three dimensions to analyze various aspects of color.

A good example of the use of linear algebra is to understand *metamerism*, which is the phenomenon whereby two objects can appear to have the same color but are actually giving off light having different spectral decompositions. This is something we are usually unaware of, but it is welcome in that color output systems (such as televisions and computer monitors) rely on it.

Mathematically, the response of the cones on the retina to light can be modeled as a matrix-vector product *Af*, where *A* is a 3-by-*n* matrix and *f* is an *n*-vector that contains samples of the spectral distribution of the light hitting the retina. The parameter *n* is a discretization parameter that is typically about 80 in practice. Metamerism corresponds to the fact that *Af*_{1} = *Af*_{2} is possible for different vectors *f*_{1} and *f*_{2}. This equation is equivalent to saying that *Ag* = 0 for a nonzero vector *g* = *f*_{1} – *f*_{2}, or, in other words, that a matrix with fewer rows than columns has a nontrivial null space.
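The null-space observation can be illustrated numerically. The sketch below uses a random matrix with the right shape (a real cone-response matrix would come from colorimetric measurements) and builds a metameric pair from the SVD, whose right singular vectors beyond the third span the null space of *A*:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-by-n cone response matrix; only the shape matters here.
n = 80
A = rng.random((3, n))

# Rows of Vt beyond the rank (3) are orthogonal to the rows of A,
# so they span the null space of A.
_, _, Vt = np.linalg.svd(A)
g = Vt[3]            # A @ g is (numerically) zero

# Two different spectra producing identical cone responses
f1 = rng.random(n)
f2 = f1 + g

print(np.allclose(A @ f1, A @ f2))  # True: a metameric pair
```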

Metamerism is not always welcome. If you have ever printed your photographs on an inkjet printer you may have observed that a print that looked fine when viewed indoors under tungsten lighting can have a color cast when viewed in daylight.

In digital imaging the term *channel* refers to the grayscale image representing the values of the pixels in one of the coordinates, most often R, G, or B (for red, green, and blue) in an RGB image. It is sometimes said that an image has *ten channels*. The number ten is arrived at by combining coordinates from the representation of an image in three different color spaces. RGB supplies three channels, a space called LAB (pronounced “ell-A-B”) provides another three channels, and the last four channels are from CMYK (cyan, magenta, yellow, black), the color space in which all printing is done.

LAB is a rather esoteric color space that separates luminosity (or lightness, the L coordinate) from color (the A and B coordinates). In recent years photographers have realized that LAB can be very useful for image manipulations, allowing certain things to be done much more easily than in RGB. This usage is an example of a technique used all the time by mathematicians: if we can’t solve a problem in a given form then we transform it into another representation of the problem that we can solve.

As an example of the power of LAB space, consider this image of aeroplanes at Schiphol airport.

Suppose that KLM are considering changing their livery from blue to pink. How can the image be edited to illustrate how the new livery would look? “Painting in” the new color over the old using the brush tool in image editing software would be a painstaking task (note the many windows to paint around and the darker blue in the shadow area under the tail). The next image was produced in just a few seconds.

How was it done? The image was converted from RGB to LAB space (which is a nonlinear transformation) and then the coordinates of the A channel were replaced by their negatives. Why did this work? The A channel represents color on a green–magenta axis (and the B channel on a blue–yellow axis). Apart from the blue fuselage, most pixels have a small A component, so reversing the sign of this component doesn’t make much difference to them. But for the blue, which has a negative A component, this flipping of the A channel adds just enough magenta to make the planes pink.

You may recall from earlier this year the infamous photo of a dress that generated a huge amount of interest on the web because some viewers perceived the dress as being blue and black while others saw it as white and gold. A recent paper What Can We Learn from a Dress with Ambiguous Colors? analyzes both the photo and the original dress using LAB coordinates. One reason for using LAB in this context is its device independence, which contrasts with RGB, for which the coordinates have no universally agreed meaning.

**Nicholas J. Higham** is the Richardson Professor of Applied Mathematics at The University of Manchester, and editor of The Princeton Companion to Applied Mathematics. His article Color Spaces and Digital Imaging in The Princeton Companion to Applied Mathematics gives an introduction to the mathematics of color and the representation and manipulation of digital images. In particular, it emphasizes the role of linear algebra in modeling color and gives more detail on LAB space.

**Related posts**:

Three algorithms for converting to gray scale

FFT and Big Data (quotes from Princeton Companion to Applied Mathematics)

Haskell was designed to enforce programming discipline; R was designed for interactive use. Haskell emphasizes correctness; R emphasizes convenience. Haskell emphasizes computer efficiency; R emphasizes interactive user efficiency. Haskell was written to be a proving ground for programming language theorists. R was written to be a workbench for statisticians. Very different goals lead to very different languages.

When I first heard of a project to mix Haskell and R, I was a little shocked. Could it even be done? Aside from the external differences listed above, the differences in language internals are vast. I’m very impressed that the folks at Tweag I/O were able to pull this off. Their HaskellR project lets you call R from Haskell and vice versa. (It’s primarily for Haskell calling R, though you can call Haskell functions from your R code: Haskell calling R calling Haskell. It kinda hurts your brain at first.) Because the languages are so different, some things that are hard in one are easy in the other.

I used HaskellR while it was under initial development. Our project was written in Haskell, but we wanted to access R libraries. There were a few difficulties along the way, as with any project, but these were resolved and eventually it just worked.

James Cooley and John Tukey (re)discovered the FFT in 1965. It was thought to be an original discovery at the time. Only later did someone find a sketch of the algorithm in the papers of Gauss.

Daniel Rockmore wrote the article on the Fast Fourier Transform in The Princeton Companion to Applied Mathematics. He says

In this new world of 1960s “big data,” a clever reduction in computational complexity (a term not yet widely in use) could make a tremendous difference.

Rockmore adds a very interesting footnote to the sentence above:

Many years later Cooley told me that he believed that the Fast Fourier transform could be thought of as one of the inspirations for asymptotic algorithmic analysis and the study of computational complexity, as previous to the publication of his paper with Tukey very few people had considered data sets large enough to suggest the utility of an asymptotic analysis.


For the definition to be complete, you have to specify the first *n* of the *n*-Fibonacci numbers. However, these starting values hardly matter for our purposes. We want to look at the limiting ratio of consecutive *n*-Fibonacci numbers, and this doesn’t depend on the initial conditions. (If you were determined, you could find starting values where this isn’t true. It’s enough to pick integer initial values, at least one of which is not zero.)

As shown in the previous post, the ratio is the largest eigenvalue of an *n* by *n* matrix with 1’s on the first row and 1’s immediately below the main diagonal. The characteristic polynomial of such a matrix is

λ^{n} – λ^{n-1} – λ^{n-2} – … – 1

and so we look for the largest zero of this polynomial. We can sum the terms with negative coefficients as a geometric series and show that the eigenvalues satisfy

λ^{n} – 1/(2 – λ) = 0.

So the limiting ratio of consecutive *n*-Fibonacci numbers is the largest root of the above equation. You could verify that when *n* = 2, we get the golden ratio φ as we should, and when *n* = 3 we get around 1.8393 as in the previous post.

As *n* gets large, the limiting ratio approaches 2. You can see this by rewriting the equation above as λ^{n} = 1/(2 – λ) and taking logs, which gives

*n* = -log(2 – λ)/log(λ).

As *n* goes to infinity, λ must approach 2 so that the right side of the equation also goes to infinity.
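The limiting ratio for any *n* can be found numerically. A sketch using plain bisection (my own illustrative code): *f*(λ) = λ^{n} – 1/(2 – λ) is positive at λ = 1.1 and tends to –∞ as λ → 2, so the sign change brackets the dominant root.

```python
def limiting_ratio(n, tol=1e-12):
    # Bisect f(x) = x**n - 1/(2 - x) on (1.1, 2).
    f = lambda x: x**n - 1/(2 - x)
    lo, hi = 1.1, 2 - 1e-15
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(limiting_ratio(2))   # about 1.618, the golden ratio
print(limiting_ratio(3))   # about 1.8393, as in the previous post
print(limiting_ratio(50))  # close to 2
```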

The eigenvector produced by this process is the eigenvector corresponding to the largest eigenvalue of *A*, largest in absolute value. This assumes *A* has a unique eigenvector associated with its largest eigenvalue. It also assumes you’re not spectacularly unlucky in your choice of vector to start with.

Assume your starting vector *x* has some component in the direction of *v*, the eigenvector corresponding to the largest eigenvalue. (The vectors that don’t have such a component lie in an *n*-1 dimensional subspace, which has measure zero. So if you pick a starting vector at random, with probability 1 it will have some component in the direction we’re after. That’s what I meant when I said you can’t start with a spectacularly unlucky initial choice.) Each time you multiply by *A*, the component in the direction of *v* gets stretched more than the components orthogonal to *v*. After enough iterations, the component in the direction of *v* dominates the other components.
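The iteration just described is the power method. A minimal sketch (my own illustrative code, not from the post):

```python
import numpy as np

def power_method(A, iters=100):
    # Repeatedly multiply by A and normalize; the component along the
    # dominant eigenvector is stretched the most, so it wins out.
    x = np.random.default_rng(1).random(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    # Rayleigh quotient of the (normalized) limit vector estimates
    # the dominant eigenvalue.
    return x @ A @ x

A = np.array([[1.0, 1.0], [1.0, 0.0]])  # the Fibonacci matrix
print(power_method(A))  # about 1.618, the golden ratio
```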

What does this have to do with Fibonacci numbers? The next number in the Fibonacci sequence is the sum of the previous two. In matrix form this says

    [ F_{n+1} ]   [ 1  1 ] [ F_{n}   ]
    [ F_{n}   ] = [ 1  0 ] [ F_{n-1} ]

The ratio of consecutive Fibonacci numbers converges to the golden ratio φ because φ is the largest eigenvalue of the matrix above.

The first two Fibonacci numbers are 1 and 1, so the Fibonacci sequence corresponds to repeatedly multiplying by the matrix above, starting with the initial vector *x* = [1 1]^{T}. But you could start with any other vector and the ratio of consecutive terms would converge to the golden ratio, provided you don’t start with a vector orthogonal to [1 φ]^{T}. Starting with any pair of integers, unless both are zero, is enough to avoid this condition, since φ is irrational.

We could generalize this approach to look at other sequences defined by a recurrence relation. For example, we could look at the “Tribonacci” numbers. The Tribonacci sequence starts out 1, 1, 2, and then each successive term is the sum of the three previous terms. We can find the limiting ratio of Tribonacci numbers by finding the largest eigenvalue of the matrix below.

    [ 1  1  1 ]
    [ 1  0  0 ]
    [ 0  1  0 ]

This eigenvalue is the largest root of *x*^{3} – *x*^{2} – *x* – 1 = 0, which is about 1.8393. As before, the starting values hardly matter. Start with any three integers, at least one of them non-zero, and define each successive term to be the sum of the previous three terms. The ratio of consecutive terms in this sequence will converge to 1.8393.

By the way, you could compute the limiting ratio of Tribonacci numbers with the following bit of Python code:

```python
import numpy as np
from scipy import linalg

M = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 0]])
print(linalg.eig(M))  # the largest eigenvalue is about 1.8393
```

**Update**: The next post generalizes this one to *n*-Fibonacci numbers.

Here’s the equation in context, from the book Michael Vey 4: Hunt for Jade Dragon. The context is as follows.

Suddenly math problems she hadn’t understood made sense. Except now they weren’t just numbers and equations, they were patterns and colors. Calculus, geometry, and trigonometry were easy to understand, simple as a game, like shooting balls at a basketball hoop that was a hundred feet wide. Then a specific sequence of numbers, letters, and symbols started running through her mind.

She almost said the equation when a powerful thought came over her not to speak it out loud—that she must not ever divulge it. She knew that what she was receiving was something of great importance, even if she had no idea what it meant.

I believe the symbol in question is the fourth letter of the Hebrew alphabet, ℸ (daleth).

Is this a real equation? The overall form of it looks like an integral transform. However, the two instances of ℸ in the equation look suspicious.

One reason is that I’ve never seen ℸ used in math, though I read somewhere that Cantor used it for the cardinality of some set. Even so, Cantor’s use wouldn’t make sense inside an integral.

Also, the two instances of ℸ are used differently. The first is a function (or else the factors of 2 π could be cancelled out) and the second one apparently is not. Finally, the equation is symmetric in *x* and *y* if you remove the two daleths. So I suspect this was a real equation with the daleths added in for extra mystery.

I didn’t write my clients and leads on balls, but I did write them on index cards. And it helped a great deal. It’s easier to think about projects when you have physical representations you can easily move around. Moving lines up and down in an org-mode file, or even moving boxes around in 2D in OneNote, doesn’t work as well.

Electronic files are great for storing, editing, and querying ideas. But they’re not the best medium for *generating* ideas. See Create offline, analyze online. See also Austin Kleon’s idea of having two separate desks, one digital and one analog.

Suppose you want to find whether a number *n* is divisible by 7. Start with the last digit of *n* and write a 1 under the last digit, and a 3 under the next to last digit. The digits under the digits of *n* cycle through 1, 3, 2, 6, 4, 5, repeatedly until there is something under each digit in *n*. Now multiply each digit of *n* by the digit under it and add the results.

For example, suppose *n* = 1394. Write the digits 1, 3, 2, and 6 underneath, from right to left:

1394

6231

The sum we need to compute is 1*6 + 3*2 + 9*3 + 4*1 = 6 + 6 + 27 + 4 = 43.

This sum, 43 in our example, has the same remainder when divided by 7 as the original number. Since 43 is not divisible by 7, neither is 1394. More than that, the method preserves remainders, not just divisibility: in our example, 43 leaves a remainder of 1 when divided by 7, so 1394 also leaves a remainder of 1.

You could apply this method repeatedly, though in this case 43 is small enough that it’s easy enough to see that it leaves a remainder of 1.

Suppose you started with a 1000-digit number *n*. Each digit is no more than 9, and is being multiplied by a number no more than 6. So the sum would be less than 54000. So you’ve gone from a 1000-digit number to at most a 5-digit number in one step. One or two more steps should be enough for the remainder to be obvious.

Why does this method work? The key is that the multipliers 1, 3, 2, 6, 4, 5 are the remainders when the powers of 10 are divided by 7. Since 10^{6} has a remainder of 1 when divided by 7, the numbers 10^{6a+b} and 10^{b} have the same remainder by 7, and that’s why the multipliers have period 6.

All the trick is doing is expanding the number in its base 10 representation and replacing each power of 10 with its remainder when divided by 7. In our example, 1394 = 1*1000 + 3*100 + 9*10 + 4*1, and mod 7 this reduces to 1*6 + 3*2 + 9*3 + 4*1, exactly the calculation above.

The trick presented here is analogous to casting out nines. But since every power of 10 leaves a remainder of 1 when divided by 9, all the multipliers in casting out nines are 1.

You could follow the pattern of this method to create a divisibility rule for any other divisor, say 13 for example, by letting the multipliers be the remainders when powers of 10 are divided by 13.
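Here is a sketch of that general recipe in Python (names are my own), generating the multipliers as remainders of powers of 10 rather than hard-coding the cycle:

```python
def weighted_sum(n, d=7):
    # Multiply each digit (right to left) by the remainder of the
    # corresponding power of 10 mod d, then add.
    total, power = 0, 1
    for digit in reversed(str(n)):
        total += int(digit) * power
        power = (power * 10) % d
    return total

s = weighted_sum(1394)
print(s)                 # 43, as in the example above
print(s % 7, 1394 % 7)   # both 1: the remainder is preserved
```

For another divisor, say 13, call `weighted_sum(n, 13)` and the multipliers adjust automatically.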

**Related post**: Divisibility rules in hex

((17 + 12√2)^{n} + (17 – 12√2)^{n} – 2)/32

Now 17 – 12√2 is 0.029…, and so the term (17 – 12√2)^{n} approaches zero very quickly as *n* increases. So *f*(*n*) is very nearly

((17 + 12√2)^{n} – 2)/32

The error in the approximation is less than 0.001 for all *n*, so the approximation is *exact* when rounded to the nearest integer. So the *n*th square triangular number is

⌊((17 + 12√2)^{n} +14)/32⌋

where ⌊*x*⌋ is the greatest integer less than or equal to *x*.

Here’s how you might implement this in Python:

```python
from math import sqrt, floor

def f(n):
    x = 17 + 12*sqrt(2)
    return floor((x**n + 14)/32.0)
```

Unfortunately this formula isn’t that useful in ordinary floating point computation. When *n* = 11 or larger, the result needs more bits than are available in the significand of a floating point number. The result is accurate to around 15 significant digits, but the exact result is longer than 15 digits.

The result can also be computed with a recurrence relation:

*f*(*n*) = 34 *f*(*n*-1) – *f*(*n*-2) + 2

for *n* > 2. (Or *n* > 1 if you define *f*(0) to be 0, a reasonable thing to do).

This only requires integer arithmetic, so its only limitation is the size of the integers you can compute with. Since Python has arbitrarily large integers, the following works for very large *n*.

```python
def f(n):
    if n < 2:
        return n
    return f(n-1)*34 - f(n-2) + 2
```

This is a direct implementation, though it’s inefficient because of the redundant function evaluations. It could be made more efficient by, for example, using memoization.
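Memoization would work; an iterative version is even simpler and also avoids deep recursion. A sketch (the name `f_iter` is my own):

```python
def f_iter(n):
    a, b = 0, 1          # f(0), f(1)
    for _ in range(n):
        a, b = b, 34*b - a + 2
    return a

print(f_iter(3))   # 1225, the third square triangular number
```

This runs in linear time and constant space, and works for *n* in the thousands without hitting Python’s recursion limit.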

A triangular number is the sum of the first so many positive integers. For example, 10 is a triangular number because it equals 1+2+3+4. These numbers are called triangular numbers because you can form a triangle from that many coins: a row of one coin, a row of two coins, a row of three coins, and so on.

The smallest number that is both triangular and square is 1. The next smallest is 36. There are infinitely many numbers that are both triangular and square, and there’s even a formula for the *n*th number that is both triangular and square:

((17 + 12√2)^{n} + (17 – 12√2)^{n} – 2)/32

Source: American Mathematical Monthly, February 1962, page 169.
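As a quick sketch, the formula can be evaluated numerically and checked against elementary characterizations: *m* is triangular iff 8*m* + 1 is a perfect square, and *m* is square iff its integer square root squares back to *m*.

```python
from math import sqrt, isqrt

def square_triangular(n):
    a, b = 17 + 12*sqrt(2), 17 - 12*sqrt(2)
    return round((a**n + b**n - 2) / 32)

def is_triangular(m):
    # m = k(k+1)/2 for some k iff 8m + 1 is a perfect square
    return isqrt(8*m + 1)**2 == 8*m + 1

for n in range(1, 6):
    m = square_triangular(n)
    assert is_triangular(m) and isqrt(m)**2 == m
    print(m)   # 1, 36, 1225, 41616, 1413721
```

(Floating point suffices for small *n*; for large *n* one would switch to exact integer arithmetic.)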

For more on triangle numbers and their generalizations, see Twelve Days of Christmas and Tetrahedral Numbers.

There is also a way to compute the square triangular numbers recursively discussed in the next post.

After the alphabet and the tables of multiplication, nothing has proved quite so useful in my professional life as these six little expressions.

The six expressions he refers to are nicknamed the *vergeet-me-nietjes* in Dutch, which translates to forget-me-nots in English. They are also known as Dr. Myosotis’s equations because myosotis is the genus for forget-me-nots. The equations give the angular and linear deflections of a cantilever beam.

Imagine a beam anchored at one end and free on the other, subject to one of three kinds of load: a bending moment *M* at the free end, a point force *P* at the free end, or a load *w* distributed over the length of the beam. The equations below give the rotation (angular deflection) and displacement (linear deflection) of the free end of the beam.

|  | Rotation | Displacement |
|---|---|---|
| Bending moment | ML/EI | ML^{2}/2EI |
| Point load | PL^{2}/2EI | PL^{3}/3EI |
| Distributed load | wL^{3}/6EI | wL^{4}/8EI |
Here *E* is the modulus of elasticity, *L* is the length of the beam, and *I* is the area moment of inertia.
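The table translates directly into code. Here’s a sketch (the function name and the example numbers are mine, and consistent units are assumed throughout):

```python
def cantilever_tip(load_type, load, L, E, I):
    # Rotation and displacement of the free end of a cantilever beam,
    # per Dr. Myosotis's six expressions.
    if load_type == "moment":       # end moment M
        return load*L/(E*I), load*L**2/(2*E*I)
    if load_type == "point":        # end force P
        return load*L**2/(2*E*I), load*L**3/(3*E*I)
    if load_type == "distributed":  # distributed load w
        return load*L**3/(6*E*I), load*L**4/(8*E*I)
    raise ValueError(load_type)

# Example: 1 kN point load on a 2 m steel beam (E = 200 GPa, I = 1e-6 m^4)
theta, delta = cantilever_tip("point", 1000, 2.0, 200e9, 1e-6)
print(theta, delta)
```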

People often answer the question with a list of their favorite books, perhaps skewed in favor of long books. But I don’t think you should take books that have been your favorites in the past. You should take what you think would be your favorite books **on a deserted island**. I expect my tastes there would be very different than they are in my current suburban life.

I think of books that I could only read on a desert island, books that I’ve enjoyed in small portions but ordinarily would not have the patience to read cover-to-cover. For example, I’ve found portions of Donald Knuth’s series The Art of Computer Programming enjoyable and useful, but I can’t say I’ve read it all. Perhaps on a deserted island with little to do and few distractions I’d enjoy going through it carefully line by line, attempting all the exercises. I might even learn MMIX, something I can’t imagine doing under ordinary circumstances.

Along those lines, I might want to take some works by Thomas Aquinas such as his Summa Theologica or his commentaries on Aristotle. The little I’ve read of Aquinas has been profound, and more approachable than I expected. Still, I find it hard to read much of him. Alone on an island I might take the time to read him carefully.

For math, I might want to take Spivak’s differential geometry series, depending on how long I expect to be on this island. If I’m going to be there too long and I’m limited on books, I might want to take something else that’s more dense and less familiar.

For science, I’d take Gravitation by Misner, Thorne, and Wheeler. I’ve intended to read that book for many years and have started a couple times. In college I couldn’t afford its price; now I can’t afford the time.

For fiction, I’d take Patrick O’Brian’s Aubrey/Maturin series because I haven’t read it, I’ve heard it is very well written, and it’s long.


The root problem turned out to be an assumption that sounds reasonable: you’re making two changes, one to a numerator and one to a denominator, so the total change should equal the sum of the results of making each change separately. Except that’s not so.

At this point, a mathematician would say “Of course you can’t split the effects like that. It’s nonlinear.” But it’s worth pursuing a little further. For one thing, it doesn’t help a general audience to just say “it’s nonlinear.” For another, it’s worth seeing when it is appropriate, at least approximately, to attribute the effects this way.

You start with a numerator and denominator, *N*/*D*, then change *N* to *N*+*n* and change *D* to *D*+*d*. The total change is then (*N*+*n*)/(*D*+*d*) – *N*/*D*.

The result from only the change in the numerator is *n*/*D*. The result from only the change in denominator is *N*/(*D*+*d*) – *N*/*D*.

The difference between the total change and the sum of the two partial changes is

–*nd*/(*D*(*D*+*d*)).

The assumption that you can take the total change and attribute it to each change separately is wrong in general. But it is correct if *n* or *d* is zero, and it is approximately correct when *nd* is small relative to *D*(*D*+*d*). This can make the bug harder to find. It could also be useful when *nd* is indeed small and you don’t need to be exact.

Also, if all the terms are positive, the discrepancy is negative, i.e. the total change is less than the sum of the partial changes. Said another way, allocating the change to each cause separately over-estimates the total change.
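Here is a quick numerical check of that discrepancy formula using exact rational arithmetic. The particular values of N, D, n, and d are arbitrary:

```python
from fractions import Fraction

N, D = Fraction(3), Fraction(4)   # starting numerator and denominator
n, d = Fraction(1), Fraction(2)   # changes to each

total = (N + n)/(D + d) - N/D          # total change
partial_num = n/D                      # effect of the numerator change alone
partial_den = N/(D + d) - N/D          # effect of the denominator change alone

# The gap between the total and the sum of the partials is -nd/(D(D+d))
discrepancy = total - (partial_num + partial_den)
assert discrepancy == -n*d/(D*(D + d))
```

With these values the two partial changes happen to cancel exactly, yet the total change is –1/12, which makes the point vividly: attributing effects separately can miss the whole story.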


**JC**: Thanks for giving me a copy of the book when we were at SciPy 2015. It’s a nice book. It’s about a lot more than computational physics.

**KH**: Right. If you think of it as physical science in general, that’s the group we’re trying to target.

**JC**: Targeting physical science more than life science?

**KH**: Yes. You can see that more in the data structures we cover which are very float-based rather than things like strings and locations.

**AS**: To second that, I’d say that all the examples are coming from the physical sciences. The deep examples, like in the parallelism chapter, are most relevant to physicists.

**JC**: Even in life sciences, there’s a lot more than sequences of base pairs.

**KH**: Right. A lot of people have asked what chapters they should skip. It’s probable that ecologists or social scientists are not going to be interested in the chapter about HDF5. But the rest of the book, more or less, could be useful to them.

**JC**: I was impressed that there’s a lot of scattered stuff that you need to know that you’ve brought into one place. This would be a great book to hand a beginning grad student.

**KH**: That was a big motivation for writing the book. Anthony’s a professor now and I’m applying to be a professor and I can’t spend all my time ramping students up to be useful researchers. I’d rather say “Here’s a book. It’s yours. Come to me if it’s not in the index.”

**JC**: And they’d rather have a book that they could access any time than have to come to you. Are you thinking of doing a second edition as things change over time?

**AS**: It’s on the table to do a second edition eventually. Katy and I will have the opportunity if the book is profitable and the material becomes out of date. O’Reilly could ask someone else to write a second edition, but they would ask us first.

**JC**: Presumably putting out a second edition would not be as much work as creating the first one.

**KH**: I sure hope not!

**AS**: There’s a lot of stuff that’s not in this book. Greg Wilson jokingly asked us when Volume 2 would come out. There may be a need for a more intermediate book that extends the topics.

**KH**: And maybe targets languages other than Python where you’re going to have to deal with configuring and building, installing and linking libraries, that kind of stuff. I’d like to cover more of that, but Python doesn’t have that problem!

**JC**: You may sell a lot of books when the school year starts.

**KH**: Anthony and I both have plans for courses based around this book. Hopefully students will find it helpful.

**JC**: Maybe someone else is planning the same thing. It would be nice if they told you.

**AS**: A couple people have approached us about doing exactly that. Something I’d like to see is for people teaching courses around it to pull their curriculum together.

**JC**: Is there a web site for the book, other than an errata page at the publisher?

**KH**: Sure, there’s physics.codes. Anthony put that up.

**AS**: There’s also a GitHub repo, physics-codes. That’s where you can find code for all the examples, and that should be kept up to date. We also have a YouTube channel.

**JC**: When did y’all start writing the book?

**AS**: It was April or May last year when we finally started writing. There was a proposal cycle six or seven months before that. Katy and I were simultaneously talking to O’Reilly, so that worked out well.

**KH**: In a sense, the book process started for me in graduate school with The Hacker Within and Software Carpentry. A lot of the flows in the book come from the outlines of Hacker Within tutorials and Software Carpentry tutorials years ago.

**AS**: On that note, what happened for me, I took those tutorials and turned them into a masters course for AIMS, African Institute for Mathematical Sciences. At the end I thought it would be nice if this were a book. It didn’t occur to me that there was a book’s worth of material until the end of the course at AIMS. I owe a great debt to AIMS in that way.

**JC**: Is there something else you’d like to say about the book that we haven’t talked about?

**KH**: I think it would be a fun exercise for someone to try to determine which half of the chapters I wrote and which Anthony wrote. Maybe using some sort of clustering algorithm or pun detection. If anyone wants to do that sort of analysis, I’d love to see if you guess right. Open competition. Free beer from Katy if you can figure out which half. We split the work in half, but it’s really mixed around. People who know us well will probably know that Anthony’s chapters have a high density of puns.

**AS**: I think the main point that I would like to see come across is that the book is useful to a broader audience outside the physical sciences. Even for people who are not scientists themselves, it’s useful to describe the mindset of physical scientists to software developers or managers. That communication protocol kinda goes both ways, though I didn’t expect that when we started out.

**JC**: I appreciate that it’s *one* book. Obviously it won’t cover everything you need to know. But it’s not like here’s a book on Linux, here’s a book on git, here are several books on Python. And some of the material in here isn’t in any book.

**KH**: Like licensing. Anthony had the idea to add the chapter on licensing. We get asked all the time “Which license do you use? And why?” It’s confusing, and you can get it really wrong.

* * *

Check out Effective Computation in Physics. It’s about more than physics. It’s a lot of what you need to know to get started with scientific computing in Python, all in one place.

**Related**:

- Scientific computing in Python (notes from SciPy 2015)
- Python resources
- Winston Churchill, Bessie Braddock, and Python

If *b* is a power of 2, we showed in the previous post that the last digit of *P* is *b*-1.

If *b* is odd, we can find the last digit using Euler’s totient theorem. Let φ(*b*) be the number of positive integers less than *b* and relatively prime to *b*. Since *b* is odd, 2 is relatively prime to *b*, and Euler’s theorem tells us that 2^{φ(b)} ≡ 1 mod *b*. So if *r* is the remainder when 57885161 is divided by φ(*b*), then the last digit of *Q* = *P*+1 in base *b* is the same as the last digit of 2^{r} in base *b*.

For example, suppose we wanted to know the last digit of *P* in base 15. Since φ(15) = 8, and the remainder when 57885161 is divided by 8 is 1, the last digit of *Q* base 15 is 2. So the last digit of *P* is 1.

So we know how to compute the last digit in base *b* if *b* is a power of 2 or odd. Factor out all the powers of 2 from *b* so that *b* = 2^{k} *d* with *d* odd. We can find the last digit base 2^{k}, and we can find the last digit base *d*, so can we combine these to find the last digit base *b*? Yes, that’s exactly what the Chinese Remainder Theorem is for.

To illustrate, suppose we want to find the last digit of *P* in base 12. *P* ≡ 3 mod 4 and *P* ≡ 1 mod 3, so *P* ≡ 7 mod 12. (The numbers are small enough here to guess. For a systematic approach, see the post mentioned above.) So the last digit of *P* is 7 in base 12.
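As a sanity check on these hand calculations, Python’s three-argument `pow` does fast modular exponentiation, so the last digit of *P* in any base *b* can be computed directly:

```python
def last_digit(b):
    # Last digit of P = 2**57885161 - 1 in base b is just P mod b.
    # Three-argument pow never forms the full 17-million-digit number.
    return (pow(2, 57885161, b) - 1) % b
```

Here `last_digit(12)` returns 7, agreeing with the Chinese Remainder Theorem calculation above, and `last_digit(15)` returns 1, agreeing with the totient calculation.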

If you’d like to write a program to play around with this, you need to be able to compute φ(*n*). You may have an implementation of this function in your favorite environment. In Python, for example, it’s `sympy.ntheory.totient`. If not, it’s easy to compute φ(*n*) if you can factor *n*. You just need to know two things about φ. First, it’s a multiplicative function, meaning that if *a* and *b* are relatively prime, then φ(*ab*) = φ(*a*) φ(*b*). Second, if *p* is a prime, then φ(*p*^{k}) = *p*^{k} – *p*^{k-1}.
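Here is a sketch of such a φ implementation, leaning on SymPy’s `factorint` for the factoring step:

```python
from sympy import factorint

def phi(n):
    # Euler's totient via the prime factorization of n, using
    # multiplicativity and phi(p**k) = p**k - p**(k-1).
    result = 1
    for p, k in factorint(n).items():
        result *= p**k - p**(k - 1)
    return result
```

For example, `phi(15)` gives 8, the value used in the base-15 calculation above.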

You can convert binary numbers to other bases that are powers of two, say 2^{k}, by starting at the right end and converting to the new base in blocks of size *k*. So in base 4, we’d start at the right end and convert pairs of bits to base 4. Until we get to the left end, all the pairs are 11, and so they all become 3’s in base four. So in base 4, *P* would be written as a 1 followed by 28,942,580 threes.

Similarly, we could write *P* in base 8 by starting at the right end and converting triples of bits to base 8. Since 57885161 = 3*19295053 + 2, we’ll have two bits left over when we get to the left end, so in base 8, *P* would be written as 3 followed by 19,295,053 sevens. And since 57885161 = 4*14471290 + 1, the hexadecimal (base 16) representation of *P* is a 1 followed by 14,471,290 F’s.

What about base 10, i.e. how many digits does *P* have? The number of digits in a positive integer *n* is ⌊log_{10}*n*⌋ + 1 where ⌊*x*⌋ is the greatest integer less than or equal to *x*. For positive numbers, this just means chop off the decimals and keep the integer part.

We’ll make things slightly easier for a moment and find how many digits *P*+1 has.

log_{10}(*P*+1) = log_{10}(2^{57885161}) = 57885161 log_{10}(2) = 17425169.76…

and so *P*+1 has 17425170 digits, and so does *P*. How do we know *P* and *P*+1 have the same number of digits? There are several ways you could argue this. One is that *P*+1 is not a power of 10, so the number of digits couldn’t change between *P* and *P*+1.
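This calculation is easy to reproduce in a line of Python:

```python
from math import log10, floor

# Number of decimal digits in P + 1 = 2**57885161; P has the same count.
digits = floor(57885161 * log10(2)) + 1
```

The fractional part of 57885161 log₁₀(2) is about 0.76, comfortably far from an integer boundary, so floating point roundoff is not a worry here.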

**Update**: How many digits does *P* have in other bases?

We can take the formula above and simply substitute *b* for 10: the base *b* representation of a positive integer *n* has ⌊log_{b}*n*⌋ + 1 digits.

As above, we’d rather work with *Q* = *P*+1 than *P*. The numbers *P* and *Q* will have the same number of digits unless *Q* is a power of the base *b*. But since *Q* is a power of 2, this could only happen if *b* is also a power of two, say *b* = 2^{k}, and *k* is a divisor of 57885161. Since 57885161 is prime, either *b* = 2 or *b* = *Q*. (In base 2, *P* is a string of 57885161 ones and *Q* is a 1 followed by 57885161 zeros, so there *Q* does have one more digit.) If *b* > *P*, then *P* has one digit base *b*. So we can concentrate on the case 2 < *b* < *Q*, in which case *P* and *Q* have the same number of digits. The number of digits in *Q* is then ⌊57885161 log_{b}(2)⌋ + 1.

If *b* = 2^{k}, we can generalize the discussion above about bases 4, 8, and 16. Let *q* and *r* be the quotient and remainder when 57885161 is divided by *k*. If *r* = 0, the representation is simply *q* digits, all equal to *b*-1. Otherwise the first digit is 2^{r} – 1 and the remaining *q* digits are *b*-1.
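Since *P* itself is unwieldy, here is a check of the same digit pattern on a small analogue, 2^{11} – 1 in base 8 (so *k* = 3):

```python
# Digit pattern of 2**e - 1 in base 2**k: with q, r = divmod(e, k),
# the leading digit is 2**r - 1 (when r > 0) followed by q copies of b-1.
e, k = 11, 3
q, r = divmod(e, k)
digits = [2**r - 1] + [2**k - 1] * q if r else [2**k - 1] * q

# Reassemble the value from its base-2**k digits and compare.
value = 0
for d in digits:
    value = value * 2**k + d
assert value == 2**e - 1
```

The digits come out as 3 followed by three 7s, i.e. 3777 in octal, which is indeed 2047 = 2^{11} – 1, matching the base-8 pattern described for *P* above.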

**Next post**: Last digit of largest known prime


The 61st Fibonacci number is 2504730781961. The 62nd is 4052739537881. Since these end in 1 and 1, the 63rd Fibonacci number must end in 2, etc. and so the pattern starts over.

It’s not obvious that the cycle should have length 60, but it is fairly easy to see that there must be a cycle. There are only 10*10 possibilities for two consecutive digits. Since the Fibonacci numbers are determined by a two-term recurrence, and since the last digit of a sum is determined by the sum of the last digits, the sequence of last digits must repeat eventually. Here “eventually” means after at most 10*10 terms.

Replace “10” by any other base in the paragraph above to show that the sequence of last digits must be cyclic in any base. In base 16, for example, the period is 24. In hexadecimal notation the 25th Fibonacci number is 12511 and the 26th is 1DA31, so the 27th must end in 2, etc.

Here’s a little Python code to find the period of the last digits of Fibonacci numbers working in any base b.

```python
from sympy import fibonacci as f

def period(b):
    for i in range(1, b*b + 1):
        if f(i) % b == 0 and f(i + 1) % b == 1:
            return i
```

This shows that in base 100 the period is 300. So in base 10 the last *two* digits repeat every 300 terms.
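Calling `fibonacci(i)` repeatedly computes ever larger full-size Fibonacci numbers. For big bases it may be faster to carry the recurrence mod *b* directly, so the numbers never grow. A sketch:

```python
def period(b):
    # Advance the pair (F(i), F(i+1)) mod b until it returns to (0, 1).
    # The pigeonhole argument above guarantees this happens within b*b steps.
    a, c = 0, 1
    for i in range(1, b*b + 1):
        a, c = c, (a + c) % b
        if (a, c) == (0, 1):
            return i
```

This version gives the same answers, e.g. a period of 60 in base 10 and 300 in base 100, while only ever working with numbers smaller than *b*.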

The period seems to vary erratically with base as shown in the graph below.

Once at twelve on one night’s drear, ’twas while I, weak and tired thought here

On the words in lots of quaint and odd old tomes of mind’s lost lore,

While I dozed, so near a nap, there came but then a soft, quick tap,

As of one who made a rap, a rap at my front room’s closed door.

“‘Tis some guest,” I spoke, voice low, “who taps at my front room’s closed door,

Well, Just this, and not much more.”

Morice’s entire poem is available here.

**Related post**: A rewriting of The Raven that gives a mnemonic for the digits of pi.

A young developer approached me after a conference talk and said, “You must feel really bad about the failure of object-oriented programming.” I was confused. I said, “What do you mean that object-oriented programming was a failure? Why do you think that?”

He said, “OOP was supposed to fix all of our software engineering problems and it clearly hasn’t. Building software today is just as hard as it was before OOP came along.”

I replied, “Have you ever looked at the programs we were building in the early 1980s? At how limited their functionality and UIs were? OOP has been an incredible success. It enabled us to manage complexity as we grew from 100KB applications to today’s 100MB applications.”

Of course OOP hasn’t solved all software engineering problems. Neither has anything else. But OOP has been enormously successful in allowing ordinary programmers to write much larger applications. It has become so pervasive that few programmers consciously think about it; it’s simply how you write software.

I’ve written several posts poking fun at the excesses of OOP and expressing moderate enthusiasm for functional programming, but I appreciate OOP. I believe functional programming will influence object oriented programming, but not replace it.

**Related**:

- Some problems simply have no solution.
- Some problems have no simple solution.
- Some problems have many solutions.
- Determining that a solution exists may be half the work of finding it.
- Solutions that work well locally may blow up when extended too far.
- Boundary conditions are the hard part.
- Something that starts out as a good solution may become a very bad solution.
- You can fool yourself by constructing a solution where one doesn’t exist.
- Expand your possibilities to find a solution, then reduce them to see how good the solution is.
- You can sometimes do what sounds impossible by reframing your problem.


The *n*th harmonic number, *H*_{n}, is the sum of the reciprocals of the integers up to and including *n*. For example,

*H*_{4} = 1 + 1/2 + 1/3 + 1/4 = 25/12.

Here’s a curious fact about harmonic numbers, known as **Wolstenholme’s theorem**:

For a prime *p* > 3, the numerator of *H*_{p-1} is divisible by *p*^{2}.

The example above shows this for *p* = 5. In that case, the numerator is not just divisible by *p*^{2}, it *is* *p*^{2}, though this doesn’t hold in general. For example, *H*_{10} = 7381/2520. The numerator 7381 is divisible by 11^{2} = 121, but it’s not equal to 121.

The generalized harmonic numbers *H*_{n,m} are the sums of the reciprocals of the first *n* positive integers, each raised to the power *m*. Wolstenholme’s theorem says something about these numbers too:

For a prime *p* > 3, the numerator of *H*_{p-1,2} is divisible by *p*.

For example, *H*_{4,2} = 205/144, and the numerator is clearly divisible by 5.
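Both statements are easy to spot-check numerically. Here is a sketch using SymPy:

```python
from sympy import harmonic, primerange

# Spot-check Wolstenholme's theorem for all primes p with 3 < p < 50.
for p in primerange(5, 50):
    # numerator of H_{p-1} should be divisible by p^2
    assert harmonic(p - 1).as_numer_denom()[0] % p**2 == 0
    # numerator of H_{p-1,2} should be divisible by p
    assert harmonic(p - 1, 2).as_numer_denom()[0] % p == 0
```

For *p* = 7, for instance, *H*_{6} = 49/20 and the numerator is exactly 7².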

You can play with harmonic numbers and generalized harmonic numbers in Python using SymPy. Start with the import statement

```python
from sympy.functions.combinatorial.numbers import harmonic
```

Then you can get the *n*th harmonic number with `harmonic(n)` and generalized harmonic numbers with `harmonic(n, m)`.

To extract the numerators, you can use the method `as_numer_denom` to turn the fractions into (numerator, denominator) pairs. For example, you can create a list of the numerators of the first 10 harmonic numbers with

```python
[harmonic(n).as_numer_denom()[0] for n in range(10)]
```

You might notice that `harmonic(0)` returns 0, as it should. The sum defining the harmonic numbers is empty in this case, and empty sums are defined to be zero.

Traditional numerical integration routines work well in low dimensions. (“Low” depends on context, but let’s say less than 10 dimensions.) When you have the choice of a well-understood, deterministic method, and it works well, by all means use it. I’ve seen people who are so used to problems requiring Monte Carlo methods that they use these methods on low-dimensional problems where they’re not the most efficient (or robust) alternative.

Except for very special integrands, high dimensional integration requires either Monte Carlo integration or some variation such as quasi-Monte Carlo methods.

As I wrote about here, naive Monte Carlo integration isn’t practical for high dimensions. It’s unlikely that any of your integration points will fall in the region of interest without some sort of importance sampler. Constructing a good importance sampler requires some understanding of your particular problem. (Sorry. No one-size-fits-all black-box methods are available.)

Quasi-Monte Carlo methods (QMC) are interesting because they rely on sequences that behave something like random sequences but are more evenly distributed, and hence more effective at exploring a high-dimensional space. These methods work well when an integrand has low effective dimension. The effective dimension is low when nearly all the action takes place on a lower dimensional manifold. For example, a function may ostensibly depend on 10,000 variables, while for practical purposes 12 of these are by far the most important. There are some papers by Art Owen that say, roughly, that Quasi-Monte Carlo methods work well if and only if the effective dimension is much smaller than the nominal dimension. (QMC methods are especially popular in financial applications.)

While QMC methods can be more efficient than Monte Carlo methods, it’s hard to get bounds on the integration error with these methods. Hybrid approaches, such as randomized QMC can perform remarkably well and give theoretical error bounds.
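To make this concrete, here is a small illustrative comparison, assuming SciPy’s `scipy.stats.qmc` module is available. The 8-dimensional integrand and the seeds are arbitrary choices; the exact value of the integral is 1:

```python
import numpy as np
from scipy.stats import qmc

d = 8
# Integrand: product of (x_i + 1)/1.5 over [0,1]^d.
# Each factor integrates to 1, so the exact integral is 1.
f = lambda x: np.prod((x + 1) / 1.5, axis=1)

n = 2**12  # 4096 points

# Plain Monte Carlo: i.i.d. uniform points
rng = np.random.default_rng(42)
mc_est = f(rng.random((n, d))).mean()

# Quasi-Monte Carlo: scrambled Sobol sequence (a randomized QMC method)
sobol = qmc.Sobol(d=d, scramble=True, seed=42)
qmc_est = f(sobol.random_base2(m=12)).mean()
```

On a smooth, low-effective-dimension integrand like this one, the Sobol estimate is typically much closer to the true value than the plain Monte Carlo estimate at the same sample size, though the advantage shrinks as the effective dimension grows.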

Markov Chain Monte Carlo (MCMC) is a common variation on Monte Carlo integration that uses **dependent** random samples. In many applications, such as Bayesian statistics, you’d like to be able to create independent random samples from some probability distribution, but this is not practical. MCMC is a compromise. It produces a Markov chain of dependent samples, and asymptotically these samples have the desired distribution. The dependent nature of the samples makes it difficult to estimate error and to determine how well the integration estimates using the Markov chain have converged.

(Some people speak of the Markov chain itself converging. It doesn’t. The *estimates* using the Markov chain may converge. Speaking about the chain converging may just be using informal language, or it may betray a lack of understanding.)

There is little theory regarding the convergence of estimates from MCMC, aside from toy problems. An estimate can appear to have converged, while an important region of the integration problem remains unexplored. Despite its drawbacks, MCMC is the only game in town for many applications.

If you’d like help with high dimensional integration, or any other kind of numerical integration, please contact me to discuss how we might work together.

Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.

The folks behind Mathematica and Wolfram Alpha announced yesterday that now they too can read email.
