I ran across the above quote from Hamming this morning. It made me wonder whether, back when I taught college students, I was preparing them for my past.

How do you prepare a student for the future? Mostly by focusing on skills that will always be useful, even as times change: logic, clear communication, diligence, etc.

Negative forecasting is more reliable here than positive forecasting. It’s hard to predict what’s going to be in demand in the future (besides timeless skills), but it’s easier to predict what’s probably not going to be in demand. The latter aligns with Hamming’s exhortation not to prepare students for your past.

… in an ideal world, people would learn this material over many years, after having background courses in commutative algebra, algebraic topology, differential geometry, complex analysis, homological algebra, number theory, and French literature.

It is always strange and painful to have to change a habit of mind; though, when we have made the effort, we may find a great relief, even a sense of adventure and delight, in getting rid of the false and returning to the true.

Now spin that rose around a vertical line a distance *R* from the center of the rose. This makes a torus (doughnut) shape whose cross sections look like the rose above. You could think of having a cutout shaped like the rose above and extruding Play-Doh through it, then joining the ends in a loop.

In case you’re curious, the image above was created with the following Mathematica command:

RevolutionPlot3D[{4 + Cos[5 t] Cos[t], Cos[5 t] Sin[t]}, {t, 0, 2 Pi}]

What would the volume of the resulting solid be?

This sounds like a horrendous calculus homework problem, but it’s actually quite easy. A theorem by Pappus of Alexandria (c. 300 AD) says that the volume is equal to the area times the circumference of the circle traversed by the centroid.

The area of a rose of the form *r* = cos(*k*θ) is simply π/2 if *k* is even and π/4 if *k* is odd. (Recall from the previous post that the number of petals is 2*k* if *k* is even and *k* if *k* is odd.)

The location of the centroid is easy: it’s at the origin by symmetry. If we rotate the rose around a vertical line *x* = *R* then the centroid travels a distance 2π*R*.

So putting all the pieces together, the volume is π²*R* if *k* is even and half that if *k* is odd. (We assume *R* > 1 so that the figure doesn’t intersect itself and no volume gets counted twice.)

We can also find the surface area easily by another theorem of Pappus. The surface area is just the arc length of the rose times the circumference of the circle traced out by the centroid. The previous post showed how to find the arc length, so just multiply that by 2π*R*.
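The volume formula above is easy to put into code. Here’s a minimal sketch (SciPy’s `quad` is assumed, and only for a sanity check of the area; the formula itself is just Pappus’s theorem):

```python
from math import pi, cos
from scipy.integrate import quad

def rose_volume(k, R):
    # Pappus: volume = (area of the rose) x (distance traveled by its centroid).
    # The rose r = cos(k*theta) has area pi/2 for even k and pi/4 for odd k,
    # and the centroid at the origin travels 2*pi*R.
    area = pi/2 if k % 2 == 0 else pi/4
    return area * 2*pi*R

# Sanity check of the area for k = 5: integrate (1/2) r^2 d(theta).
# An odd-k rose is traced once as theta runs from 0 to pi.
area5, _ = quad(lambda t: 0.5*cos(5*t)**2, 0, pi)
```

For *k* = 5 the check gives `area5` ≈ π/4, so `rose_volume(5, R)` is π²*R*/2, matching the formula above.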

(I rotated the graph 36° so it would be symmetric about the vertical axis rather than the horizontal axis.)

The arc length of a curve in polar coordinates is given by

∫ √( *r*² + (*dr*/*d*θ)² ) *d*θ

and so we can use this to find the length. The integral doesn’t have a closed form in terms of elementary functions. Instead, the result turns out to use a special function *E*(*x*), the “complete elliptic integral of the second kind,” defined by

*E*(*x*) = ∫_{0}^{π/2} √(1 – *x* sin²θ) *d*θ

Here’s the calculation for the length of a rose:

So the arc length of the rose *r* = cos(*k*θ) with θ running from 0 to 2π is 4 *E*(-*k*² + 1). You can calculate *E* in SciPy with `scipy.special.ellipe`.

If we compute the length of the rose at the top of the post, we get 4 *E*(-24) = 21.01. Does that pass the sniff test? Each petal goes from *r* = 0 out to *r* = 1 and back. If the petal were a straight line, this would have length 2. Since the petals are curved, the length of each is a little more than 2. There are five petals, so the result should be a little more than 10. But we got a little more than 20. How can that be? Since 5 is odd, the rose with *k* = 5 traces each petal twice, so we should expect a value of a little more than 20, which is what we got.
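As a quick check, we can compare the closed form against direct numerical integration of the polar arc length formula (a sketch using SciPy, which the post already mentions for `ellipe`):

```python
from math import cos, sin, sqrt, pi
from scipy.integrate import quad
from scipy.special import ellipe

k = 5
# Arc length of r = cos(k*theta) directly: integrate sqrt(r^2 + (dr/dtheta)^2)
# over 0 <= theta <= 2*pi.
integrand = lambda t: sqrt(cos(k*t)**2 + (k*sin(k*t))**2)
direct, _ = quad(integrand, 0, 2*pi, limit=200)

# Closed form from the post, using SciPy's parameter convention for E.
closed_form = 4*ellipe(1 - k**2)
```

Both numbers should agree, and both should be a little more than 21 for *k* = 5, as argued above.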

As *k* gets larger, the petals come closer to being straight lines. So we should expect that 4*E*(-*k*² + 1) approaches 4*k* as *k* gets large. The following plot of *E*(-*k*² + 1) – *k* provides empirical support for this conjecture by showing that the difference approaches 0, and gives an idea of the rate of convergence. It should be possible to prove, say, that *E*(-*k*²) asymptotically approaches *k*, but I haven’t done this.


The proof is surprisingly simple. Start with the following:

√(1 + (cosh′ *x*)²) = √(1 + sinh²*x*) = √(cosh²*x*) = cosh *x*

Now integrate the first and last expressions between two points *a* and *b*. Note that the former integral gives the arc length of cosh between *a* and *b*, and the latter integral gives the area under the graph of cosh between *a* and *b*.

By the way, the most famous catenary may be the Gateway Arch in St. Louis, Missouri.

Here’s an example from [1]. Suppose you want to find the extreme values of *x*³ + 2*xyz* – *z*² on the unit sphere using Lagrange multipliers. This leads to the following system of polynomial equations where λ is the Lagrange multiplier.

There’s no obvious way to go about solving this system of equations. However, there is a systematic way to approach this problem using a “lexicographic Gröbner basis.” This transforms the problem into something that **looks much worse** but is actually easier to work with. And most importantly, the transformation is algorithmic. It requires some computation—there are numerous software packages for doing this—but it doesn’t require a flash of insight.

The transformed system looks intimidating compared to the original:

We’ve gone from four equations to eight, from small integer coefficients to large fraction coefficients, from squares to seventh powers. And yet we’ve made progress because the four variables are **less entangled** in the new system.

The last equation involves only *z* and factors nicely:

This cracks the problem wide open. We can easily find all the possible values of *z*, and once we substitute values for *z*, the rest of the equations simplify greatly and can be solved easily.

The key is that Gröbner bases transform our problem into a form that, although it appears more difficult, is easier to work with because the variables are somewhat separated. Solving one variable, *z*, is like pulling out a thread that then makes the rest of the threads easier to separate.
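For the record, systems like this can be handed to a computer algebra system. The sketch below uses SymPy’s `groebner` (an assumption; the computation in [1] used other software), with the Lagrange system for *f* = *x*³ + 2*xyz* – *z*² on the unit sphere:

```python
from sympy import symbols, groebner

x, y, z, lam = symbols('x y z lam')

# Lagrange multiplier conditions: grad f = lam * grad g,
# with f = x^3 + 2xyz - z^2 and g = x^2 + y^2 + z^2 - 1.
eqs = [3*x**2 + 2*y*z - 2*lam*x,
       2*x*z - 2*lam*y,
       2*x*y - 2*z - 2*lam*z,
       x**2 + y**2 + z**2 - 1]

# Lexicographic order with lam > x > y > z eliminates variables in turn,
# so the basis contains a polynomial in z alone.
G = groebner(eqs, lam, x, y, z, order='lex')
univariate = [g for g in G.exprs if g.free_symbols <= {z}]
```

The elements of `univariate` are the “cracks the problem wide open” equations: solve them for *z*, back-substitute, and the rest of the system unravels.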

**Related**: The great reformulation of algebraic geometry

* * *

[1] David Cox et al. Applications of Computational Algebraic Geometry: American Mathematical Society Short Course January 6-7, 1997 San Diego, California (Proceedings of Symposia in Applied Mathematics)

White noise has equal power at all frequencies, just as white light is a combination of all the frequencies of the visible spectrum. The components of red noise are weighted toward low frequencies, just as red light is at the low end of the visible spectrum. Pink noise is weighted toward low frequencies too, but not as strongly as red. Specifically, the power in red noise drops off like 1/*f*² where *f* is frequency. The power in pink noise drops off like 1/*f*.

Generating pink noise is more complicated than you might think. The book Creating Noise, by Stefan Hollos and J. Richard Hollos, has a good explanation and C source code for generating pink noise and variations such as 1/*f*^{α} noise for 0 < α < 1. If you want even more background, check out Recursive Digital Filters by the same authors.

If you’d like to hear what pink noise sounds like, here’s a sample that was created using the software in the book with a 6th order filter.

(Download)


- Ooh, that sounds like fun!
- Run away!

I’ve been on several projects where the sponsors have identified some aspect of the status quo that clearly needs improving. Working on a project that realizes these problems and is willing to address them sounds like fun. But then the project runs to the opposite extreme and creates something worse.

For example, most software development is sloppy. So a project reacts against this and becomes so formal that nobody can get any work done. In those settings I like to say “Hold off on buying a tux. Just start by tucking your shirt tail in. Maybe that’s as formal as you need to be.”

I’d be more optimistic about a project with more modest goals, say one that wants to move 50% of the way toward an ideal, rather than wanting to jump from one end of a spectrum to the other. Or even better, a project that has identified a direction they want to move in, and thinks in terms of experimenting to find their optimal position along that direction.

Start with a positive integer *n*. Compute 3*n* + 1 and divide by 2 repeatedly until you get an odd number. Then repeat the process. For example, suppose we start with 13. We get 3*13 + 1 = 40, and 40/8 = 5, so 5 is the next term in the sequence. 3*5 + 1 is 16, which is a power of 2, so we get down to 1.

Does this process always reach 1? So far nobody has found a proof or a counterexample.

If you pick a large starting number *n* at random, it appears that not only will the sequence terminate, the values produced by the sequence approximately follow Benford’s law (source). If you’re unfamiliar with Benford’s law, please see the first post in this series.

Here’s some Python code to play with this.

from math import log10, floor

def leading_digit(x):
    y = log10(x) % 1
    return int(floor(10**y))

# 3n+1 iteration
def iterates(seed):
    s = set()
    n = seed
    while n > 1:
        n = 3*n + 1
        while n % 2 == 0:
            n = n // 2  # integer division, to keep n an exact integer
        s.add(n)
    return s

Let’s save the iterates starting with a large starting value:

it = iterates(378357768968665902923668054558637)

Here’s what we get and what we would expect from Benford’s law:

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
| 1             | 46       | 53        |
| 2             | 26       | 31        |
| 3             | 29       | 22        |
| 4             | 16       | 17        |
| 5             | 24       | 14        |
| 6             | 8        | 12        |
| 7             | 12       | 10        |
| 8             | 9        | 9         |
| 9             | 7        | 8         |
|---------------+----------+-----------|

We get a chi-square statistic of 12.88 (*p* = 0.116), so the fit is reasonable.
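The observed counts and chi-square statistic above can be computed along these lines (a sketch; `scipy.stats.chisquare` is an assumption, since the post doesn’t show this step):

```python
from math import log10, floor
from scipy.stats import chisquare

def leading_digit(x):
    y = log10(x) % 1
    return int(floor(10**y))

def benford_chisq(values):
    # Tally observed leading digits, compare against Benford's predictions.
    observed = [0]*9
    for v in values:
        observed[leading_digit(v) - 1] += 1
    n = len(values)
    predicted = [n*log10((d + 1)/d) for d in range(1, 10)]
    return observed, chisquare(observed, predicted)
```

Pass in the set returned by `iterates(...)` to reproduce tables like the one above (exact numbers depend on the starting seed).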

Here’s another run with a different starting point.

it = iterates(243963882982396137355964322146256)

which produces

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
| 1             | 44       | 41        |
| 2             | 22       | 24        |
| 3             | 15       | 17        |
| 4             | 12       | 13        |
| 5             | 11       | 11        |
| 6             | 9        | 9         |
| 7             | 11       | 8         |
| 8             | 6        | 7         |
| 9             | 7        | 6         |
|---------------+----------+-----------|

This has a chi-square value of 2.166 (*p* = 0.975) which is an even better fit.

Samples from a Cauchy distribution nearly follow Benford’s law. I’ll demonstrate this below. The more data you see, the more confident you should be of this. But with a typical statistical approach, crudely applied NHST (null hypothesis significance testing), the more data you see, the less convinced you are.

This post assumes you’ve read the previous post that explains what Benford’s law is and looks at how well samples from a Weibull distribution follow that law.

This post has two purposes. First, we show that samples from a Cauchy distribution approximately follow Benford’s law. Second, we look at problems with testing goodness of fit with NHST.

We can reuse the code from the previous post to test Cauchy samples, with one modification: Cauchy samples can be negative, so we have to modify our `leading_digit` function to take an absolute value.

def leading_digit(x):
    y = log10(abs(x)) % 1
    return int(floor(10**y))

We’ll also need to import `cauchy` from `scipy.stats` and change where we draw samples to use this distribution.

samples = cauchy.rvs(0, 1, N)

Here’s how a sample of 1000 Cauchy values compared to the prediction of Benford’s law:

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
| 1             | 313      | 301       |
| 2             | 163      | 176       |
| 3             | 119      | 125       |
| 4             | 90       | 97        |
| 5             | 69       | 79        |
| 6             | 74       | 67        |
| 7             | 63       | 58        |
| 8             | 52       | 51        |
| 9             | 57       | 46        |
|---------------+----------+-----------|

Here’s a bar graph of the same data.

A common way to measure goodness of fit is to use a chi-square test. The null hypothesis is that the data follow a Benford distribution. We compare the chi-square statistic for the observed data to a chi-square distribution with 8 degrees of freedom (one less than the number of categories, which is 9, one for each possible leading digit). We compute the *p*-value, the probability of seeing a chi-square statistic this large or larger, and reject our null hypothesis if this *p*-value is too small.

Here’s how our chi-square values and *p*-values vary with sample size.

|-------------+------------+---------|
| Sample size | chi-square | p-value |
|-------------+------------+---------|
| 64          | 13.542     | 0.0945  |
| 128         | 10.438     | 0.2356  |
| 256         | 13.002     | 0.1118  |
| 512         | 8.213      | 0.4129  |
| 1024        | 10.434     | 0.2358  |
| 2048        | 6.652      | 0.5745  |
| 4096        | 15.966     | 0.0429  |
| 8192        | 20.181     | 0.0097  |
| 16384       | 31.855     | 9.9e-05 |
| 32768       | 45.336     | 3.2e-07 |
|-------------+------------+---------|

The *p*-values eventually get very small, but they don’t decrease monotonically with sample size. This is to be expected. If the data came from a Benford distribution, i.e. if the null hypothesis were true, we’d expect the *p*-values to be uniformly distributed, i.e. they’d be equally likely to take on any value between 0 and 1. And not until the two largest samples do we see values that don’t look consistent with uniform samples from [0, 1].

In one sense NHST has done its job. Cauchy samples *do not* exactly follow Benford’s law, and with enough data we can show this. But we’re rejecting a null hypothesis that isn’t that interesting. We’re showing that the data don’t exactly follow Benford’s law rather than showing that they *do* approximately follow Benford’s law.

In 1881, Simon Newcomb noticed that the edges of the first pages in a book of logarithms were dirty while the edges of the later pages were clean. From this he concluded that people were far more likely to look up the logarithms of numbers with leading digit 1 than of those with leading digit 9. Frank Benford studied the same phenomenon later, and now it is known as Benford’s law, or sometimes the Newcomb-Benford law.

A data set follows Benford’s law if the proportion of elements with leading digit *d* is approximately

log_{10}((*d* + 1)/*d*).

You could replace “10” with *b* if you look at the leading digits in base *b*.
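The predicted proportions are trivial to compute, and they necessarily sum to 1 because the ratios (*d* + 1)/*d* telescope to 10:

```python
from math import log10

# Benford's law: proportion of values with leading digit d is log10((d+1)/d)
benford = {d: log10((d + 1)/d) for d in range(1, 10)}
```

So about 30.1% of values should lead with 1, dropping to about 4.6% leading with 9.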

Sets of physical constants often satisfy Benford’s law, as I showed here for the constants defined in SciPy.

Incidentally, factorials satisfy Benford’s law exactly in the limit.

The Weibull distribution is a generalization of the exponential distribution. It’s a convenient distribution for survival analysis because it can have decreasing, constant, or increasing hazard, depending on whether the value of a shape parameter γ is less than, equal to, or greater than 1 respectively. The special case of constant hazard, shape γ = 1, corresponds to the exponential distribution.

If the shape parameter of a Weibull distributions is “not too large” then samples from that distribution approximately follow Benford’s law (source). We’ll explore this statement with a little Python code.

SciPy doesn’t contain a Weibull distribution per se, but it does have support for a generalization of the Weibull known as the exponential Weibull. The latter has two shape parameters. We set the first of these to 1 to get the ordinary Weibull distribution.

from math import log10, floor
from scipy.stats import exponweib

def leading_digit(x):
    y = log10(x) % 1
    return int(floor(10**y))

def weibull_stats(gamma):
    distribution = exponweib(1, gamma)
    N = 10000
    samples = distribution.rvs(N)
    counts = [0]*10
    for s in samples:
        counts[leading_digit(s)] += 1
    print(counts)

Here’s how the leading digit distribution of a simulation of 10,000 samples from an exponential (Weibull with γ = 1) compares to the distribution predicted by Benford’s law.

|---------------+----------+-----------|
| Leading digit | Observed | Predicted |
|---------------+----------+-----------|
| 1             | 3286     | 3010      |
| 2             | 1792     | 1761      |
| 3             | 1158     | 1249      |
| 4             | 851      | 969       |
| 5             | 754      | 792       |
| 6             | 624      | 669       |
| 7             | 534      | 580       |
| 8             | 508      | 511       |
| 9             | 493      | 458       |
|---------------+----------+-----------|

Looks like a fairly good fit. How could we quantify the fit so we can compare how the fit varies with the shape parameter? The most common approach is to use the chi-square goodness of fit test.

def chisq_stat(O, E):
    return sum( (o - e)**2/e for (o, e) in zip(O, E) )

Here “O” stands for “observed” and “E” stands for “expected.” The observed counts are the counts we actually saw. The expected values are the values Benford’s law would predict:

expected = [N*log10((i+1)/i) for i in range(1, 10)]

Note that we don’t want to pass `counts` to `chisq_stat` but `counts[1:]` instead. This is because `counts` is indexed from 0, but leading digits can’t be 0 for positive samples.

Here are the chi square goodness of fit statistics for a few values of γ. (Smaller is better.)

|-------+------------|
| Shape | Chi-square |
|-------+------------|
| 0.1   | 1.415      |
| 0.5   | 9.078      |
| 1.0   | 69.776     |
| 1.5   | 769.216    |
| 2.0   | 1873.242   |
|-------+------------|

This suggests that samples from a Weibull follow Benford’s law fairly well for shape γ < 1, i.e. for the case of decreasing hazard.
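For reference, the shape-parameter sweep behind the table might look like this (a sketch reusing the pieces above; since the sampling is random, the exact chi-square values will differ from run to run):

```python
from math import log10, floor
from scipy.stats import exponweib

def leading_digit(x):
    return int(floor(10**(log10(x) % 1)))

def chisq_stat(O, E):
    return sum((o - e)**2/e for o, e in zip(O, E))

N = 10000
expected = [N*log10((i + 1)/i) for i in range(1, 10)]

for gamma in [0.1, 0.5, 1.0, 1.5, 2.0]:
    samples = exponweib(1, gamma).rvs(N)
    counts = [0]*10
    for s in samples:
        counts[leading_digit(s)] += 1
    print(gamma, chisq_stat(counts[1:], expected))
```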

The golden ratio φ is (1 + √5)/2. A golden rectangle is one in which the ratio of the longer side to the shorter side is φ. Credit cards, for example, are typically golden rectangles.

You might guess that a golden angle is 1/φ of a circle, but it’s actually 1/φ^{2} of a circle. Let *a* be the length of an arc cut out of a circle by a golden angle and *b* be the length of its complement. Then by definition the ratio of *b* to *a* is φ. In other words, the golden angle is defined in terms of the ratio of its complementary arc, not of the entire circle. [1]

The video below has many references to the golden angle. It says that the golden angle is 137.5 degrees, which is fine given the context of a popular video. But this doesn’t explain where the angle comes from or give its exact value of 360/φ^{2} degrees.

[1] Why does this work out to 1/φ^{2}? The ratio *b*/*a* equals φ, by definition. So the ratio of *a* to the whole circle is

*a*/(*a* + *b*) = *a*/(*a* + φ*a*) = 1/(1 + φ) = 1/φ^{2}

since φ satisfies the quadratic equation 1 + φ = φ^{2}.
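The footnote can be checked in a couple of lines of plain Python, nothing assumed beyond the definitions above:

```python
# golden ratio and the golden angle in degrees
phi = (1 + 5**0.5)/2

assert abs(phi**2 - (1 + phi)) < 1e-12  # phi satisfies 1 + phi = phi^2
golden_angle = 360/phi**2
print(golden_angle)  # about 137.5 degrees
```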

There’s one thing advocates of all the aforementioned systems agree on: the number of basic personality types is a perfect square.

Here’s the airport:

And here’s the book cover:

I’ve written about the image on the book cover before. Someone asked me what function it graphed and I decided it was probably the Weierstrass ℘ function.

For more on Weierstrass’ elliptic function and why I think that’s what’s on the cover of A&S, see this post.

Photo of Denver airport via Wikipedia.

Sometimes we’re disappointed with a simple solution because, although we don’t realize it yet, we didn’t properly frame the problem it solves.

I’ve been in numerous conversations where someone says effectively, “I understand that 2+3 = 5, but what if we made it 5.1?” They really want an answer of 5.1, or maybe larger, for reasons they can’t articulate. They formulated a problem whose solution is to add 2 and 3, but that formulation left out something they care about. In this situation, the easy response is to say “No, 2+3 = 5. There’s nothing we can do about that.” The more difficult response is to find out why “5” is an unsatisfactory result.

Sometimes we’re uncomfortable with a simple solution even though it does solve the right problem.

If you work hard and come up with a simple solution, it may look like you didn’t put in much effort. And if someone else comes up with the simple solution, you may look foolish.

Sometimes simplicity is disturbing. Maybe it has implications we have to get used to.

**Update**: A couple people have replied via Twitter saying that we resist simplicity because it’s boring. I think beneath that is that we’re not ready to move on to a new problem.

When you’re invested in a problem, it can be hard to see it solved. If the solution is complicated, you can keep working for a simpler solution. But once someone finds a really simple solution, it’s hard to justify continuing work in that direction.

A simple solution is not something to dwell on but to build on. We want some things to be boringly simple so we can do exciting things with them. But it’s hard to shift from producer to consumer: now that I’ve produced this simple solution, and I’m still a little sad that it’s wrapped up, how can I use it to solve something else?


The holes are all rectangular, so it’s surprising that the geometry is so varied when you slice open a Menger sponge. For example, when you cut it on the diagonal, you can see stars! (I wrote about this here.)

I mentioned this blog post to a friend at Go 3D Now, a company that does 3D scanning and animation, and he created the video below. The video starts out by taking you through the sponge; about halfway through, the sponge splits apart.

Harmonic numbers are sort of a discrete analog of logarithms since

*H*_{n} = 1 + 1/2 + 1/3 + … + 1/*n*

grows like log *n*.

As *n* goes to infinity, the difference between *H*_{n} and log *n* converges to the Euler–Mascheroni constant γ ≈ 0.577. [1]

How would you compute *H*_{n}? For small *n* you could simply sum the series, but for large *n* a direct sum is expensive.

Since in the limit *H*_{n} – log *n* approaches γ, a first approximation is *H*_{n} ≈ log *n* + γ.

But we could do much better by adding a couple of terms to the approximation above. [2] That is,

*H*_{n} ≈ log *n* + γ + 1/(2*n*) – 1/(12*n*²)

The error in the approximation above is between 0 and 1/(120*n*^{4}).

So if you used this to compute the 1000th harmonic number, the error would be less than one part in 120,000,000,000,000. Said another way, for *n* = 1000 the approximation differs from the exact value in the 15th significant digit, approximately the resolution of floating point numbers (i.e. IEEE 754 double precision).

And the formula is even more accurate for larger *n*. If we wanted to compute the millionth harmonic number, the error in our approximation would be somewhere around the 26th decimal place.
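Here’s a minimal sketch of the comparison, assuming the standard approximation above (the value of γ is hard-coded; `math.fsum` keeps the direct sum accurate enough to see the tiny error):

```python
from math import log, fsum

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic_direct(n):
    # accurate direct sum of 1 + 1/2 + ... + 1/n
    return fsum(1/k for k in range(1, n + 1))

def harmonic_approx(n):
    # log n + gamma + 1/(2n) - 1/(12 n^2); error lies in (0, 1/(120 n^4))
    return log(n) + EULER_GAMMA + 1/(2*n) - 1/(12*n**2)
```

For *n* = 1000 the two values agree to roughly the resolution of double precision, as claimed above.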

* * *

[1] See Julian Havil’s excellent Gamma: Exploring Euler’s Constant. It’s a popular-level book, but more sophisticated than most such books.

[2] There’s a sequence of increasingly accurate approximations obtained by adding reciprocals of even powers of *n*, based on truncating an asymptotic series. See Concrete Mathematics for details.

- Math diagrams
- Numerical computing
- Probability and approximations
- Differential equations
- Python
- Regular expressions
- C++
- Special functions
- Typesetting: TeX, HTML, Unicode
- Emacs
- R
- Miscellaneous math

You can find an index of all these notes here.

Some of the most popular notes:

- Diagram of probability distribution relationships
- Stand-alone code for numerical computing
- R for programmers

And here is some more relatively hidden content:

- How to subscribe by RSS or email
- Online calculators
- Journal articles and presentations

The study of the planet Mercury provides two examples of the bandwagon effect. In her new book Worlds Fantastic, Worlds Familiar, planetary astronomer Bonnie Buratti writes

The study of Mercury … illustrates one of the most confounding bugaboos of the scientific method: the bandwagon effect. Scientists are only human, and they impose their own prejudices and foregone conclusions on their experiments.

Around 1800, Johann Schroeter determined that Mercury had a rotational period of 24 hours. This view held for eight decades.

In the 1880s, Giovanni Schiaparelli determined that Mercury was tidally locked, making one rotation on its axis for every orbit around the sun. This view also held for eight decades.

In 1965, radar measurements of Mercury showed that Mercury completes 3 rotations in every 2 orbits around the sun.

Studying Mercury is difficult since it is only visible near the horizon and around sunrise and sunset, i.e. when the sun’s light interferes. And it is understandable that someone would confuse a 3:2 resonance with tidal locking. Still, for two periods of eight decades each, astronomers looked at Mercury and concluded what they expected.

The difficulty of seeing Mercury objectively was compounded by two incorrect but satisfying metaphors. First that Mercury was like Earth, rotating every 24 hours, then that Mercury was like the moon, orbiting the sun the same way the moon orbits Earth.

Buratti mentions the famous Millikan oil drop experiment as another example of the bandwagon effect.

… Millikan’s value for the electron’s charge was slightly in error—he had used a wrong value for the viscosity of air. But future experimenters all seemed to get Millikan’s number. Having done the experiment myself I can see that they just picked those values that agreed with previous results.

Buratti explains that Millikan’s experiment is hard to do and “it is impossible to successfully do it without abandoning most data.” This is what I like to call acceptance-rejection modeling.

Acceptance-rejection modeling: Throw out data that don’t fit with your model, and what’s left will.

— Data Science Fact (@DataSciFact) July 2, 2015

The name comes from the acceptance-rejection method of random number generation. For example, the obvious way to generate truncated normal random values is to generate (unrestricted) normal random values and simply throw out the ones that lie outside the interval we’d like to truncate to. This is inefficient if we’re truncating to a small interval, but it always works. We’re conforming our samples to a pre-determined distribution, which is OK when we do it intentionally. The problem comes when we do it unintentionally.
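The truncated normal example can be sketched in a few lines with Python’s `random` module (inefficient for narrow intervals, as noted above, but always correct):

```python
import random

def truncated_normal(a, b):
    # Acceptance-rejection: draw standard normals and keep only
    # those that land in [a, b]. Simple, correct, possibly wasteful.
    while True:
        x = random.gauss(0, 1)
        if a <= x <= b:
            return x

samples = [truncated_normal(-1, 1) for _ in range(1000)]
```

Here the rejection is intentional and principled: the target distribution is fixed before any samples are drawn, the opposite of discarding data to match a desired result.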

Photo of Mercury above via NASA

The previous post said that for almost all *x* > 1, the fractional parts of the powers of *x* are uniformly distributed. Although this is true for almost all *x*, it can be hard to establish for any particular *x*. The previous post ended with the question of whether the fractional parts of the powers of 3/2 are uniformly distributed.

First, let’s just plot the sequence (3/2)^{n} mod 1.

Looks kinda random. But is it uniformly distributed? One way to tell would be to look at the empirical cumulative distribution function (ECDF) and see how it compares to a uniform cumulative distribution function. This is what a quantile-quantile plot does. In our case we’re looking to see whether something has a uniform distribution, but you could use a q-q plot for any distribution. It may be most often used to test normality by looking at whether the ECDF looks like a normal CDF.

If a sequence is uniformly distributed, we would expect 10% of the values to be less than 0.1. We would expect 20% of the values to be less than 0.2. Etc. In other words, we’d expect the *quantiles* to line up with their theoretical values, hence the name “quantile-quantile” plot. On the horizontal axis we plot uniform values between 0 and 1. On the vertical axis we plot the sorted values of (3/2)^{n} mod 1.
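One subtlety: (3/2)^{n} quickly exceeds floating point precision, so the fractional parts should be computed exactly, e.g. with `fractions.Fraction` (a sketch; the post doesn’t say how its plot was produced):

```python
from fractions import Fraction

def frac_parts_3_halves(n_terms):
    # fractional parts of (3/2)^n, computed with exact rational arithmetic
    x = Fraction(3, 2)
    p = Fraction(1)
    parts = []
    for _ in range(n_terms):
        p *= x
        parts.append(float(p - int(p)))
    return parts

# q-q plot data: sorted sample values against uniform quantiles
ys = sorted(frac_parts_3_halves(200))
xs = [(i + 0.5)/200 for i in range(200)]
```

Plotting `ys` against `xs` should reproduce the diagonal line described below.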

A q-q plot indicates a good fit when values line up near the diagonal, as they do here.

For contrast, let’s look at a q-q plot for the powers of the plastic constant mod 1.

Here we get something very far from the diagonal line. The plot is flat on the left because many of the values are near 0, and it’s flat on the right because many values are near 1.

Incidentally, the Kolmogorov-Smirnov goodness of fit test is basically an attempt to quantify the impression you get from looking at a q-q plot. It’s based on a statistic that measures how far apart the empirical CDF and theoretical CDF are.

First a theorem:

For almost all *x* > 1, the sequence (*x*^{n}) for *n* = 1, 2, 3, … is u.d. mod 1. [1]

Here “almost all” is a technical term meaning that the set of *x*‘s for which the statement above does not hold has Lebesgue measure zero. The abbreviation “u.d.” stands for “uniformly distributed.” A sequence is uniformly distributed mod 1 if the fractional parts of the sequence are distributed like uniform random variables.

Even though the statement holds for almost all *x*, it’s hard to prove for particular values of *x*. And it’s easy to find particular values of *x* for which the theorem does not hold.

From [1]:

… it is interesting to note that one does not know whether sequences such as (e^{n}), (π^{n}), or even ((3/2)^{n}) are u.d. mod 1 or not.

Obviously powers of integers are not u.d. mod 1 because their fractional parts are all 0. And we’ve shown before that powers of the golden ratio and powers of the plastic constant are near integers, i.e. their fractional parts cluster near 0 and 1.

The curious part about the quote above is that it’s not clear whether powers of 3/2 are uniformly distributed mod 1. I wouldn’t expect powers of any rational number to be u.d. mod 1. Either my intuition was wrong, or it’s right but hasn’t been proved, at least not when [1] was written.

The next post will look at powers of 3/2 mod 1 and whether they appear to be uniformly distributed.

* * *

[1] Kuipers and Niederreiter, Uniform Distribution of Sequences

One of the case studies in Michael Bierut’s book How to is the graphic design for the planned community Celebration, Florida. The logo for the town’s golf course is an illustration of the bike shed principle.

C. Northcote Parkinson observed that it is easier for a committee to approve a nuclear power plant than a bicycle shed. Nuclear power plants are complex, and no one on a committee presumes to understand every detail. Committee members must rely on the judgment of others. But everyone understands bicycle sheds. Also, questions such as what color to paint the bike shed don’t have objective answers. And so bike sheds provoke long discussions.

People argue about bike sheds because they understand bike sheds. Bierut said something similar about the Celebration Golf Club logo, which features a silhouette of a golfer.

Designing the graphics for Celebration’s public golf club was much harder than designing the town seal. It took me some time to realize why: none of our clients were Schwinn-riding, ponytailed girls [as in the town seal], but most of them were enthusiastic golfers. The silhouette on the golf club design was refined endlessly as various executives demonstrated their swings in client meetings.

Image credit: By Source, Fair use, https://en.wikipedia.org/w/index.php?curid=37643922

The so-called plastic constant *P* is another Pisot number, in fact the smallest Pisot number. *P* is the real root of *x*^{3} – *x* – 1 = 0.

Because *P* is a Pisot number, we know that its powers will be close to integers, just like powers of the golden ratio, but the *way* they approach integers is more interesting. The convergence is slower and less regular.

We will plot the first few powers of *P*, first looking at the distance to the nearest integer on a linear scale, then looking at the absolute value of the distance on a logarithmic scale.
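To reproduce plots like these you first need *P* itself. Here’s a bisection sketch in plain Python (any root finder would do):

```python
def plastic_constant(tol=1e-15):
    # real root of x^3 - x - 1 = 0, found by bisection on [1, 2]
    f = lambda x: x**3 - x - 1
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi)/2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi)/2

P = plastic_constant()
# distance from P^n to the nearest integer, for the first several powers
dists = [abs(P**n - round(P**n)) for n in range(1, 41)]
```

The distances shrink as *n* grows, but more slowly and less regularly than for the golden ratio, which is the point of the plots below.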

As a reminder, here’s what the corresponding plots looked like for the golden ratio.

Here’s a diagram that shows the basic kinds of rings and the relations between them. (I’m only looking at commutative rings, and I assume every ring has a multiplicative identity.)

The solid lines are unconditional implications. The dashed line is a conditional implication.

- Every field is a Euclidean domain.
- Every Euclidean domain is a principal ideal domain (PID).
- Every principal ideal domain is a unique factorization domain (UFD).
- Every unique factorization domain is an integral domain.
- A **finite** integral domain is a field.

Incidentally, the diagram has a sort of embedded pun: the implications form a circle, i.e. a ring.


In his paper Mindless statistics, Gerd Gigerenzer uses a Freudian analogy to describe the mental conflict researchers experience over statistical hypothesis testing. He says that the “statistical ritual” of NHST (null hypothesis significance testing) “is a form of conflict resolution, like compulsive hand washing.”

In Gigerenzer’s analogy, the **id** represents Bayesian analysis. Deep down, a researcher wants to know the probabilities of hypotheses being true. This is something that Bayesian statistics makes possible, but more conventional frequentist statistics does not.

The **ego** represents R. A. Fisher’s significance testing: specify a null hypothesis only, not an alternative, and report a *p*-value. Significance is calculated after collecting the data. This makes it easy to publish papers. The researcher never clearly states his hypothesis, and yet takes credit for having established it after rejecting the null. This leads to feelings of guilt and shame.

The **superego** represents the Neyman-Pearson version of hypothesis testing: pre-specified alternative hypotheses, power and sample size calculations, etc. Neyman and Pearson insist that hypothesis testing is about what to *do*, not what to *believe*. [1]

* * *

I assume Gigerenzer doesn’t take this analogy too seriously. In context, it’s a humorous interlude in his polemic against rote statistical ritual.

But there really is a conflict in hypothesis testing. Researchers naturally think in Bayesian terms, and interpret frequentist results as if they were Bayesian. They really do want probabilities associated with hypotheses, and will imagine they have them even though frequentist theory explicitly forbids this. The rest of the analogy, comparing the ego and superego to Fisher and Neyman-Pearson respectively, seems weaker to me. But I suppose you could imagine Neyman and Pearson playing the role of your conscience, making you feel guilty about the pragmatic but unprincipled use of *p*-values.

* * *

[1] “No test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis. But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern behaviour in regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong.”

Neyman J, Pearson E. On the problem of the most efficient tests of statistical hypotheses. *Philos Trans Roy Soc A*, 1933;231:289, 337.

This morning I was reading Terry Tao’s overview of the work of Yves Meyer and ran across this line:

The powers φ, φ^{2}, φ^{3}, … of the golden ratio lie unexpectedly close to integers: for instance, φ^{11} = 199.005… is unusually close to 199.

I’d never heard that before, so I wrote a little code to see just how close golden powers are to integers.

Here’s a plot of the difference between φ^{n} and the nearest integer:

(Note that if you want to try this yourself, you need extended precision. Otherwise you’ll get strange numerical artifacts once φ^{n} is too large to represent exactly.)
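The post doesn't show its code, but a minimal version might look like the following, again using Python's `decimal` module for the extended precision the note above calls for. The precision setting and range of *n* are my choices for illustration.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 100  # extended precision: float64 fails once phi**n is large

phi = (1 + Decimal(5).sqrt()) / 2  # the golden ratio

for n in range(1, 31):
    p = phi**n
    diff = p - p.to_integral_value()  # signed distance to the nearest integer
    # Print the signed distance and its log-scale magnitude.
    print(n, float(diff), math.log10(abs(float(diff))))
```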

By contrast, if we make the analogous plot replacing φ with π we see that the distance to the nearest integer looks like a uniform random variable:

The distance from powers of φ to the nearest integer decreases so fast that we cannot see it in the graph for moderate-sized *n*, which suggests plotting the difference on a log scale. (In fact we plot the log of the *absolute value* of the difference, since the difference can be negative, making the log undefined.) Here’s what we get:

After an initial rise, the curve is apparently a straight line on a log scale, i.e. the absolute distance to the nearest integer decreases almost exactly exponentially.
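A standard fact about Lucas numbers (not spelled out in the post) explains why the decay is exactly exponential. If ψ = (1 − √5)/2 ≈ −0.618 is the conjugate root of *x*² = *x* + 1, then

```latex
\varphi^n + \psi^n = L_n \in \mathbb{Z},
\qquad
\left| \varphi^n - L_n \right| = |\psi|^n .
```

Since |ψ| < 1, for large *n* the nearest integer to φ^{n} is the Lucas number *L*_{n}, and the log of the distance is *n* log |ψ|: a straight line with slope log₁₀ |ψ| ≈ −0.209 per step.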

**Related posts**:

In a recent interview, Tyler Cowen discusses complacency, (neuro-)diversity, etc.

Let me give you a time machine and send you back to Vincent van Gogh, and you have some antidepressants to make him better. What actually would you do, should you do, could you do? We really don’t know. Maybe he would have had a much longer life and produced more wonderful paintings. But I worry about the answer to that question.

And I think in general, for all the talk about diversity, we’re grossly undervaluing actual human diversity and actual diversity of opinion. Ways in which people—they can be racial or ethnic but they don’t have to be at all—ways in which people are actually diverse, and obliterating them somewhat. This is my Tocquevillian worry and I think we’ve engaged in the massive social experiment of a lot more anti-depressants and I think we don’t know what the consequences are. I’m not saying people shouldn’t do it. I’m not trying to offer any kind of advice or lecture.

I don’t share Cowen’s concern regarding antidepressants, though I hadn’t thought about the question before. But I am concerned with how much we drug restless boys into submission. (Girls too, of course, but it’s usually boys.)

]]>