The so-called plastic constant *P* is another Pisot number, in fact the smallest Pisot number. *P* is the real root of *x*^{3} – *x* – 1 = 0.

Because *P* is a Pisot number, we know that its powers will be close to integers, just like powers of the golden ratio, but the *way* they approach integers is more interesting. The convergence is slower and less regular.

We will look at the first few powers of *P*, first looking at the distance to the nearest integer on a linear scale, then looking at the absolute value of the distance on a logarithmic scale.
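The post’s original code isn’t shown here, so as a sketch, here is one way to compute those distances with mpmath (the working precision and the range of exponents are my own choices):

```python
from mpmath import mp, findroot, nint

mp.dps = 50  # extended precision; double precision fails once P**n is large

# The plastic constant: the real root of x^3 - x - 1 = 0
P = findroot(lambda x: x**3 - x - 1, 1.3)

for n in range(1, 21):
    d = P**n - nint(P**n)  # signed distance to the nearest integer
    print(n, float(d))
```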

As a reminder, here’s what the corresponding plots looked like for the golden ratio.

Here’s a diagram that shows the basic kinds of rings and the relations between them. (I’m only looking at commutative rings, and I assume every ring has a multiplicative identity.)

The solid lines are unconditional implications. The dashed line is a conditional implication.

- Every field is a Euclidean domain.
- Every Euclidean domain is a principal ideal domain (PID).
- Every principal ideal domain is a unique factorization domain (UFD).
- Every unique factorization domain is an integral domain.
- A **finite** integral domain is a field.

Incidentally, the diagram has a sort of embedded pun: the implications form a circle, i.e. a ring.

More mathematical diagrams:

In his paper *Mindless statistics*, Gerd Gigerenzer uses a Freudian analogy to describe the mental conflict researchers experience over statistical hypothesis testing. He says that the “statistical ritual” of NHST (null hypothesis significance testing) “is a form of conflict resolution, like compulsive hand washing.”

In Gigerenzer’s analogy, the **id** represents Bayesian analysis. Deep down, a researcher wants to know the probabilities of hypotheses being true. This is something that Bayesian statistics makes possible, but more conventional frequentist statistics does not.

The **ego** represents R. A. Fisher’s significance testing: specify a null hypothesis only, not an alternative, and report a *p*-value. Significance is calculated after collecting the data. This makes it easy to publish papers. The researcher never clearly states his hypothesis, and yet takes credit for having established it after rejecting the null. This leads to feelings of guilt and shame.

The **superego** represents the Neyman-Pearson version of hypothesis testing: pre-specified alternative hypotheses, power and sample size calculations, etc. Neyman and Pearson insist that hypothesis testing is about what to do, not what to believe.

* * *

I assume Gigerenzer doesn’t take this analogy too seriously. In context, it’s a humorous interlude in his polemic against rote statistical ritual.

But there really is a conflict in hypothesis testing. Researchers naturally think in Bayesian terms, and interpret frequentist results as if they were Bayesian. They really do want probabilities associated with hypotheses, and will imagine they have them even though frequentist theory explicitly forbids this. The rest of the analogy, comparing the ego and superego to Fisher and Neyman-Pearson respectively, seems weaker to me. But I suppose you could imagine Neyman and Pearson playing the role of your conscience, making you feel guilty about the pragmatic but unprincipled use of *p*-values.

The powers φ, φ^{2}, φ^{3}, … of the golden ratio lie unexpectedly close to integers: for instance, φ^{11} = 199.005… is unusually close to 199.

I’d never heard that before, so I wrote a little code to see just how close golden powers are to integers.

Here’s a plot of the difference between φ^{n} and the nearest integer:

(Note that if you want to try this yourself, you need extended precision. Otherwise you’ll get strange numerical artifacts once φ^{n} is too large to represent exactly.)
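Here’s a minimal sketch of such an experiment using mpmath for the extended precision mentioned above (this is my reconstruction, not necessarily the code behind the plots):

```python
from mpmath import mp, sqrt, nint

mp.dps = 100  # extended precision to avoid numerical artifacts
phi = (1 + sqrt(5))/2  # golden ratio

for n in range(1, 31):
    d = phi**n - nint(phi**n)  # signed distance to the nearest integer
    print(n, float(d))
```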

By contrast, if we make the analogous plot replacing φ with π we see that the distance to the nearest integer looks like a uniform random variable:

The distance from powers of φ to the nearest integer decreases so fast that we cannot see it in the graph for moderately sized *n*, which suggests we plot the difference on a log scale. (In fact we plot the log of the *absolute value* of the difference, since the difference can be negative and the log of a negative number is undefined.) Here’s what we get:

After an initial rise, the curve is apparently a straight line on a log scale, i.e. the absolute distance to the nearest integer decreases almost exactly exponentially.

**Related posts**:

In a recent interview, Tyler Cowen discusses complacency, (neuro-)diversity, etc.

Let me give you a time machine and send you back to Vincent van Gogh, and you have some antidepressants to make him better. What actually would you do, should you do, could you do? We really don’t know. Maybe he would have had a much longer life and produced more wonderful paintings. But I worry about the answer to that question.

And I think in general, for all the talk about diversity, we’re grossly undervaluing actual human diversity and actual diversity of opinion. Ways in which people—they can be racial or ethnic but they don’t have to be at all—ways in which people are actually diverse, and obliterating them somewhat. This is my Tocquevillian worry and I think we’ve engaged in the massive social experiment of a lot more anti-depressants and I think we don’t know what the consequences are. I’m not saying people shouldn’t do it. I’m not trying to offer any kind of advice or lecture.

I don’t share Cowen’s concern regarding antidepressants, though I hadn’t thought about it before. But I am concerned with how much we drug restless boys into submission. (Girls too, of course, but it’s usually boys.)

Two finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension.

For a finite dimensional space *V*, its dual space *V** is defined to be the vector space of linear functionals on *V*, that is, the set of linear functions from *V* to the underlying field. The space *V** has the same dimension as *V*, and so the two spaces are isomorphic. You can do the same thing again, taking the dual of the dual, to get *V***. This also has the same dimension, and so *V* is isomorphic to *V*** as well as *V**. However, *V* is **naturally** isomorphic to *V*** but not to *V**. That is, the transformation from *V* to *V** is **not** natural.

Some things in linear algebra are easier to see in infinite dimensions, e.g. in Banach spaces. Distinctions that seem pedantic in finite dimensions clearly matter in infinite dimensions.

The category of Banach spaces considers linear spaces and **continuous** linear transformations between them. In a finite dimensional Euclidean space, all linear transformations are continuous, but in infinite dimensions a linear transformation is not necessarily continuous.

The dual of a Banach space *V* is the space of **continuous** linear functions on *V*. Now we can see examples of where not only is *V** not naturally isomorphic to *V*, it’s not isomorphic at all.

For any real *p* > 1, let *q* be the number such that 1/*p* + 1/*q* = 1. The Banach space *L*^{p} is defined to be the set of (equivalence classes of) Lebesgue integrable functions *f* such that the integral of |*f*|^{p} is finite. The dual space of *L*^{p} is *L*^{q}. If *p* does not equal 2, then these two spaces are different. (If *p* does equal 2, then so does *q*; *L*^{2} is a Hilbert space and its dual is indeed the same space.)
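Concretely, each *g* in *L*^{q} acts on *L*^{p} by integration, and Hölder’s inequality shows the resulting functional is continuous:

$$ f \mapsto \int fg \, d\mu, \qquad \left| \int fg \, d\mu \right| \le \|f\|_p \, \|g\|_q. $$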

In the finite dimensional case, a vector space *V* is isomorphic to its second dual *V***. In general, *V* can be embedded into *V***, but *V*** might be a larger space. The embedding of *V* in *V*** is natural, both in the intuitive sense and in the formal sense of natural transformations, discussed in the previous post. We can turn an element of *V* into a linear functional on linear functions on *V* as follows.

Let *v* be an element of *V* and let *f* be an element of *V**. The action of *v* on *f* is simply *fv*. That is, *v* acts on linear functions by letting them act on it!
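In symbols, the natural embedding *J* from *V* into *V*** is

$$ (Jv)(f) = f(v) \quad \text{for every } f \in V^*, $$

and *J* is linear, injective, and (by the Hahn–Banach theorem) an isometry.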

This shows that *some* elements of *V*** come from evaluation at elements of *V*, but there could be more. Returning to the example of Lebesgue spaces above, the dual of *L*^{1} is *L*^{∞}, the space of essentially bounded functions. But the dual of *L*^{∞} is larger than *L*^{1}. That is, one way to construct a continuous linear functional on bounded functions is to multiply them by an absolutely integrable function and integrate. But there are other ways to construct linear functionals on *L*^{∞}.

A Banach space *V* is reflexive if the **natural** embedding of *V* in *V*** is an isomorphism. For *p* > 1, the spaces *L*^{p} are reflexive.

However, R. C. James proved the surprising result that there are Banach spaces that are isomorphic to their second duals, **but not naturally**. That is, there are spaces *V* where *V* is isomorphic to *V***, but not via the natural embedding; the natural embedding of *V* into *V*** is **not** an isomorphism.

**Related**: Applied functional analysis

A category is a collection of objects and arrows between objects. Usually these “arrows” are functions, but in general they don’t have to be.

A functor maps a category to another category. Since a category consists of objects and arrows, a functor maps objects to objects and arrows to arrows.

A natural transformation maps functors to functors. Sounds reasonable, but what does that mean?

You can think of a functor as a way to create a picture of one category inside another. Suppose you have some category and pick out two objects in that category, *A* and *B*, and suppose there is an arrow *f* between *A* and *B*. Then a functor *F* would take *A* and *B* and give you objects *FA* and *FB* in another category, and an arrow *Ff* between *FA* and *FB*. You could do the same with another functor *G*. So the objects *A* and *B* and the arrow between them in the first category have counterparts under the functors *F* and *G* in the new category as in the two diagrams below.

A natural transformation α between *F* and *G* is something that connects these two diagrams into one diagram that commutes.

The natural transformation α is a collection of arrows in the new category, one for every object in the original category. So we have an arrow α_{A} for the object *A* and another arrow α_{B} for the object *B*. These arrows are called the *components* of α at *A* and *B* respectively.
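In symbols, the requirement that the combined diagram commutes is the naturality condition

$$ \alpha_B \circ Ff = Gf \circ \alpha_A, $$

one equation for each arrow *f* from *A* to *B*.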

Note that the components of α depend on the objects *A* and *B* but not on the arrow *f*. If *f* represents any other arrow from *A* to *B* in the original category, the same arrows α_{A} and α_{B} fill in the diagram.

Natural transformations are meant to capture the idea that a transformation is “natural” in the sense of not depending on any arbitrary choices. If a transformation does depend on arbitrary choices, the arrows α_{A} and α_{B} would not be reusable but would have to change when *f* changes.

The next post will discuss the canonical examples of natural and unnatural transformations.

**Related**: Applied category theory

Eight short, accessible blog posts based on the work of the intriguing mathematician Srinivasa Ramanujan:

Larry Wall, creator of the Perl programming language, created a custom degree plan in college, an interdisciplinary course of study in natural and artificial languages, i.e. linguistics and programming languages. Many of the features of Perl were designed as an attempt to apply natural language principles to the design of an artificial language.

I’ve been thinking of a different connection between natural and artificial languages, namely using natural language processing (NLP) to reverse engineer source code.

The source code of a computer program is text, but not a text. That is, it consists of plain text files, but it’s not a text in the sense that Paradise Lost or an email is a text. The most efficient way to parse a programming language is as a programming language. Treating it as an English text will lose vital structure, and wrongly try to impose a foreign structure.

But what if you have *two* computer programs? That’s the problem I’ve been thinking about. I have code in two very different programming languages, and I’d like to know how functions in one code base relate to those in the other. The connections are not ones that a compiler could find. The connections are more psychological than algorithmic. I’d like to reverse engineer, for example, which function in language *A* a developer had in mind when he wrote a function in language *B*.

Both code bases are in programming languages, but the function names are approximately natural language. If a pair of functions have the same name in both languages, and that name is not generic, then there’s a good chance they’re related. And if the names are similar, maybe they’re related.

I’ve done this sort of thing informally forever. I imagine most programmers do something like this from time to time. But only recently have I needed to do this on such a large scale that proceeding informally was not an option. I wrote a script to automate some of the work by looking for fuzzy matches between function names in both languages. This was far from perfect, but it reduced the amount of sleuthing necessary to line up the two sets of source code.
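As an illustration (with made-up function names, not the actual code bases), a crude version of such a script can be built on Python’s difflib:

```python
from difflib import SequenceMatcher

# Hypothetical function names from two code bases
names_a = ["load_patient_data", "compute_dose", "run_simulation"]
names_b = ["LoadPatientData", "CalcDose", "simulate"]

def similarity(a, b):
    # Ignore case and underscores so snake_case and CamelCase can match
    norm = lambda s: s.replace("_", "").lower()
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

for a in names_a:
    best = max(names_b, key=lambda b: similarity(a, b))
    print(f"{a} -> {best} ({similarity(a, best):.2f})")
```

A real script would also want a similarity threshold below which a name is reported as unmatched.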

Around a year ago I had to infer which parts of an old Fortran program corresponded to different functions in a Python program. I also had to infer how some poorly written articles mapped to either set of source code. I did all this informally, but I wonder now whether NLP might have sped up my detective work.

Another situation where natural language processing could be helpful in software engineering is determining code authorship. Again this is something most programmers have probably done informally, saying things like “I bet Bill wrote this part of the code because it looks like his style” or “Looks like Pat left her fingerprints here.” This could be formalized using NLP techniques, and I imagine it has been. Just as Frederick Mosteller and colleagues did a statistical analysis of The Federalist Papers to determine who wrote which paper, I’m sure there have been similar analyses to try to find out who wrote what code, say for legal reasons.

Maybe this already has a name, but I like “unnatural language processing” for the application of natural language processing to unnatural (i.e. programming) languages. I’ve done a lot of ad hoc unnatural language processing, and I’m curious how much of it I could automate in the future.

I wondered what a collage of these images would look like, so I used the ImageQuilts software by Edward Tufte and Adam Schwartz to create the image below.

**Related**: Applied complex analysis

Here’s a condensed view of the graph. You can find the full image here.

The graph is so dense that it’s hard to tell which areas have the most or least connections. Here are some tables to clarify that. First, counting how many areas a particular area contributes to, i.e. the number of outgoing arrows.

| Area | Contributions |
|-------------------------------------|---------------|
| Homological algebra | 12 |
| Lie groups | 11 |
| Algebraic and differential topology | 10 |
| Categories and sheaves | 9 |
| Commutative algebra | 9 |
| Commutative harmonic analysis | 9 |
| Algebraic geometry | 8 |
| Differential geometry and manifolds | 8 |
| Integration | 8 |
| Partial differential equations | 8 |
| General algebra | 7 |
| Noncommutative harmonic analysis | 6 |
| Ordinary differential equations | 6 |
| Spectral theory of operators | 6 |
| Analytic geometry | 5 |
| Automorphic and modular forms | 5 |
| Classical analysis | 5 |
| Mathematical logic | 5 |
| Abstract groups | 4 |
| Ergodic theory | 4 |
| Probability theory | 4 |
| Topological vector spaces | 4 |
| General topology | 3 |
| Number theory | 3 |
| Von Neumann algebras | 2 |
| Set theory | 1 |

Next, counting the sources each area draws on, i.e. counting incoming arrows.

| Area | Sources |
|-------------------------------------|---------|
| Algebraic geometry | 13 |
| Number theory | 12 |
| Lie groups | 11 |
| Noncommutative harmonic analysis | 11 |
| Algebraic and differential topology | 10 |
| Analytic geometry | 10 |
| Automorphic and modular forms | 10 |
| Ordinary differential equations | 10 |
| Ergodic theory | 9 |
| Partial differential equations | 9 |
| Abstract groups | 8 |
| Differential geometry and manifolds | 8 |
| Commutative algebra | 6 |
| Commutative harmonic analysis | 6 |
| Probability theory | 5 |
| Categories and sheaves | 4 |
| Homological algebra | 4 |
| Spectral theory of operators | 4 |
| Von Neumann algebras | 4 |
| General algebra | 2 |
| Mathematical logic | 1 |
| Set theory | 1 |
| Classical analysis | 0 |
| General topology | 0 |
| Integration | 0 |
| Topological vector spaces | 0 |

Finally, connectedness, counting incoming and outgoing arrows.

| Area | Connections |
|-------------------------------------|-------------|
| Lie groups | 22 |
| Algebraic geometry | 21 |
| Algebraic and differential topology | 20 |
| Noncommutative harmonic analysis | 17 |
| Partial differential equations | 17 |
| Differential geometry and manifolds | 16 |
| Homological algebra | 16 |
| Ordinary differential equations | 16 |
| Analytic geometry | 15 |
| Automorphic and modular forms | 15 |
| Commutative algebra | 15 |
| Commutative harmonic analysis | 15 |
| Number theory | 15 |
| Categories and sheaves | 13 |
| Ergodic theory | 13 |
| Abstract groups | 12 |
| General algebra | 10 |
| Spectral theory of operators | 10 |
| Probability theory | 9 |
| Integration | 8 |
| Mathematical logic | 6 |
| Von Neumann algebras | 6 |
| Classical analysis | 5 |
| Topological vector spaces | 4 |
| General topology | 3 |
| Set theory | 2 |

There are some real quirks here. The most foundational areas get short shrift. Set theory contributes to only one area of math?! Topological vector spaces don’t depend on anything, not even topology?!

I suspect Dieudonné had in mind fairly high-level contributions. Topological vector spaces, for example, obviously depend on topology, but not deeply. You could do research in the area while seldom drawing on more than an introductory topology course. Elementary logic and set theory are used everywhere, but most mathematicians have no need for advanced logic or set theory.

More math diagrams:

Analytic number theory uses the tools of analysis, especially complex analysis, to prove theorems about integers. The first time you see this it’s quite a surprise. You might then expect that since analysis contributes to number theory, number theory must contribute to analysis. But it doesn’t, much.

Topology imports ideas from algebra. But it exports more than it imports, to algebra and to other areas. Topology started as a generalization of geometry. Along the way it developed ideas that extend far beyond geometry. For example, computer science, which ostensibly has nothing to do with whether you can continuously deform one shape into another, uses ideas from category theory that were developed initially for topology.

Here’s how Jean Dieudonné saw things. The following are my reconstructions of a couple diagrams from his book Panorama of Pure Mathematics, as seen by N. Bourbaki. An arrow from A to B means that A contributes to B, or B uses A.

For number theory, some of Dieudonné’s arrows go both ways, some only go into number theory. No arrows go only outward from number theory.

With topology, however, there’s a net flux of arrows going outward.

These diagrams are highly subjective. There’s plenty of room for disagreement. Dieudonné wrote his book 35 years ago, so you might argue that they were accurate at the time but need to be updated. In any case, the diagrams are interesting.

**Update**: See the next post for a larger diagram, sewing together little diagrams like the ones above.

Those days are dead and gone and the eulogy was delivered by Perl.

The other was a line from James Hague:

… if you romanticize Unix, if you view it as a thing of perfection, then you lose your ability to imagine better alternatives and become blind to potentially dramatic shifts in thinking.

This brings up something I’ve long wondered about: What did the Unix shell get right that has made it so hard to improve on? It has some truly awful quirks, and yet people keep coming back to it. Alternatives that seem more rational don’t work so well in practice. Maybe it’s just inertia, but I don’t think so. There are other technologies from the 1970’s that had inertia behind them but have been replaced. The Unix shell got something so right that it makes it worth tolerating the flaws. Maybe some of the flaws aren’t even flaws but features that serve some purpose that isn’t obvious.

(By the way, when I say “the Unix shell” I have in mind similar environments as well, such as the Windows command line.)

On a related note, I’ve wondered why programming languages and shells work so differently. We want different things from a programming language and from a shell or REPL. Attempts to bring a programming language and shell closer together sound great, but they inevitably run into obstacles. At some point, we have different expectations of languages and shells and don’t want the two to be too similar.

Anthony Scopatz and I discussed this in an interview a while back in the context of xonsh, “a Python-powered, cross-platform, Unix-gazing shell language and command prompt.” While writing this post I went back to reread Anthony’s comments and appreciate them more now than I did then.

Maybe the Unix shell is near a local optimum. It’s hard to make much improvement without making big changes. As Anthony said, “you quickly end up where many traditional computer science people are not willing to go.”

**Related post**: What’s your backplane?

The trapezoid rule is a very crude technique in general; you can get much more accuracy with the same number of function evaluations by using a more sophisticated method. But for smooth periodic functions, the trapezoid rule works astonishingly well.

This post will look at two similar functions. The trapezoid rule will be very accurate for one but not for the other. The first function is

*g*(*x*) = exp( cos(*x*) ).

The second function, *h*(*x*), replaces the cosine with its Taylor approximation 1 – *x*^{2}/2. That is,

*h*(*x*) = exp(1 – *x*^{2}/2 ).

The graph below shows both functions.

Both functions are perfectly smooth. The function *g* is naturally periodic with period 2π. The function *h* could be modified to be a periodic function with the same period since *h*(-π) = *h*(π).

But the periodic extension of *h* is not smooth. It’s continuous, but it has a kink at odd multiples of π. The derivative is not continuous at these points. Here’s a close-up to show the kink.

Now suppose we want to integrate both functions from -π to π. Over that range both functions are smooth. But the behavior of *h* “off stage” affects the efficiency of the trapezoid rule. Making *h* periodic by pasting copies together that don’t match up smoothly does not make it act like a smooth periodic function as far as integration is concerned.

Here’s the error in the numerical integration using 2, 3, 4, …, 10 integration points.

The integration error for both functions decreases rapidly as we go from 2 to 5 integration points. In fact, the integration error for *h* is slightly less than that for *g* with 5 integration points. But the convergence for *h* practically stops at that point, while the integration error for *g* continues to decrease exponentially. Using only 10 integration points, the error for *g* has dropped to approximately 7×10^{-8}, while the error for *h* is five orders of magnitude larger.
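Here’s a sketch that reproduces this experiment (my own code, not the post’s; the reference values come from SciPy’s adaptive quadrature):

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: np.exp(np.cos(x))
h = lambda x: np.exp(1 - x**2/2)

def trapezoid(f, a, b, n):
    # Trapezoid rule with n equally spaced points, including both endpoints
    x = np.linspace(a, b, n)
    y = f(x)
    return (b - a)/(n - 1)*(y.sum() - 0.5*(y[0] + y[-1]))

# High-accuracy reference values for the two integrals over [-pi, pi]
exact_g, _ = quad(g, -np.pi, np.pi)
exact_h, _ = quad(h, -np.pi, np.pi)

for n in range(2, 11):
    err_g = abs(trapezoid(g, -np.pi, np.pi, n) - exact_g)
    err_h = abs(trapezoid(h, -np.pi, np.pi, n) - exact_h)
    print(n, err_g, err_h)
```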

**Related**: Numerical integration consulting

First, *r* is typically a function of θ, just as *y* is typically a function of *x*. But in rectangular coordinates, the independent variable is the first element of a pair while in polar coordinates it is the second element.

Second, if you’re going to walk a mile northwest, how do you proceed? You first face northwest, then you walk for a mile. You don’t walk a mile to the east and then walk 135° counterclockwise along an arc centered where you started. That is, you set your θ first, then increase your *r*.

The (*r*, θ) convention isn’t going to change. Maybe the only take-away from this discussion is to appreciate someone’s confusion who sees polar coordinates for the first time and wants to reverse their order.

**Related post**: Why use *j* for imaginary unit

(I don’t use *j* for imaginary unit, except when writing Python code. The *i* notation is too firmly engraved in my brain. But I understand why *j* has advantages.)

Here’s a close-up of the graph from a photo of my copy of A&S.

This was my reply.

It looks like a complex function of a complex variable. I assume the height is the magnitude and the markings on the graph are the phase. That would make it an elliptic function because it’s periodic in two directions.

It has one pole and one zero in each period. I think elliptic functions are determined, up to a constant, by their periods, zeros, and poles, so it should be possible to identify the function.

In fact, I expect it’s the Weierstrass *p* function. More properly, the Weierstrass ℘ function, sometimes called Weierstrass’ elliptic function. (Some readers will have a font installed that will properly render ℘ and some not. More on the symbol ℘ here.)

**Related posts**:

Fourier series arise naturally when working in rectangular coordinates. Bessel series arise naturally when working in polar coordinates.

The Fourier series for a constant is trivial. You can think of a constant as a cosine with frequency zero.

The Bessel series for a constant is not as simple, but more interesting. Here we have

1 = *J*_{0}(*x*) + 2 *J*_{2}(*x*) + 2 *J*_{4}(*x*) + 2 *J*_{6}(*x*) + …

Since

*J*_{-*n*}(*x*) = (-1)^{n} *J*_{n}(*x*),

we can write the series above more symmetrically as a sum of *J*_{n}(*x*) over all even integers *n*, positive and negative.
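The identity 1 = *J*_{0}(*x*) + 2 *J*_{2}(*x*) + 2 *J*_{4}(*x*) + … is easy to check numerically with SciPy (the test point and the truncation of the series are arbitrary choices of mine):

```python
from scipy.special import jv

x = 1.7  # an arbitrary test point
# Truncate the series; the terms J_{2k}(x) decay very rapidly in k
total = jv(0, x) + 2*sum(jv(2*k, x) for k in range(1, 30))
print(total)
```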

**Related posts**:

It is supposed to be computationally impractical to create two documents that have the same secure hash code, and yet Google has demonstrated that they have done just that for the SHA1 algorithm.

I’d like to make two simple observations to put this in perspective.

**This is not a surprise**. Cryptography experts have suspected since 2005 that SHA1 was vulnerable and recommended using other algorithms. The security community has been gleeful about Google’s announcement. They feel vindicated for telling people for years not to use SHA1.

**This took a lot of work**, both in terms of research and computing. Crypto researchers have been trying to break SHA1 for 22 years. And according to their announcement, these are the resources Google had to use to break SHA1:

- Nine quintillion (9,223,372,036,854,775,808) SHA1 computations in total
- 6,500 years of CPU computation to complete the attack first phase
- 110 years of GPU computation to complete the second phase

While SHA1 is no longer recommended, it’s hardly a failure. I don’t imagine it would take 22 years of research and millions of CPU hours to break into my car.

**I’m not saying people should use SHA1**. Just a few weeks ago I advised a client not to use SHA1. Why not use a better algorithm (or at least what is currently widely believed to be a better algorithm) like SHA3? But I am saying it’s easy to exaggerate what it means to say SHA1 has failed.


Even though he saw no value in the books he was assigned, he bought and saved every one of them. Then sometime near the end of college he began to read and enjoy the books he hadn’t touched.

I wanted to read their works now, all of them, and so I began. After I graduated, after Ellen and I moved together to New York, I piled the books I had bought in college in a little forest of stacks around my tattered wing chair. And I read them. Slowly, because I read slowly, but every day, for hours, in great chunks. I pledged to myself I would never again pretend to have read a book I hadn’t or fake my way through a literary conversation or make learned reference on the page to something I didn’t really know. I made reading part of my daily discipline, part of my workday, no matter what. Sometimes, when I had to put in long hours to make a living, it was a real slog. …

It took me twenty years. In twenty years, I cleared those stacks of books away. I read every book I had bought in college, cover to cover. I read many of the other books by the authors of those books and many of the books those authors read and many of the books by the authors of those books too.

There came a day when I was in my early forties … when it occurred to me that I had done what I set out to do. …

Against all odds, I had managed to get an education.


What I’ve done: Math prof, programmer, statistician

What I do now: Consulting

— John D. Cook (@JohnDCook) February 21, 2017

(The formatting is a little off above. It’s leaving out a couple line breaks at the end that were in the original tweet.)

That’s not a bad summary. I’ve worked in applied math, software development, and statistics. Now I consult in those areas.

Next, I did the same for Frank Sinatra.

Frank Sinatra #microresume:

Experience: puppet, pauper, pirate, poet, pawn, king, up, down, over, out

— John D. Cook (@JohnDCook) February 21, 2017

This one’s kinda obscure. It’s a reference to the title cut from his album That’s Life.

I’ve been a puppet, a pauper, a pirate

A poet, a pawn and a king.

I’ve been up and down and over and out

And I know one thing.

Each time I find myself flat on my face

I pick myself up and get back in the race.

So maybe it would be useful to visualize graph spectra the way you visualize chemical spectra. How could you do this?

First, scale the spectrum of the graph to the visual spectrum. There are different kinds of matrices you can associate with a graph, and these have different natural ranges. But if we look at the spectrum of the Laplacian matrix, the eigenvalues are between 0 and *n* for a graph with *n* nodes.

Eigenvalues typically correspond to frequencies, not wavelengths, but chemical spectra are often displayed in terms of wavelength. We can’t be consistent with both, so we’ll think of our eigenvalues as wavelengths. We could easily redo everything in terms of frequency.

This means we map 0 to violet (400 nm) and *n* to red (700 nm). So an eigenvalue of λ will be mapped to 400 + 300 λ/*n*. That tells us *where* to graph a spectral line, but what *color* should it be? StackOverflow gives an algorithm for converting wavelength to RGB values. Here’s a little online calculator based on the same code.
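Here’s a sketch of the eigenvalue-to-wavelength step for a cycle graph, using plain NumPy (the wavelength-to-RGB conversion from the StackOverflow algorithm is omitted):

```python
import numpy as np

# Laplacian spectrum of a cycle graph on n nodes
n = 10
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(axis=1)) - A
eigenvalues = np.sort(np.linalg.eigvalsh(L))

# Map eigenvalue λ in [0, n] to a wavelength in [400, 700] nm
wavelengths = 400 + 300*eigenvalues/n
print(np.round(wavelengths, 1))
```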

Here are some examples. First we start with a random graph.

Here’s what the spectrum looks like:

Here’s a cyclic graph:

And here’s its spectrum:

Finally, here’s a star graph:

And its spectrum:

**Related**: Ten spectral graph theory posts

First imagine a thin wire running through the coil of the shell. In cylindrical coordinates, this wire follows the parameterization

*r* = *e*^{kθ}

*z* = *Tr*

If *T* = 0 this is a logarithmic spiral in the (*r*, θ) plane. For positive *T*, the spiral is stretched so that its vertical position is proportional to its radius.

Next we build a shell by putting a tube around this imaginary wire. The radius *R* of the tube at each point is proportional to the *r* coordinate: *R = Dr.*

The image above was created using *k* = 0.1, *T* = 2.791, and *D* = 0.8845 using Øyvind Hammer’s seashell generating software. You can download Hammer’s software for Windows and experiment with your own shell simulations by adjusting the parameters.
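If you’d rather experiment in Python, here’s a short NumPy sketch of just the central wire, using the parameters above (the range of θ is an arbitrary choice of mine, not from Hammer’s software):

```python
import numpy as np

k, T, D = 0.1, 2.791, 0.8845  # parameters from the post
theta = np.linspace(0, 12*np.pi, 2000)

r = np.exp(k*theta)                      # logarithmic spiral radius
x, y = r*np.cos(theta), r*np.sin(theta)  # wire in Cartesian coordinates
z = T*r                                  # height proportional to radius
R = D*r                                  # tube radius also grows with r
```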

See also Hammer’s book and YouTube video:

Hammer’s code uses a statistical language called Past that I’d never heard of. Here’s my interpretation of his code using Python.

```python
import matplotlib.pyplot as plt
from numpy import arange, sin, cos, exp

i = arange(5000)
x1 = 1.0*cos(i/10.0)*exp(-i/2500.0)
y1 = 1.4*sin(i/10.0)*exp(-i/2500.0)
d = 450.0
vx = cos(i/d)*x1 - sin(i/d)*y1
vy = sin(i/d)*x1 + cos(i/d)*y1
plt.plot(vx, vy, "k")
h = max(vy) - min(vy)
w = max(vx) - min(vx)
plt.gca().set_aspect(w/h)
plt.show()
```

This code produces what’s called a harmonograph, related to the motion of a pendulum free to move in *x* and *y* directions:

It’s not exactly the same as the movie poster, but it’s definitely similar. If you find a way to modify the code to make it closer to the poster, leave a comment below.

If we know *n* is a Fibonacci number, how can we tell which one it is? That is, if *n* = *F*_{m}, how can we find *m*?

For large *m*, *F*_{m} is approximately φ^{m} / √ 5 and the error decreases exponentially with *m*. By taking logs, we can solve for *m* and round the result to the nearest integer.

We can illustrate this with SymPy. First, let’s get a Fibonacci number.

```python
>>> from sympy import *
>>> F = fibonacci(100)
>>> F
354224848179261915075
```

Now let’s forget that we know *F* is a Fibonacci number and test whether it is one. A positive integer *n* is a Fibonacci number if and only if either 5*n*^{2} + 4 or 5*n*^{2} – 4 is a perfect square.

```python
>>> sqrt(5*F**2 - 4)
sqrt(627376215338105766356982006981782561278121)
```

Apparently 5*F*^{2} – 4 is not a perfect square. Now let’s try 5*F*^{2} + 4.

```python
>>> sqrt(5*F**2 + 4)
792070839848372253127
```

Bingo! Now that we know it’s a Fibonacci number, which one is it?

```python
>>> N((0.5*log(5) + log(F))/log(GoldenRatio), 10)
100.0000000
```

So *F* must be the 100th Fibonacci number.

It looks like our approximation gave an exact result, but if we ask for more precision we see that it did not.

```python
>>> N((0.5*log(5) + log(F))/log(GoldenRatio), 50)
99.999999999999999999999999999999999999999996687654
```
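The two steps — the perfect square test and the logarithm — can be packaged as plain Python functions. This is a sketch using only the standard library; `fibonacci_index` assumes its argument has already passed the Fibonacci test.

```python
from math import isqrt, log, sqrt

def is_fibonacci(n):
    # n > 0 is a Fibonacci number iff 5n^2 + 4 or 5n^2 - 4 is a perfect square
    for m in (5*n*n + 4, 5*n*n - 4):
        r = isqrt(m)
        if r*r == m:
            return True
    return False

def fibonacci_index(n):
    # Solve F_m ≈ φ^m / √5 for m and round to the nearest integer.
    # (For n = 1 the index is ambiguous since F_1 = F_2 = 1.)
    phi = (1 + sqrt(5))/2
    return round((0.5*log(5) + log(n))/log(phi))

print(fibonacci_index(354224848179261915075))  # 100
```

Note that `math.log` accepts arbitrarily large Python integers, so this works even when *n* overflows a float.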

**Related posts**:

In this post we’ll write a little Python code to kick the tires on Cantrell’s approximation. The post also illustrates how to do some common tasks using SciPy and matplotlib.

Here are the imports we’ll need.

```python
import matplotlib.pyplot as plt
from numpy import pi, e, sqrt, log, linspace
from scipy.special import lambertw, gamma, psi
from scipy.optimize import root
```

First of all, the gamma function has a local minimum *k* somewhere between 1 and 2, and so it only makes sense to speak of its inverse to the left or right of this point. Gamma is strictly increasing for real values larger than *k*.

To find *k* we look for where the derivative of gamma is zero. It’s more common to work with the derivative of the logarithm of the gamma function than the derivative of the gamma function itself. That works just as well because gamma has a minimum where its log has a minimum. The derivative of the log of the gamma function is called ψ and is implemented in SciPy as `scipy.special.psi`. We use the function `scipy.optimize.root` to find where ψ is zero.

The `root` function returns more information than just the root we’re after. The root(s) are returned in the array `x`, and in our case there’s only one root, so we take the first element of the array:

```python
k = root(psi, 1.46).x[0]
```

Now here is Cantrell’s algorithm:

```python
c = sqrt(2*pi)/e - gamma(k)

def L(x):
    return log((x + c)/sqrt(2*pi))

def W(x):
    # lambertw returns a complex value; the principal branch is real here
    return lambertw(x).real

def AIG(x):
    return L(x) / W(L(x) / e) + 0.5
```

Cantrell uses `AIG` for Approximate Inverse Gamma.

How well does this algorithm work? For starters, we’ll see how well it does when we do a round trip, following the exact gamma with the approximate inverse.

```python
x = linspace(5, 30, 100)
plt.plot(x, AIG(gamma(x)))
plt.show()
```

This produces the following plot:

We get a straight line, as we should, so next we do a more demanding test. We’ll look at the absolute error in the approximate inverse. We’ll use a log scale on the *x*-axis since gamma values get large quickly.

```python
y = gamma(x)
plt.plot(y, x - AIG(y))
plt.xscale("log")
plt.show()
```

This shows the approximation error is small, and gets smaller as its argument increases.

Cantrell’s algorithm is based on an asymptotic approximation, so it’s not surprising that it improves for large arguments.

**Related posts**:

Morse code was designed so that the most frequently used letters have the shortest codes. In general, code length increases as frequency decreases.

How efficient is Morse code? We’ll compare letter frequencies based on Google’s research with the length of each code, and make the standard assumption that a dash is three times as long as a dot.

| Letter | Code | Length | Frequency |
|--------|------|--------|-----------|
| E | .    | 1  | 12.49% |
| T | -    | 3  | 9.28%  |
| A | .-   | 4  | 8.04%  |
| O | ---  | 9  | 7.64%  |
| I | ..   | 2  | 7.57%  |
| N | -.   | 4  | 7.23%  |
| S | ...  | 3  | 6.51%  |
| R | .-.  | 5  | 6.28%  |
| H | .... | 4  | 5.05%  |
| L | .-.. | 6  | 4.07%  |
| D | -..  | 5  | 3.82%  |
| C | -.-. | 8  | 3.34%  |
| U | ..-  | 5  | 2.73%  |
| M | --   | 6  | 2.51%  |
| F | ..-. | 6  | 2.40%  |
| P | .--. | 8  | 2.14%  |
| G | --.  | 7  | 1.87%  |
| W | .--  | 7  | 1.68%  |
| Y | -.-- | 10 | 1.66%  |
| B | -... | 6  | 1.48%  |
| V | ...- | 6  | 1.05%  |
| K | -.-  | 7  | 0.54%  |
| X | -..- | 8  | 0.23%  |
| J | .--- | 10 | 0.16%  |
| Q | --.- | 10 | 0.12%  |
| Z | --.. | 8  | 0.09%  |

There’s room for improvement. Assigning the letter O such a long code, for example, was clearly not optimal.

But how much difference does it make? If we were to rearrange the codes so that they corresponded to letter frequency, how much shorter would a typical text transmission be?

Multiplying the code lengths by their frequency, we find that an average letter, weighted by frequency, has code length 4.5268.

What if we rearranged the codes? Then we would get 4.1257 which would be about 9% more efficient. To put it another way, Morse code achieved 91% of the efficiency that it could have achieved with the same codes. This is relative to Google’s English corpus. A different corpus would give slightly different results.
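Both averages can be reproduced from the table. Here is a sketch in Python: a dot counts as 1 unit and a dash as 3, and the optimal assignment simply pairs the shortest codes with the most frequent letters.

```python
# Morse codes and Google letter frequencies (percent), from the table above
code = dict(E=".", T="-", A=".-", O="---", I="..", N="-.", S="...", R=".-.",
            H="....", L=".-..", D="-..", C="-.-.", U="..-", M="--", F="..-.",
            P=".--.", G="--.", W=".--", Y="-.--", B="-...", V="...-", K="-.-",
            X="-..-", J=".---", Q="--.-", Z="--..")
freq = dict(E=12.49, T=9.28, A=8.04, O=7.64, I=7.57, N=7.23, S=6.51, R=6.28,
            H=5.05, L=4.07, D=3.82, C=3.34, U=2.73, M=2.51, F=2.40, P=2.14,
            G=1.87, W=1.68, Y=1.66, B=1.48, V=1.05, K=0.54, X=0.23, J=0.16,
            Q=0.12, Z=0.09)

def length(c):
    # dot = 1 unit, dash = 3 units
    return sum(1 if symbol == "." else 3 for symbol in c)

# Average code length weighted by letter frequency
actual = sum(length(code[x])*freq[x] for x in code)/100

# Best possible: shortest codes assigned to the most frequent letters
lengths = sorted(length(c) for c in code.values())
freqs = sorted(freq.values(), reverse=True)
optimal = sum(l*f for l, f in zip(lengths, freqs))/100

print(round(actual, 4), round(optimal, 4))  # 4.5268 4.1257
```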

Toward the bottom of the table above, letter frequencies correspond poorly to code lengths, though this hardly matters for efficiency. But some of the choices near the top of the table are puzzling. The relative frequency of the first few letters has remained stable over time and was well known long before Google. (See ETAOIN SHRDLU.) Maybe there were factors other than efficiency that influenced how the most frequently used characters were encoded.

**Update**: Some sources I looked at said that a dash is three times as long as a dot, including the space between dots or dashes. Others said there is a pause as long as a dot between elements. If you use the latter timing, it takes an average time equal to 6.0054 dots to transmit an English letter, and this could be improved to 5.6616. By that measure Morse code is about 93.5% efficient. (I only added time for space inside the code for a letter because the space between letters is the same no matter how they are coded.)

Universal algebra is the study of features common to familiar algebraic systems … [It] places the algebraic notions in their proper setting; it often reveals connexions between seemingly different concepts and helps to systemize one’s thoughts. … [T]his approach does not usually solve the whole problem for us, but only *tidies up a mass of rather trivial detail, allowing us to concentrate our powers on the hard core of the problem*.

Emphasis added. Source: Universal Algebra by P. M. Cohn

**Related**: Applied category theory

The robustness of a graph is defined as

*R* = (1/*N*) Σ_{q=1}^{N} *S*(*q*)

Then [2] explains that

*N* is the total number of nodes in the initial network and *S*(*q*) is the relative size of the largest connected component after removing *q* nodes with the highest degrees. The normalization factor 1/*N* ensures 1/*N* ≤ *R* ≤ 0.5. Two special cases exist: *R* = 1/*N* corresponds to star-like networks while *R* = 0.5 represents the case of the fully connected network.

Apparently “relative size” means relative to the size of the original network, not to the network with *q* nodes removed.

There is one difference between [1] and [2]. The former defines the sum up to *N* and the latter to *N*-1. But this doesn’t matter because *S*(*N*) = 0: when you remove *N* nodes from a graph with *N* nodes, there’s nothing left!

I have a couple reservations. One is that it’s not clear whether *R* is well defined.

Consider the phrase “removing *q* nodes with the highest degrees.” What if you have a choice of nodes with the highest degree? Maybe all nodes have degree no more than *n* and several have degree exactly *n*. It seems your choice of node to remove matters. Removing one node of degree *n* may give you a very different network than removing another node of the same degree.

Along the same lines, it’s not clear whether it matters how one removes more than one node. For *S*(2), for example, does it matter whether I remove the two most connected nodes from the original graph at once, or remove the most connected node first, then the most connected node from what remains?

My second reservation is that I don’t think the formula above gives exactly the extreme values it claims. Start with a star graph. Once you remove the center node, there are only isolated points left, and each point is 1/*N* of the original graph. So *S*(*q*) = 1/*N* for each *q* < *N*. Then *R* equals (*N* – 1)/*N*^{2}, which is less than 1/*N*.

With the complete graph, removing a point leaves a complete graph of lower order. So *S*(*q*) = (*N* – *q*)/*N*. Then *R* equals (*N* – 1)/2*N*, which is less than 1/2.
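These two calculations are easy to check numerically. Here is a sketch in Python (assuming greedy one-at-a-time removal of a highest-degree node, with ties broken arbitrarily; the function names are mine):

```python
from itertools import combinations

def largest_component(nodes, edges):
    # Size of the largest connected component, found by depth-first search
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), 0
    for v in nodes:
        if v in seen:
            continue
        stack, size = [v], 0
        seen.add(v)
        while stack:
            u = stack.pop()
            size += 1
            stack.extend(w for w in adj[u] if w not in seen)
            seen.update(adj[u])
        best = max(best, size)
    return best

def robustness(nodes, edges):
    # R = (1/N) * sum of S(q) for q = 1..N, removing a highest-degree node each step
    nodes, edges = set(nodes), set(edges)
    N = len(nodes)
    total = 0.0
    for _ in range(N):
        degree = {v: 0 for v in nodes}
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        v = max(nodes, key=degree.get)
        nodes.remove(v)
        edges = {e for e in edges if v not in e}
        if nodes:
            total += largest_component(nodes, edges)/N
    return total/N

N = 10
star = [(0, i) for i in range(1, N)]
complete = list(combinations(range(N), 2))
print(round(robustness(range(N), star), 12))      # 0.09, i.e. (N-1)/N^2
print(round(robustness(range(N), complete), 12))  # 0.45, i.e. (N-1)/2N
```

Both results fall short of the claimed extremes 1/*N* = 0.1 and 0.5, as computed above.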

* * *

[1] C. M. Schneider et al. Mitigation of malicious attacks on networks. Proc. Natl. Acad. Sci. USA, March 8, 2011, vol. 108, no. 10.

[2] Big Data of Complex Networks, CRC Press. Matthias Dehmer et al editors.

**JC**: *Can you start off by telling us a little bit about GiveDirectly, and what you do?*

**PN**: GiveDirectly is the first nonprofit that lets individual donors like you and me send money directly to the extreme poor. And that’s it—we don’t buy them things we think they need, or tell them what they should be doing, or how they should be doing it. Michael Faye and I co-founded GD, along with Jeremy Shapiro and Rohit Wanchoo, because on net we felt (and still feel) the poor have a stronger track record putting money to use than most of the intermediaries and experts who want to spend it for them.

**JC**: *What are common objections you brush up against, and how do you respond?*

**PN**: We’ve all heard and to some extent internalized a lot of negative stereotypes about the extreme poor—you can’t just give them money, they’ll blow it on alcohol, they won’t work as hard, etc. And it’s only in the last decade or so with the advent of experimental testing that we’ve built a broad evidence base showing that in fact quite the opposite is the case—in study after study the poor have used money sensibly, and if anything drank less and worked more. So to us it’s simply a question of catching folks up on the data.

**JC**: *Why do you think randomized controlled trials are emerging in development economics just in the past decade or so when they have been the gold standard in other areas for much longer?*

**PN**: I agree that experimental testing in development is long overdue. And to be blunt, I think it came late because we worry more about getting real results when we’re helping ourselves than we do when we’re helping others. When it comes to helping others, we get our serotonin from *believing* we’re making a difference, not the actual difference we make (which we may never find out, for example when we give to charities overseas). And so it’s tempting to succumb to wishful thinking rather than rigorous testing.

**JC**: *What considerations went into the design of your pending basic income trial? What would you have loved to do differently methodologically if you had 10X the budget? 100X?*

**PN**: This experiment is all about scale, in a couple of ways. First, there have been some great basic income pilots in the past, but they haven’t committed to supporting people for more than a few years. That’s important because a big argument the “pro” camp makes is that guaranteeing long-term economic security will free people up to take risks, be more creative, etc.—and a big worry the “con” camp raises is that it will cause people to stop trying. So it was important to commit to support over a long period. We’re doing over a decade—12 years—and with more funding we’d go even longer.

Second, it’s important to test this by randomizing at the community level, not just the individual level. That’s because a lot of the debate over basic income is about how community interactions will change (vs purely individual behavior). So we’re enrolling entire villages—and with more funding, we could make that entire counties, etc. That lets you start to understand impacts on community cohesion, social capital, the macroeconomy, etc.

**JC**: *In what ways do you think math has served as a good or poor guide for development economics over the years?*

**PN**: I think the far more important question is why has math—and in particular statistics—played such a small role in development decision-making, while “success stories” and “theories of change” have played such large ones.

**JC**: *Can you say something about the efficiency of GiveDirectly?*

**PN**: What we’ve tried to do at GD is, first, be very clear about our marginal cost structure—typically around 90% in the hands of the poor, 10% on costs of enrolling them and delivering funds; and second, provide evidence on how these transfers affect a wide range of outcomes and let donors judge for themselves how valuable those outcomes are.

**JC**: *What is your vision for a methodologically sound poverty reduction research program? What are the main pitfalls and challenges you see?*

**PN**: First, we need to run experiments at larger scales. Testing new ideas in a few villages, run by an NGO, is a great start, but it’s not always an accurate guide to how an intervention will perform when a government tries to deliver it nation-wide, or how doing something at that scale will affect the broader economy (what we call “general equilibrium effects”). I’ve written about this recently with Karthik Muralidharan based on some of our recent experiences running large-scale evaluations in India.

Second, we need to measure value created for the poor. RCTs tell us how an intervention changes “outcomes,” but not how valuable those outcomes are. That’s fine if you want to assign your own values to outcomes—I could be an education guy, say, and care only about years of formal schooling. But if we care at all about the values and priorities of the poor themselves, we need a different approach. One simple step is to ask people how much money an intervention is worth to them—what economists call their “willingness to pay.” If we’re spending $100 on a program, we’d hope it’s worth at least that much to the beneficiary. If not, that begs the question of why we don’t just give them the money.

**JC**: *What can people do to help?*

**PN**: Lots of things. Here are a few:

- Set up a recurring donation, preferably to the basic income project. Worst case scenario your money will make life much better for someone in extreme poverty; best case, it will also generate evidence that redefines anti-poverty policy.
- Follow ten recipients on GDLive. Share things they say that you find interesting. Give us feedback on the experience (which is very beta).
- Ask five friends whether they give money to poor people. Find out what they think and why. Share the evidence and information we’ve published and then give us feedback—what was helpful? What was missing?
- Ask other charities to publish the experimental evidence on their interventions prominently on their websites, and to explain why they are confident that they can add more value for the poor by spending money on their behalf than the poor could create for themselves if they had the money. Some do! But we need to create a world where simply publishing a few “success stories” doesn’t cut it any more.

**Related post**: Interview with Food for the Hungry CIO