The problem is more interesting when the interval is unknown. You may be trying to estimate the end points of the interval by taking the max and min of the samples you’ve drawn. But in fact we might as well assume the interval is [0, 1] because the probability of a new sample falling within the previous sample range does not depend on the interval. The location and scale of the interval cancel out when calculating the probability.

Suppose we’ve taken *n* samples so far. The range of these samples is the difference between the 1st and the *n*th order statistics, and for a uniform distribution this difference has a beta(*n*-1, 2) distribution. Since a beta(*a*, *b*) distribution has mean *a*/(*a*+*b*), the expected value of the sample range from *n* samples is (*n*-1)/(*n*+1). This is also the probability that the next sample, or any particular future sample, will lie within the range of the samples seen so far.

If you’re trying to estimate the size of the total interval, this says that after *n* samples, the probability that the next sample will give you any new information is 2/(*n*+1). This is because we only learn something when a sample is less than the minimum so far or greater than the maximum so far.

]]>

There are also many other proofs of the infinitude of primes that use more sophisticated arguments. For example, here is such a proof by Paul Erdős. Another proof shows that there must be infinitely many primes because the sum of the reciprocals of the primes diverges. There’s even a proof that uses topology.

When I first saw one of these proofs, I wondered whether they were circular. When you use advanced math to prove something elementary, there’s a chance you could use a result that depends on the very thing you’re trying to prove. The proofs are not circular as far as I know, and this is curious: the fact that there are infinitely many primes is elementary but not foundational. It’s elementary in that it is presented early on and it builds on very little. But it is not foundational. You don’t continue to use it to prove more things, at least not right away. You can develop a great deal of number theory without using the fact that there are infinitely many primes.

The Fundamental Theorem of Algebra is an example in the other direction, something that is foundational but not elementary. It’s stated and used in high school algebra texts but the usual proof depends on Liouville’s theorem from complex analysis.

It’s helpful to distinguish which things are elementary and which are foundational when you’re learning something new so you can emphasize the most important things. But without some guidance, you can’t know what will be foundational until later.

The notion of what is foundational, however, is conventional. It has to do with the order in which things are presented and proved, and sometimes this changes. Sometimes in hindsight we realize that the development could be simplified by changing the order, considering something foundational that wasn’t before. One example is Cauchy’s theorem. It’s now foundational in complex analysis: textbooks prove it as soon as possible then use it to prove things for the rest of course. But historically, Cauchy’s theorem came after many of the results it is now used to prove.

**Related**: Advanced or just obscure?

… If all men count with you, but none too much …

It would be good career advice for a mathematician to say “Let all areas of math count with you, but none too much.” This warns against dismissing something offhand because you’re sure you’ll never use it, and becoming so fond of something that it becomes a solution in search of a problem.

The same applies to technology: Let all technologies count with you, but none too much.

**Related posts**:

A couple definitions of applied math

]]>

Which side is correct depends on what’s out there waiting to be discovered, which of course we don’t know. We can only guess. Timid research is rational if you believe there are only marginal improvements that are likely to be discovered.

Sample size increases quickly as the size of the effect you’re trying to find decreases. To establish small differences in effect, you need very large trials.

If you think there are only small improvements on the status quo available to explore, you’ll explore each of the possibilities very carefully. On the other hand, if you think there’s a miracle drug in the pipeline waiting to be discovered, you’ll be willing to risk falsely rejecting small improvements along the way in order to get to the big improvement.

Suppose there are 500 drugs waiting to be tested. All of these are only 10% effective except for one that is 100% effective. You could quickly find the winner by giving each candidate to one patient. For every drug whose patient responded, repeat the process until only one drug is left. One strike and you’re out. You’re likely to find the winner in three rounds, treating fewer than 600 patients. But if all the drugs are 10% effective except one that’s 11% effective, you’d need hundreds of trials with thousands of patients each.

The best research strategy depends on what you believe is out there to be found. People who know nothing about cancer often believe we could find a cure soon if we just spend a little more money on research. Experts are more sanguine, except when they’re asking for money.

]]>To install Paul Taylor’s package on Windows, I created a directory called `localtexmf`

, set the environment variable `TEXINPUTS`

to its location, and copied `diagrams.sty`

file in that directory.

Here are a couple examples, diagrams used in the definition of product and coproduct.

And here’s the LaTeX to produce the diagrams.

\begin{diagram} & & X & & \\ & \ldTo^{f_1} & \dDashto_f & \rdTo^{f_2} & \\ A & \lTo_{\pi_1} & A\times B & \rTo_{\pi_2} & B \\ \end{diagram} \begin{diagram} & & X & & \\ & \ruTo^{f_1} & \uDashto_f & \luTo^{f_2} & \\ A & \rTo_{i_1} & A\oplus B & \lTo_{i_2} & B \\ \end{diagram}

For much more information, see the package page.

]]>You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution is equal to the expected value of the distribution itself. The shorter reading is confusing since “mean” refers to three different things in the same sentence. In reverse order, these are:

- The mean of the distribution, defined by an integral.
- The sample mean, calculated by averaging samples from the distribution.
- The mean of the sample mean as a random variable.

The hypothesis of this theorem is that the underlying distribution **has** a mean. Lets see where things break down if the distribution does not have a mean.

It’s tempting to say that the Cauchy distribution has mean 0. Or some might want to say that the mean is infinite. But if we take any value to be the mean of a Cauchy distribution — 0, ∞, 42, etc. — then the theorem above would be false. The mean of *n* samples from a Cauchy has the same distribution as the original Cauchy! The variability does not decrease with *n*, as it would with samples from a normal, for example. The sample mean doesn’t converge to any value as *n* increases. It just keeps wandering around with the same distribution, no matter how large the sample. That’s because the mean of the Cauchy distribution simply doesn’t exist.

]]>Every time code is patched, it becomes a little uglier, harder to understand, harder to maintain, bugs get introduced.

If you don’t start with a spec, every piece of code you write is a patch.

Which means the program starts out from Day One being ugly, hard to understand, and hard to maintain.

One reason this is interesting is that the series above has a special form that makes is a hypergeometric function of *z*. You can read more about it here.

I could imagine situations where having such an expression for a root is useful, though I doubt the series would be much use if you just wanted to find the roots of a fifth degree polynomial numerically. Direct application of something like Newton’s method would be much simpler.

]]>One response I had was that it’s not necessarily the same people who are being so bold and so timid. There’s a tension between the risk-tolerant and the risk-averse in America. The former are free to be bold in the private sector while the latter outvote them in the public sector.

Another explanation might be that an individual can be fearless and fearful about different things. Someone may be willing to risk millions of dollars but not be willing to risk eating unpasteurized food. There may be some sort of general risk homeostasis, though I imagine people willing to take risks in one area are often more willing to take risks in another area.

]]>MathWorld says that this approximation is accurate to 18457734525360901453873570 decimal digits. How could you get an idea whether this claim is correct? We could show that the approximation is near *e* by showing that its logarithm is near 1. That is, we want to show

Define *k* to be 3^(2^85) and notice that *k* also equals 9^(4^42). From the power series for log(1 + *x*) and the fact that the series alternates, we have

where η is some number between 0 and 1/*k*. This tells that the error is extremely small because 1/*k* is extremely small. It also tells us that the approximation underestimates *e* because its logarithm is slightly less than 1.

Just how small is 1/*k*? Its log base 10 is around -1.8 × 10^25, so it’s plausible that the approximation is accurate to 10^25 decimal digits. You could tighten this argument up a little and get the exact number of correct digits.

After wandering around for a bit, I found where I believed I should wait for the train, though I wasn’t entirely sure. While I was standing there a group of half-drunk young men from Scotland walked to the platform and asked me questions about the train. One of the group thought they were on the wrong platform, but I heard their leader say “He’s got glasses and a beard. He’s obviously more intelligent than us.” Apparently they found this argument convincing and they stayed.

Neither my nearsightedness nor my facial hair made me an expert on Dutch trains. This was my first time catching a train in a new country where most of the signs were written in a language I do not know. I imagine they’ve ridden more trains than I have. The only advantage I had over them was my sobriety. Maybe my experience as a consultant has enabled me to give confidence-insprirng advice on subjects I know less about than I’d like.

]]>For example, suppose there are 10,000 people, each with a 51% chance of answering a question correctly. The probability that more than 5,000 people will be right is about 98%. [1]

The key assumption here is **independence**, which is not realistic in most cases. But as people move in the direction of independence, the quality of the majority vote improves. Another assumption is that people are what machine learning calls “weak learners,” i.e. that they perform slightly better than chance. This holds more often than independence, but on some subjects people tend to do worse than chance, particularly experts.

You could call this the wisdom of crowds, but it’s closer to the wisdom of markets. As James Surowiecki points out in his book The Wisdom of Crowds, crowds (as in mobs) aren’t wise; large groups of independent decision makers are wise. Markets are wiser than crowds because they aggregate more independent opinions. Markets are subject to group-think as well, but not to the same extent as mobs.

***

[1] Suppose there are *N* people, each with independent probability *p* of being correct. Suppose *N* is large and *p* is near 1/2. Then the probability of a majority answering correctly is approximately

*Prob*( *Z* > (1 – 2*p*) sqrt(*N*) )

where *Z* is a standard normal random variable. You could calculate this in Python by

from scipy.stats import norm from math import sqrt print( norm.sf( (1 - 2*p)*sqrt(N) ) )

This post is an elaboration of something I first posted on Google+.

]]>From Computational Category Theory

]]>

I’ll be speaking at the Snow Unix Event in The Netherlands in a couple weeks and I plan to go to Germany in September. I’ve made a couple trips to California this year and it looks like I’ll be flying out there more often. And of course you can always find me in Houston. If you’d like to meet in person, please let me know.

]]>

I told the doctor I broke my leg in two places. He told me to quit going to those places.

Sometimes tech choices are that easy: if something is too hard, stop doing it. A great deal of pain comes from using a tool outside its intended use, and often that’s avoidable.

For example, when regular expressions get too hard, I stop using regular expressions and write a little procedural code. Or when Python is too slow, I try some simple ways of speeding it up, and if that’s not good enough I switch from Python to C++. If something is too hard to do in Windows, I’ll do it in Linux, and vice versa.

Sometimes there’s not a better tool available and you just have to slog through with what you have. And sometimes you don’t have the freedom to use a better tool even though one is available. But a lot of technical pain is self-imposed. If you keep breaking your leg somewhere, stop going there.

]]>]]>In Gilbert and Sullivan’s

The Gondoliers, Casilda is told that as an infant she was married to the heir of the King of Batavia, but that due to a mix-up no one knows which of two individuals, Marco or Giuseppe, is the heir. Alarmed, she wails “Then do you mean to say that I am married to one of two gondoliers, but it is impossible to say which?” To which the response is “Without any doubt of any kind whatever.”… Intuitionists … insist that a proof of

A∨Bmust show which ofAorBholds, and hence they would reject the claim that Casilda is married to Marco or Giuseppe until one of the two was identified as her husband. Perhaps Gilbert and Sullivan anticipated intuitionism, for their story’s outcome is that the heir turns out to be a third individual, Luiz, with whom Casilda is, conveniently, already in love.

Statistics is in many ways much more useful for most students than calculus. The problem is, to teach it

wellis extraordinarily difficult. It’s very easy to teach a horrible statistics class where you spit back the definitions of mean and median. But you become dangerous because you think you know something about data when in fact it’s kind of subtle.

A little knowledge is a dangerous thing, more so for statistics than calculus.

This reminds me of a quote by Stephen Senn:

Statistics: A subject which most statisticians find difficult but in which nearly all physicians are expert.

**Related**: Elementary statistics book recommendation

]]>

The terminology used throughout this document

enormously overloadsthe symbol p(). That is, we are using, in each line of this discussion, the function p() to mean something different; its meaning is set by the letters used in its arguments. That is a nomenclatural abomination. I apologize, and encourage my readers to do things that aren’t so ambiguous (like maybe add informative subscripts), but it is so standard in our business that I won’t change (for now).

I found this terribly confusing when I started doing statistics. The meaning is not explicit in the notation but implicit in the conventions surrounding its use, conventions that were foreign to me since I was trained in mathematics and came to statistics later. When I would use letters like *f* and *g* for functions collaborators would say “I don’t know what you’re talking about.” Neither did I understand what they were talking about since they used one letter for everything.

Monads are hard because there are so many bad monad tutorials getting in the way of finally finding Wadler’s nice paper.

Here’s the paper by Philip Wadler that I expect Luke Gorrie had in mind: Monads for functional programming.

Here’s the key line from Wadler’s paper:

Pure functional languages have this advantage: all flow of data is made explicit. And this disadvantage: sometimes it is painfully explicit.

That’s the problem monads solve: they let you leave implicit some of the repetitive code otherwise required by functional programming. That simple but critical point left out of many monad tutorials.

Dan Piponi wrote a blog post You Could Have Invented Monads! (And Maybe You Already Have) very much in the spirit of Wadler’s paper. He starts with the example of adding logging to a set of functions. You could have every function return not just the return value that you’re interested in but also the updated state of the log. This quickly gets messy. For example, suppose your basic math functions write to an error log if you call them with illegal arguments. That means your square root function, for example, has to take as input not just a real number but also the state of the error log before it was called. Monads give you a way of effectively doing the same thing, but conceptually separating the code that takes square roots from the code that maintains the error log.

As for “so many bad monad tutorials,” see Brent Yorgey on the monad tutorial fallacy.

By the way, this post is not Yet Another Monad Tutorial. It’s simply an advertisement for the tutorials by Philip Wadler and Dan Piponi.

]]>First, we’ll need to include +∞ and -∞ with the reals.

We define the new **addition** of two elements *x* and *y* to be -log (exp(-*x*) + exp(-*y*) ).

We define the new **multiplication** to be ordinary **addition**. (!)

In this new arithmetic +∞ is the additive identity and 0 is the multiplicative identity.

This new algebraic structure is called the log semiring. It’s called a semiring because it satisfies all the properties of a ring except that elements don’t necessarily have additive inverses. We’ll get into the details of the definition below.

***

Let’s put a subscript *S* on everything associated with our semiring in order to distinguish them from their more familiar counterparts. Then we can summarize the definitions above as

*a*+_{S}*b*= -log (exp(-*a*) + exp(-*b*) )*a**_{S}*b*=*a*+*b*- 0
_{S}= +∞ - 1
_{S}= 0

Note that if we define

*f*(*a*, *b*) = *a* +_{S} *b*

then

*f*(*a*, *b*) = -*g*(-*a*, -*b*)

where *g*(*a*, *b*) is the soft maximum of *a* and *b*.

***

Finally, we list the axioms of a semiring. Note that these equations all hold when we interpret +, *, 0, and 1 in the context of *S*, i.e. we imagine that each has a subscript *S* and is as defined above.

- (
*a*+*b*) +*c*=*a*+ (*b*+*c*) - 0 +
*a*=*a*+ 0 =*a* *a*+*b*=*b*+*a*- (
*a***b*) **c*=*a** (*b***c*) - 1 *
*a*=*a** 1 =*a* *a** (*b*+*c*) = (*a***b*) + (*a***c*)- (
*a*+*b*) **c*= (*a***c*) + (*b***c*) - 0 *
*a*=*a** 0 = 0

Each of these follows immediately from writing out the definitions.

]]>At the time I could think of two examples of functors that I cared about, one involving fundamental groups and one involving linear operators. The former puts the star down and the latter puts the star up.

If *X* and *Y* are topological spaces and *f* is a continuous function between *X* and *Y*, then *f* induces a map *f*_{*} between the π_{1}(*X*) and π_{1}(*Y*), the fundamental groups of *X* and *Y*. The star goes down.

If *X* and *Y* are linear spaces and *T* is a continuous linear transformation between *X* and *Y*, then *T** induces a map *Y*^{*} between the *Y*^{*}, the dual spaces of *Y* and *X*. The stars go up.

What was this arcane knowledge that Ted Odell obliquely referred to? It’s that by convention, stars go downstairs for covariant functors (like that between fundamental groups) and upstairs for contravariant functors (like that between dual spaces).

Covariant means that the induced maps go in the same direction as the original maps. Notice above that *f* goes from *X* to *Y* and *f*_{*} goes from π_{1}(*X*) to π_{1}(*Y*). The *X* and *Y* are in the same order, hence *co*variant, going in the same direction.

Contravariant means that the induces maps go in the opposite direction as the original maps. Notice that *T* goes from *X* to *Y*, but *T** goes from *Y*^{*} to *X*^{*}, the order of *X* and *Y* is reversed, hence *contra*variant, going in opposite directions.

The fundamental group is defined in terms of paths in a space, i.e. functions from an interval into a a space. The dual of a vector space is defined in terms of linear functions from that space to real numbers. Functors defined in terms of functions **from** a fixed space (like the unit interval) into the space you’re interested in are covariant. Functors defined in terms of functions from the space you’re interested **into** a fixed space (like the reals) are contravariant.

Another example of stars going up or down comes from differential geometry. A function *f* from *R*^{n} to *R*^{m} induces a map *f*_{*} from the tangent space *R*^{n}_{p} to *R*^{m}_{f(p)}. Star downstairs, *n* and *m* appearing in the same order. The analogous map on differential forms, however, is *f*^{*}, star upstairs, and goes in the opposite direction. (At least in Spivak‘s notation.) Unfortunately, the use of the words “covariant” and “contravariant” in the context of tensors is backward to what is now customary usage, though for good historical reasons.

***

The convention described here of putting stars downstairs for covariant functors and upstairs for contravariant functors is common, but it’s not universal. I ran into an exception right after writing this post. But I believe the convention is followed more often than not.

***

My initial impression of category theory was positive: it could tell me why the star goes downstairs in my topology course and upstairs in my functional analysis course! My next exposure to category theory left a bad impression that lasted for years. Category theory can be used to clarify or to obfuscate, to solve problems or to create problems, like any other tool.

]]>**JC**: *When you went to PNG to learn Nakui was there any writing system?*

**GG**: No, they had no way of writing words or numbers. They had names for only seven numbers — that was the extent of their counting system — but they could coordinate meetings more than a few days future by tying an equal number of knots in two vines. Each party would take a vine with them and loosen a knot each morning until they counted down to the appointed time — like and advent calendar, but without numbers!

**JC**: *I believe you said that Nakui has a similar grammar to other languages in PNG but completely different vocabulary. Were you able to benefit from any other translation work in the area? Will your work serve as a starting point for translating another language?*

**GG**: Yes. Missionaries had moved into our general area back in the 80’s and the languages they studied had mostly the same grammar features that we saw in the Nakui language. This sped the process, but in truth, the slowest part of learning to speak was not analyzing the grammar, it was training yourself to use it. Each Nakui verb could conjugate 300 different ways depending on person, recipient, tense, aspect, direction, and number.

**JC**: *Do any of the Nakui speak a pidgin or any other language that served as a bridge for you to learn their language?*

**GG**: We were blessed with one very good pidgin speaker when we moved into the village and several others who spoke the Melanesian Tok Pisin at a moderate level. That was a help in eliciting phrases and getting correction, but there were limits to how much they understood about their own speech. They had never thought about their language analytically and could not state a reason for the different suffixes used in the language, they just knew it was the right way to say something. Most peculiar was that they couldn’t even tell where certain word breaks were. Their language had never been “visible” to them. Nouns were easy to separate out, but it was hard to know if certain modifiers were affixes or separate function words.

**JC**: *What are a few things that English speakers would find surprising about the language?*

**GG**: Nakui has very few adjectives but innumerable verbs. They do the majority of their describing by using a very specific verb. There is a different verb for cutting something in the middle as opposed to cutting off a small section or cutting it long ways or cutting something to fall downward or cutting something to bits. The correct verb for the situation will also depend on the object being cut — is it log-like, leaf-like, or flesh-like.

**JC**: *Do you try to learn a new language aurally first before starting on a writing system, or do you start developing a writing system early on so you can take notes?*

**GG**: It’s language, so you always want your ears and tongue to lead your eyes. We use the international phonetic alphabet to write down notes, but we encourage language learners to hold off on writing anything for almost of month. After they are memorizing and pronouncing correctly a host of nouns and some noun modifiers, then we encourage them to start writing down their data with the phonetic alphabet. About half way to fluency they usually have enough perspective on the variety of sounds and where those sounds occur that they can narrow down a useful alphabet.

**JC**: *How long did it take to learn the language? To create the writing system? How many people were working on the project?*

**GG**: Primarily Tim Askew and myself, but there were three other missionaries that worked among the Nakui in the early days and each had some contribution to our understanding. The biggest credit really belongs to God, by whose grace we were allowed to live there and by whose strength we were able to chip away for three years until the job was done.

**JC**: *Can you describe the process you went through to create the writing system? *

**GG**: Well after collecting thousands of words written phonetically we could start a process of regularizing all that data. Looking at the language as a whole we had to decided the sleekest, most uniform way to consistently spell those sounds phonetically. After cleaning up our data we could take a hard look at which of the sounds were unique and which are just modified by surrounding sounds. Phonetically the English language has three different ‘t’ sounds: an aspirated ‘t’, and unaspirated ‘t’, and an unreleased ‘t’. We only think of one ‘t’ sound, though, and it would be confusing (and irritating) if we had three ‘t’ symbols in our alphabet. Our ‘t’s just become unreleased automatically when they are at the end of a word and they become unaspirated when they blend with an ‘s’. You need to narrow down the number of sounds to just the ones that are significant and meaningful. It is only those unique sounds that are given letters in the alphabet.

**JC**: *What is the alphabet you settled on? Do the letters represent sounds similar to what an English speaker would expect? *

**GG**: We found 10 vowel sounds and 10 consonant sounds were needful in the Nakui language. The 10 consonants are symbolized the same as English, but the vowels had to be augmented. We borrowed the letter ‘v’ to become the short ‘u’ sound and used accent marks over other vowels to show their lowered and more nasalized quality.

**JC**: *Could you give an example of something written in Nakui?*

**GG**: This is John 3:16. Músvilv Sisasvyv me iyemvi, “Kotvlv asini niyanú nonú mvli mvlei ufau. Múmvimvilv, tuwani aluwalv siyv tuwayv itii, asinosv. Mvimeni notalv me svmvleliyelv, ‘Sisasvni yáfu kokuimvisv,’ múmeni nonv ba itinvini, i tínonvminvini.

**JC**: *Is it unusual for a language to have as many vowel symbols as consonant symbols?*

**GG**: Good question. I’m not sure what normal would be worldwide, but I know we have a lot more than five vowel sounds in English! All our English vowel symbols have double or triple allophones (long sounds, short sounds, alternate sounds) making it very challenging to instruct my 6 yr old in the skill of reading.

**JC**: *Have you published anything on your work, say in a linguistics journal?*

**GG**: No, there is nothing unique in what we did. Truth is, we organized our findings and reported our work in an informal, non-academic, pragmatic way because our goal was not a PhD in anthropology, but a redeeming ministry in the lives of tribal men and women and ultimately a commendation from the Maker of men.

**JC**: *How many of the Nakui can read their language?*

**GG**: All the young men in our village are readers now, with the exception of a dyslexic man who attempted our literacy course three times without result. About 1/3 of the young women can read. They have less available time for study, and in the early days the village leaders did not allow the girl to be in class with the boys. As a result, literacy for ladies had a slower start and fewer participants. There were older men who tried to learn to read, but none that succeeded. The gray matter seemed to have stiffened.

**JC**: *If English didn’t have a writing system and you were designing one from scratch, what would you do differently?*

**GG**: Wow, we have six kids we’ve taught to read – we would do a lot differently! The ideal is to have one symbol for each meaningful sound in the language. English would need more vowels and fewer consonants. Life would actually be more pleasant without ‘c’, ‘j’, ‘q’ and ‘x’. If you removed those and added about six vowels we could come up with a spelling system that was predictable and easy to learn.

**JC**: *Anything else you’d like to say?*

**GG**: One of the most remarkable things about the Nakui language is how sophisticated and complex it is. Unblended with other languages, it is extremely consistent in its grammar. Although it is in a constant state of change (rapid change relative to written languages of the world), it remains organized. This order is not to the credit of the society that speaks it, it happens utterly in spite of them. They are completely unaware of the structure of their own grammar and are perhaps the most disorderly people on the planet. The aspects of life over which they hold sway are corrupt and miserable, but in this small, unmanaged area of their lives, a glimpse of the divine still peeks out. The God of order, who created them in His own image, gave them an innate ability to communicate complex ideas in and orderly way. Like Greek, they specify singular, dual and plural. They have six tenses (English has only three), and 11 personal pronouns (English has only six). Every verb is conjugated to carry this specificity with extreme consistency. The verb to ‘go’ is the only irregular verb in the whole Nakui language. English, although vast, is a disheveled mess compared to what is spoken by these stone age people in the swamps of interior New Guinea.

**Related posts**:

GlennF pointed out that taking the larger root of the equation

defines the golden ratio when *n* = 1, the silver ratio when *n* = 2, etc. φ_{n} is also the continued fraction made entirely of *n*‘s.

With this definition, we have

]]>Here’s a proof:

By De Moivre’s formula,

and so

**Related posts**:

Golden ratio and special angles

Golden strings and the rabbit ratio

… I’m older and, I hope, more able to cope with stress: just as carpenters get calloused hands that make them insensitive to small abrasions, I like to imagine that academics get calloused minds that allow them not to be bothered by small stresses and strains.

Mental callouses are an interesting metaphor. Without the context above, “calloused minds” would have a negative connotation. We say people are calloused or insensitive if they are unconcerned for other people, but Leinster is writing of people unperturbed by distractions.

You could read the quote above as implying that only academics develop mental discipline, though I’m sure that’s not what was intended. Leinster is writing a personal post about the process of writing books. He’s an academic, and so he speaks of academics.

Not only do carpenters become more tolerant of minor abrasions, they also become better at avoiding them. I’m not sure that I’m becoming more tolerant of stress and distractions as I get older, but I do think I’m getting a little better at anticipating and avoiding stress and distractions.

]]>

Some mathematicians are birds, others are frogs. Birds fly high in the air and survey broad vistas of mathematics out to the far horizon. They delight in concepts that unify our thinking and bring together diverse problems from different parts of the landscape. Frogs live in the mud below and see only the flowers that grow nearby. They delight in the details of particular objects, and they solve problems one at a time.

It’s an interesting metaphor. Like all metaphors it has its limits and Dyson discusses that. Some people are somewhere between a bird and a frog, whatever kind of creature that would be, and some alternate between being birds and frogs.

The other day I thought about Dyson’s classification and wondered whether category theorists would be birds or frogs. At first category theory seems avian, looking for grand patterns across mathematics. But as you wander further in, it seems more batrachian, absorbed in drawing little boxes and arrows.

I find it interesting that category theory can profound or trivial, depending on your perspective.

The motivations and applications are profound. Category theory has been called “metamathematics” because it formalizes analogies between diverse areas of math. But basic category theory itself is very close to its axioms. The path from first principles to common definitions and theorems in category theory is much shorter than, say, the path from the definition of the real numbers to the fundamental theorem of calculus.

(This diagram quantifies the last claim to some extent: the graph of concept dependencies in category theory is more wide than deep, and not that deep. Unfortunately I don’t have a similar diagram for calculus.)

**Related post**: Concepts, explosions, and developments

… we can ask Coq to “extract,” from a Definition, a program in some other, more conventional, programming language (OCaml, Scheme, or Haskell) with a high-performance compiler.

Most programmers would hardly consider OCaml, Scheme, or Haskell “conventional” programming languages, but they are conventional relative to Coq. As the authors said, these languages are “more conventional,” not “conventional.”

I don’t mean to imply anything negative about OCaml, Scheme, or Haskell. They have their strengths — I briefly mentioned the advantages of Haskell just yesterday — but they’re odd birds from the perspective of the large majority of programmers who work in C-like languages.

]]>- Work on the right problem.
- Explore the design space of solutions.
- Look at the data.
- Use the back of the envelope.
- Build prototypes.
- Make tradeoffs when you have to.
- Keep it simple.

]]>

It’s true that Haskell accounts for a tiny portion of the world’s commercial software and that the language is more popular in research. (There would be no need to put “real world” in the title of a book on PHP, for example. You won’t find a lot of computer science researchers using PHP for its elegance and nice theoretical properties.) But people do use Haskell on real projects, particularly when correctness is a high priority.[1] In any case, Haskell is “real world” for me since one of my clients uses it. As I wrote about before, applied is in the eye of the client.

I’m not that far into Real World Haskell yet, but so far it’s just what I was looking for. Another book I’d recommend is Graham Hutton’s Programming in Haskell. It makes a good introduction to Haskell because it’s small (184 pages) and focused on the core of the language, not so much on “real world” complications.

A very popular introduction to Haskell is Learn You a Haskell for Great Good. I have mixed feelings about that one. It explains most things clearly and the informal tone makes it easy to read, but the humor becomes annoying after a while. It also introduces some non-essential features of the language up front that could wait until later or be left out of an introductory book.

***

[1] Everyone would say that it’s important for their software to be correct. But in practice, correctness isn’t always the highest priority, nor should it be necessarily. As the probability of error approaches zero, the cost of development approaches infinity. You have to decide what probability of error is acceptable given the consequences of the errors.

It’s more important that the software embedded in a pacemaker be correct than the software that serves up this blog. My blog fails occasionally, but I wouldn’t spend $10,000 to cut the error rate in half. Someone writing pacemaker software would jump at the chance to reduce the probability of error so much for so little money.

On a related note, see Maybe NASA could use some buggy software.

]]>