If you read the last post on the Sleeping Beauty problem, you may recall I did not pledge allegiance to either the “halfer” or the “thirder” camp, because I was still thinking my position through. More than a month later, I still can’t say I am satisfied. Mathematically, the thirder position seems to be the most coherent, but intuitively, it doesn’t seem quite right.

Mathematically, the thirder position works well because it is equivalent to a simpler problem. Imagine the director of the research lab drops in to see how things are going. The director knows all of the details of the Sleeping Beauty experiment, but does not know whether today is day one or day two of the experiment. Looking in, she sees Sleeping Beauty awake. To what degree should she believe that the coin toss was Heads? Here there is no memory-wiping, the problem fits neatly into a standard application of conditional probability, and the answer is 1/3.
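For what it is worth, the director's calculation is easy to check with a quick simulation. This is only a sketch, and it assumes the director is equally likely to drop in on the Monday or the Tuesday:

```python
import random

def director_inspection(trials=1_000_000):
    """Simulate the director dropping in on a random day of the experiment."""
    heads_and_awake = 0
    awake = 0
    for _ in range(trials):
        heads = random.random() < 0.5          # fair coin
        day = random.choice(["Mon", "Tue"])    # the director doesn't know which day it is
        # Sleeping Beauty is awake on Monday in either case, and on Tuesday only after Tails
        if day == "Mon" or not heads:
            awake += 1
            heads_and_awake += heads
    return heads_and_awake / awake

print(director_inspection())  # ≈ 1/3
```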

My intuitive difficulty with the thirder position is better expressed with a more extreme version of the Sleeping Beauty problem. Instead of flipping the coin once, the experimenters flip the coin 19 times. If there are 19 tails in a row (which has a probability of 1 in 524,288), Sleeping Beauty will be woken 1 million times. Otherwise (i.e. if there was at least one Heads tossed), she will only be woken once. Following the standard argument of the thirders, when Sleeping Beauty is awoken and asked for her degree of belief that the coin tosses turned up at least one Heads, she should say approximately 1/3 (or more precisely, 524287/1524287). Intuitively, this doesn’t seem right. Notwithstanding the potential for 1 million awakenings, I would find it hard to bet against something that started off as a 524287/524288 chance. Surely when Sleeping Beauty wakes up, she would be quite confident that at least one Heads came up and that she is in the single-awakening scenario.
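As a sanity check on that fraction, here is the standard thirder weighting by expected awakenings. This is only a sketch of the thirder calculation, not an endorsement of it:

```python
from fractions import Fraction

p_all_tails = Fraction(1, 2) ** 19        # 1/524288
p_some_heads = 1 - p_all_tails            # 524287/524288

# Thirder-style credence: weight each outcome by its number of awakenings
awakenings_heads = 1
awakenings_tails = 1_000_000
credence_heads = (p_some_heads * awakenings_heads) / (
    p_some_heads * awakenings_heads + p_all_tails * awakenings_tails
)
print(credence_heads)  # 524287/1524287 ≈ 0.344
```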

Despite the concerns my intuition throws up, the typical thirder argues that Sleeping Beauty should assign 1/3 to Heads on the basis that she and the director have identical information. For example, here is an excerpt from a comment by RSM on the original post:

I want to know if halfers believe that two people with identical information about a problem, and with an identical set of priors, should assign identical probabilities to a hypothesis. I see the following possibilities:

- The answer is no –> could be a halfer (but not necessarily).
- The answer is yes, but the person holds that conditionalization is not a valid procedure –> could be a halfer.
- The answer is yes and the person accepts conditionalization, but does not accept that the priors for the four possibilities in the Sleeping Beauty puzzle should be equal –> could be a halfer.
- Otherwise, must be a thirder.

My intuition suggests, in a way I struggle to make precise, that Sleeping Beauty and the director do not in fact have identical information. All I can say is that Sleeping Beauty knows she will be awake on Monday (even if she subsequently forgets the experience), but the director may not observe Sleeping Beauty on Monday at all.

Nevertheless, option 2 raises interesting possibilities, ones that have been explored in a number of papers. For example, in D.J. Bradley’s “Self-location is no problem for conditionalization”, *Synthese* **182**, 393–411 (2011), it is argued that learning temporal information involves “belief mutation”, which requires a different approach to updating beliefs than the “discovery” of non-temporal information, which makes use of conditionalisation.

All of this serves as a somewhat lengthy introduction to an interesting approach to the problem developed by Giulio Katis, who first introduced me to the problem. The Stubborn Mule may not be a well-known mathematical imprint, but I am pleased to be able to publish his paper, *Sleeping Beauty, the probability of an experiment being in a state, and composing experiments,* here on this site. In this post I will include excerpts from the paper, but I encourage anyone interested in a mathematical framing of a halfer’s approach to the problem to read it in full. I am sure that Giulio will welcome comments on the paper.

Giulio begins:

The view taken in this note is that the contention between halfers and thirders over the Sleeping Beauty (SB) problem arises primarily for two reasons. The first reason relates to exactly what experiment or frame of reference is being considered: the perspective of SB inside the experiment, or the perspective of an external observer who chooses to randomly inspect the state of the experiment. The second reason is that confusion persists because most thirders and halfers have not explicitly described their approach in terms of generally defining a concept such as “the probability of an experiment being in a state satisfying a property P, conditional on the state satisfying property C”.

Here Giulio harks back to Bob Walters’ distinction between experiments and states. In the context of the Sleeping Beauty problem, the “experiment” is a full run from coin toss through Monday and Tuesday, while a “state” is a particular point within that run. As an example, *P* could be the property that the coin toss came up Heads and *C* the property that Sleeping Beauty is awake.

From here, Giulio goes on to describe two possible “probability” calculations. The first would be familiar to thirders and Giulio notes:

What thirders appear to be calculating is the probability that an external observer randomly inspecting the state of an experiment finds the state to be satisfying P. Indeed, someone coming to randomly inspect this modified SB problem (not knowing on what day it started) is twice as likely to find the experiment in the case where tails was tossed. This reflects the fact that the reference frame or ‘timeframe’ of this external observer is different to that of (or, shall we say, to that ‘inside’) the experiment they have come to observe. To formally model this situation would seem to require modelling an experiment being run within another experiment.

The halfer approach is then characterised as follows:

The halfers are effectively calculating as follows: first calculate for each complete behaviour of the experiment the probability that the behaviour is in a state satisfying property P; and then take the expected value of this quantity with respect to the probability measure on the space of behaviours of the experiment. Denote this quantity by Π_{X}(P).

An interesting observation about this definition follows:

Note that even though at the level of each behaviour the ‘probability of being in a state satisfying P’ is a genuine probability measure, the quantity Π_{X}(P) is not in general a probability measure on the set of states of X. Rather, it is an expected value of such probabilities. Mathematically, it fails in general to be a probability measure because the normalization denominators n(p) may vary for each path. Even though this is technically not a probability measure, I will, perhaps wrongly, continue to call Π_{X}(P) a probability.

I think that this is an important observation. As I noted at the outset, the mathematics of the thirder position “works”, but typically halfers end up facing all sorts of nasty side-effects. For example, an incautious halfer may be forced to conclude that, if the experimenters tell Sleeping Beauty that today is Monday, she should update her degree of belief that the coin toss came up Heads to 2/3. In the literature there are some highly inelegant attempts to avoid these kinds of conclusions. Giulio avoids these issues by embracing the idea that, for the Sleeping Beauty problem, something other than a probability measure may be more appropriate for modelling “credence”:

I should say at this point that, even though Π_{X}(P) is not technically a probability, I am a halfer in that I believe it is the right quantity SB needs to calculate to inform her degree of ‘credence’ in being in a state where heads had been tossed. It does not seem Ξ_{X}(P) [the thirders’ probability] reflects the temporal or behavioural properties of the experiment. To see this, imagine a mild modification of the SB experiment (one where the institute in which the experiment is carried out is under cost pressures): if Heads is tossed then the experiment ends after the Monday (so the bed may now be used for some other experiment on the Tuesday). This experiment now runs for one day less if Heads was tossed. There are two behaviours of the experiment: one we denote by p_{Tails}, which involves passing through two states S_{1} = (Mon, Tails), S_{2} = (Tue, Tails); and the other we denote by p_{Heads}, which involves passing through one state S_{3} = (Mon, Heads). Let P = {S_{3}}, which corresponds to the behaviour p_{Heads}. That is, to say the experiment is in P is the same as saying it is in the behaviour p_{Heads}. Note π(p_{Heads}) = 1/2, but Ξ_{X}(P) = 1/3. So the thirders’ view is that the probability of the experiment being in the state corresponding to the behaviour p_{Heads} (i.e. the probability of the experiment being in the behaviour p_{Heads}) is actually different to the probability of p_{Heads} occurring!
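To make the two quantities concrete, here is a small sketch of this modified experiment as I read Giulio’s definitions. The variable names and the encoding of the states are mine, not notation from the paper:

```python
from fractions import Fraction

# Behaviours of the cost-pressured experiment: (probability, states passed through)
behaviours = {
    "p_Heads": (Fraction(1, 2), ["S3"]),        # S3 = (Mon, Heads)
    "p_Tails": (Fraction(1, 2), ["S1", "S2"]),  # S1 = (Mon, Tails), S2 = (Tue, Tails)
}
P = {"S3"}  # the property corresponding to the behaviour p_Heads

# Halfer-style Pi_X(P): expected value, over behaviours, of the fraction of states in P
pi = sum(prob * Fraction(sum(s in P for s in states), len(states))
         for prob, states in behaviours.values())

# Thirder-style Xi_X(P): weight each state by its behaviour's probability, then normalise
weights = {s: prob for prob, states in behaviours.values() for s in states}
xi = sum(w for s, w in weights.items() if s in P) / sum(weights.values())

print(pi, xi)  # 1/2 and 1/3, matching the quote above
```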

This halfer “probability” has some interesting characteristics:

There are some consequences of the definition for Π_{X}(P) above that relate to what some thirders claim are inconsistencies in the halfers’ position (to do with conditioning). In fact, in the context of calculating such probabilities, a form of ‘interference’ can arise for the series composite of two experiments (i.e. the experiment constructed as ‘first do experiment 1, then do experiment 2’), which does not arise for the probabilistic join of two experiments (i.e. the experiment constructed as ‘with probability p do experiment 1, with probability 1-p do experiment 2’)…

In a purely formal manner (and, of course, not in a deeper physical sense) this ‘nonlocality’, and the importance of defining the starting and ending states of an experiment when calculating probabilities, reminds me of the interference of quantum mechanical experiments (as, say, described by Feynman in the gem of a book QED). I have no idea if this formal similarity has any significance at all or is completely superficial.

Giulio goes on to make an interesting conjecture about composition of Sleeping Beauty experiments:

We could describe this limiting case of a composite experiment as follows. You wake up in a room with a white glow. A voice speaks to you. “You have died, and you are now in eternity. Since you spent so much of your life thinking about probability puzzles, I have decided you will spend eternity mostly asleep and only be awoken in the following situations. Every Sunday I will toss a fair coin. If the toss is tails, I will wake you only on Monday and on Tuesday that week. If the toss is heads, I will only wake you on Monday that week. When you are awoken, I will say exactly the same words to you, namely what I am saying now. Shortly after I have finished speaking to you, I will put you back to sleep and erase the memory of your waking time.” The voice stops. Despite your sins, you can’t help yourself, and in the few moments you have before being put back to sleep you try to work out the probability that the last toss was heads. What do you decide it is?

In this limit, Giulio argues that a halfer progresses to the thirder position, assigning 1/3 to the probability that the last toss was heads!

These brief excerpts don’t do full justice to the framework Giulio has developed, but I do consider it a serious attempt to encompass all of the temporal/non-temporal, in-experiment/out-of-experiment subtleties that the Sleeping Beauty problem throws up. This paper is only for the mathematically inclined and, like so much written on this subject, I doubt it will convince many thirders, but if nothing else I hope that having the paper published here on the Mule will put Giulio’s mind at rest. Over recent weeks, his thoughts have been as plagued by this problem as have mine.

For the last couple of weeks, I have fallen asleep thinking about Sleeping Beauty. Not the heroine of the Charles Perrault fairy tale, or her Disney descendant, but the subject of a thought experiment first described in print by philosopher Adam Elga as follows:

Some researchers are going to put you to sleep. During the two days that your sleep will last, they will briefly wake you up either once or twice, depending on the toss of a fair coin (Heads: once; Tails: twice). After each waking, they will put you back to sleep with a drug that makes you forget that waking. When you are first awakened, to what degree ought you believe that the outcome of the coin toss is Heads?

Elga, A. “Self-locating belief and the Sleeping Beauty problem”, *Analysis* **60**, 143–147 (2000)

It has become traditional to add that Sleeping Beauty is initially put to sleep on Sunday and is either woken up on Monday (Heads) or Monday and Tuesday (Tails). Then on Wednesday she is woken for the final time and the experiment is over. She knows in advance exactly what is going to take place, believes the experimenters and trusts that the coin is fair.

Much like the Monty Hall problem, Sleeping Beauty has stirred enormous controversy. There are two primary schools of thought on this problem: the **thirders** and the **halfers**. Both sides have a broad range of arguments, but put simply they are as follows.

**Halfers** argue that the answer is 1/2. On Sunday, Sleeping Beauty believed that the chance of Heads was 1/2; she learns nothing new on waking, so the chances are still 1/2.

**Thirders** argue that the answer is 1/3. If the experiment is repeated over and over again, approximately 1/3 of the time she will wake up after Heads and 2/3 of the time she will wake up after Tails.
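The thirder frequency argument is easy to check by simulation. The sketch below simply counts awakenings across repeated runs, which, of course, is exactly the accounting that halfers dispute:

```python
import random

heads_awakenings = 0
total_awakenings = 0
for _ in range(100_000):
    heads = random.random() < 0.5
    wakings = 1 if heads else 2   # Heads: Monday only; Tails: Monday and Tuesday
    total_awakenings += wakings
    if heads:
        heads_awakenings += wakings

print(heads_awakenings / total_awakenings)  # ≈ 1/3 of awakenings follow Heads
```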

I first came across this problem myself when a friend alerted me to a blog post by my former supervisor Bob Walters, who describes the thirder position as an “egregious error”. But as Bob notes, there are many in the thirder camp, including Adam Elga himself, physicist Sean Carroll and statistician Jeffrey Rosenthal.

As for my own view, I will leave you in suspense for now, mainly because I’m still thinking it through. Although superficially similar, I believe that it is a far more subtle problem than the Monty Hall problem and poses challenges to what it means to move the pure mathematical theory of probability to a real world setting. Philosophers distinguish between the mathematical concept of “probability” and real world “credence”, a Bayesian style application of probability to real world beliefs. I used to think that this was a bit fanciful on the part of philosophers. Now I am not so sure: applying probability is harder than it looks.

Let me know what you think!

Image Credit: Serena-Kenobi

*John Carmody returns to the Mule in his promised second guest post and takes a close look at Australia Post’s profitability with some (ahem) back-of-the-envelope calculations.*

There are many forms of communication which underpin the function and productivity of a modern society like Australia. Despite the Cassandra-commentary from Mr Ahmed Fahour (the well-paid CEO of Australia Post), regular mail delivery certainly remains one of them.

In making his tendentious, but opaque, points, he has not been entirely frank with the community. He has, for instance, claimed that 99% of our mail is electronic. That assertion is meaningless because so much e-mail is advertising, brief inter- or intra-office memos and notices, or quick substitutes for telephone calls. When these are removed from the calculation, the importance of “hard mail” becomes more obvious.

The data which the Herald has published (for instance, “Please Mr Postman: snail mail doomed to disappear”, 14 June) also show how shallow or formulaic Mr Fahour’s thinking seems to be. In 2012-13 Australia Post made an after-tax profit of $312 million and if there had been no losses on the handling of letters, that would have been $530 million. Do Australians really want a profit of that magnitude from such a vital national service?

But when one looks at that “letter-loss” a little more closely and at the figure of 3.6 billion letters delivered that year, it is clear that the loss per letter was 6.5 cents. In other words, if instead of recently increasing the cost of a standard letter to 70 cents, this had been to 75 cents, the losses would have been comprehensively dealt with.

Some comparisons might be informative. The British Royal Mail currently charges about $A1.10 for delivery of a standard (20g) letter for next-day delivery within the UK (its “aim”) and $A0.95 if you’re happy for delivery within 3 days. The Deutsche Post charges the equivalent of 86 Australian cents for delivery within Germany but about $A1.08 to adjacent France. Given that we currently pay only 70 cents for delivery across a far larger area, my suggested price of 75 cents seems reasonable and justified.

*For the first time in a while, John Carmody returns to the Stubborn Mule with the first of two guest posts. He argues that the government’s proposed medical “co-payments” do not add up.*

The government continues to flounder about many details of its budget and part of the reason is a lack of stated clarity about its intentions (although the electors are drawing their own conclusions about those intentions and whether they are fair and honest). The proposed $7 “co-payment” for GP visits is an example of this lack of frankness.

On the one hand, the Government – purporting to be concerned about an excessive patronage of GPs – seems to want us to visit our doctors less frequently than the 6 visits which every man, woman and child currently makes each year (i.e. about once in two months for all of us, an internationally comparable figure, incidentally). On the other hand, it has, so to speak, attempted to sugar-coat this unpleasant pill by promising that, while a little of that fee will go to the practitioners, most of it will go into a special fund (to be built up to $20 billion over the next 6 years) to boost medical research (and thereby do us all a great deal of good). Neither claim survives scrutiny.

The $2 proposed share to GPs will not compensate them for the extra administrative costs which they will have to carry on behalf of the Government; nor will that nugatory sum compensate for the progressive tightening of the reimbursement of doctors from “Medicare”; so the Government’s share will, to be realistic, need to be significantly less than $5. After dealing with its own extra administrative costs, therefore, the Government will probably only be able to put $3-4 per GP consultation into the proposed research fund. To build that fund up to the $20 billion proposed will require every Australian to visit the GP about 50 times each year – once each week. How this is going to reduce our alleged “overuse” of medical services has not been explained. Nor has how, in practice, it can be achieved. The Government is living in Fairyland.
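A rough back-of-envelope check of that claim is below. It is only a sketch: the population figure of roughly 23 million and the assumption that $3–4 of each co-payment reaches the fund are mine, not the Government’s:

```python
target_fund = 20e9   # $20 billion to be accumulated
years = 6
population = 23e6    # rough 2014 Australian population (assumption)

for per_visit in (3.0, 4.0):   # plausible net contribution to the fund per GP visit
    visits_per_year = target_fund / (years * per_visit * population)
    print(f"${per_visit:.0f} per visit -> {visits_per_year:.0f} visits per person per year")
# roughly 36-48 visits per person per year, i.e. close to one a week
```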


Before, during and after this month’s budget, Treasurer Joe Hockey sounded dire warnings about Australia’s “budget emergency”. Amidst this fear-mongering, it was a pleasant relief to come across a dissenting view. In a recent interview on 2SER, Dr Stephanie Kelton (Department of Economics at the University of Missouri–Kansas City) argued that the government budget is very different from a household budget, however appealing that analogy might be. Governments like the Australian government, with its own free-floating currency, can spend more than they take in taxation without worrying about running out of money. While the economy is weak, the government can comfortably run a deficit. The constraint to worry about is the risk of inflation, which means curbing spending once the economy heats up.

I posted a link to Facebook, and immediately drew comment from a more ~~conservatively~~ libertarian-minded friend: “of course a deficit is a bad thing!”. Pressed for an explanation, he argued that government spending was inefficient and “crowded out” more productive private sector investment. This did not surprise me. Deep down, the primary concern of many fiscal conservatives is government spending itself, not a deficit. This is easy to test: ask them whether they would be happy to see the deficit closed by increased taxes rather than decreased spending. The answer is generally no, and helps explain why so many more traditional conservatives are horrified by the prospect of the Coalition’s planned tax on higher income earners….sorry, “deficit levy”.

From there, the debate deteriorated. North Korea was compared to South Korea as evidence of the proposition that government spending was harmful, while a left-leaning supporter asked whether this meant Somalia’s economy should be preferred to Sweden’s. Perhaps foolishly, I proffered a link to an academic paper (on the website of that bastion of left-wing thought, the St. Louis Fed) which presented a theoretical counter to the “crowding out” thesis. My sparring partner then rightly asked whether the thread was simply becoming a rehash of the decades-old Keynes vs Hayek feud, a feud best illustrated by Planet Money’s inimitable music video.

Macroeconomic theory was never going to get us anywhere (as I should have known only too well). Instead, the answer lay in the data, with more sensible examples than North Korea and Somalia. Aiming to keep the process fair, avoiding the perils of mining data until I found an answer that suited me, here was my proposal:

I’m going to grab a broad cross-section of countries over a range of years and compare a measure of government expenditure (as % of GDP to be comparable across countries) to a measure of economic success (I’m thinking GDP per capita in constant prices).

If indeed government spending is inherently bad for an economy, we should see a negative correlation: more spending, weaker economy and vice versa. My own expectation was to see no real relationship at all. In a period of economic weakness, I do think that government spending can provide an important stimulus, but I do not think that overall government spending is inherently good or bad.

The chart below illustrates the relationship for 32 countries taken from the IMF’s data eLibrary. To eliminate short-term cyclical effects, government spending and GDP per capita (in US$, converted using purchasing power parity) were averaged over the period 2002-2012.
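For anyone who wants to reproduce this kind of comparison, here is a sketch of the approach. The file name and column names are hypothetical placeholders, not the layout of the actual IMF extract:

```python
import pandas as pd

# Hypothetical layout: one row per country-year, with columns
# 'country', 'year', 'spend_pct_gdp' and 'gdp_per_capita_ppp'
df = pd.read_csv("imf_panel.csv")

# Average over 2002-2012 to smooth out cyclical effects
decade = df[df["year"].between(2002, 2012)]
summary = decade.groupby("country")[["spend_pct_gdp", "gdp_per_capita_ppp"]].mean()

# A simple test of the "spending is bad" hypothesis: look for a negative correlation
print(summary["spend_pct_gdp"].corr(summary["gdp_per_capita_ppp"]))
summary.plot.scatter(x="spend_pct_gdp", y="gdp_per_capita_ppp")
```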

The countries in this IMF data set are all relatively wealthy, with stable political structures and institutions. All but one are classified as “democracies” by the Polity Project (the exception is Singapore, which is classified as an “anocracy” due to an assessment of a high autocracy rating). This helps to eliminate more extreme structural variances between the countries in the study, providing a better test of the impact of government spending. Even so, there are two outliers in this data set: Luxembourg has by far the highest GDP per capita, while Mexico has quite a low GDP per capita and the lowest rate of government spending.

The chart below removes these outliers. There is no clear pattern to the data. There is no doubt that government spending can be well-directed or wasted, but for me this chart convincingly debunks a simple hypothesis that overall government spending is necessarily bad for the economy.

Now look for the cross (+) on the chart: it is Australia (the IMF does not include data for New Zealand, so we are the sole representative of Oceania). Despite Hockey’s concerns about a budget emergency, Australia is a wealthy country with a relatively low rate of government spending. Among these 30 countries, only Switzerland and South Korea spend less. These figures are long run averages, so perhaps the “age of entitlement” has pushed up spending in recent years? Hardly. Spending for 2012 was 35.7% compared to the 2002-2012 average of 35.3%. The shift of the budget from surplus to deficit is the result of declining taxation revenues rather than increased spending. Mining tax anyone?

My recent randomness post hinged on people’s expectations of how long a run of heads or tails you can expect to see in a series of coin tosses. In the post, I suggested that people tend to underestimate the length of runs, but what does the ~~fox~~ maths say? The exploration of the numbers in this post draws on the excellent 1991 paper “The Longest Run of Heads” by Mark Schilling, which would be a good starting point for further reading for the mathematically inclined.

When I ran the experiment with the kids, I asked them to try to simulate 100 coin tosses, writing down a sequence of heads and tails. Their longest sequence was 5 heads, but on average, for 100 tosses, the length of the longest run (which can be either heads or tails) is 7. Not surprisingly, this figure increases for a longer sequence of coin tosses. What might be a bit more surprising is how slowly the length of the longest run grows. Just to bump up the average length from 7 to 8, the number of tosses has to increase from 100 to 200. It turns out that the average length of the longest run grows approximately logarithmically with the total number of tosses. This formula gives a pretty decent approximation of the expected length:

average length of longest run in *n* tosses ≃ log_{2} *n* + 1/3

The larger the value of *n*, the better the approximation and once *n* reaches 20, the error falls below 0.1%.
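If you would like to check the approximation yourself, here is a small simulation sketch (the sample size of 20,000 is arbitrary):

```python
import math
import random

def longest_run(n):
    """Length of the longest run of identical outcomes in n fair coin tosses."""
    tosses = [random.random() < 0.5 for _ in range(n)]
    best = cur = 1
    for prev, nxt in zip(tosses, tosses[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best

for n in (100, 200):
    average = sum(longest_run(n) for _ in range(20_000)) / 20_000
    print(n, round(average, 2), round(math.log2(n) + 1/3, 2))
# roughly 7 vs 6.98 for n=100, and 8 vs 7.98 for n=200
```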

**Growth of the Longest Run**

However, averages (or, technically, expected values) like this should be used with caution. While the average length of the longest run seen in 100 coin tosses is 7, that does not mean that the longest run will typically have length 7. The probability distribution of the length of the longest run is quite skewed, as is evident in the chart below. The most likely length for the longest run is 6, but there is always a chance of getting a much longer run (more so than very short runs, which can’t fall below 1) and this pushes up the average length of the longest run.

**Distribution of the Longest Run in 100 coin tosses**

What the chart also shows is that the chance of the longest run only being 1, 2 or 3 heads or tails long is negligible (less than 0.03%). Even going up to runs of up to 4 heads or tails adds less than 3% to the cumulative probability. So, the probability that the longest run has length at least 5 is a little over 97%. If you ever try the coin toss simulation experiment yourself and you see a supposed simulation which does not have a run of at least 5, it’s a good bet that it was the work of a human rather than a random coin. Like the average length of the longest run, this probability distribution shifts (approximately) logarithmically as the number of coin tosses increases. With a sequence of 200 coin tosses, the average length of the longest run is 8, the most likely length for the longest run is 7 and the chances of seeing a run of at least 5 heads or tails in a row are now over 99.9%. If your experimental subjects have the patience, asking them to simulate 200 coin tosses makes for even safer ground for you to prove your randomness detection skills.

**Distribution of the Longest Run in 200 coin tosses**

What about even longer runs? The chart below shows how the chances of getting runs of a given minimum length increase with the length of the coin toss sequence. As we’ve already seen, the chances of seeing a run of at least 5 gets high very quickly, but you have to work harder to see longer runs. In 100 coin tosses, the probability that the longest run has length at least 8 is a little below 1/3 and is still only just over 1/2 in 200 tosses. Even in a sequence of 200 coin tosses, the chances of seeing at least 10 heads or tails in a row is only 17%.

**Longest Run probabilities**

Getting back to the results of the experiment I conducted with the kids, the longest run for both the real coin toss sequence and the one created by the children was 5 heads. So, none of the results here could help to distinguish them. Instead, I counted the number of “long” runs. Keeping the distribution of long runs for 100 tosses in mind, I took “long” to be any run of 4 or more heads or tails. To calculate the probability distribution for “long” runs, I used simulation*, generating 100,000 separate sequences of 100 coin tosses. The chart below shows the results, giving an empirical estimate of the probability distribution for the number of runs of 4 or more heads or tails in a sequence of 100 coin tosses. The probability of seeing no more than two of these “long” runs is only 2%, while the probability of seeing 5 or more is 81%.
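The simulation is straightforward to reproduce. Here is a sketch of my own reconstruction, counting maximal runs of at least four, which is how I read the definition above; compare its output with the 2% and 81% figures just quoted:

```python
import random
from collections import Counter

def count_long_runs(n=100, min_len=4):
    """Count maximal runs of at least min_len identical outcomes in n fair coin tosses."""
    tosses = [random.random() < 0.5 for _ in range(n)]
    runs, cur = [], 1
    for prev, nxt in zip(tosses, tosses[1:]):
        if prev == nxt:
            cur += 1
        else:
            runs.append(cur)
            cur = 1
    runs.append(cur)
    return sum(r >= min_len for r in runs)

counts = Counter(count_long_runs() for _ in range(100_000))
total = sum(counts.values())
print(sum(v for k, v in counts.items() if k <= 2) / total)  # share with at most two long runs
print(sum(v for k, v in counts.items() if k >= 5) / total)  # share with five or more long runs
```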

These results provide the ammunition for uncovering the kids’ deceptions. Quoting from the Randomness post:

One of the sheets had three runs of 5 in a row and two runs of 4, while the other had only one run of 5 and one run of 4.

So, one of the sheets was in the 81% bucket and one in the 2% bucket. I guessed that the former was the record of coin tosses and the second was devised by the children. That guess turned out to be correct and my reputation as an omniscient father was preserved! For now.

If you have made it this far, I would encourage you to do the following things (particularly the first one):

- Listen to Stochasticity, possibly the best episode of the excellent Radiolab podcast, which features the coin toss challenge
- Try the experiment on your own family or friends (looking for at least 3 runs of 5 or more heads or tails and ideally at least one of 6 or more)
- Share your results in the comments below.

I look forward to hearing about any results.

* UPDATE: I subsequently did the exact calculations, which confirmed that these simulated results were quite accurate.

I have been watching some (very) old Doctor Who episodes, including the first ever serial featuring the archetypal villains, the Daleks. In this story, the Daleks share a planet with their long-time enemies, the Thals. After a war culminating in the detonation of a neutron bomb, both races experience very different mutations. The Daleks have become shrunken beasts that get about in robotic shells, while the more fortunate Thals mutated into peace-loving blondes.

The Thals hope to make peace with the Daleks, but the Daleks have more fiendish plans and plot to lure the Thals into their city with a gift of food and then ambush them. It is a good plan, but it is the choice of gifts that left me bemused. There is plenty of fruit and some large tins whose contents remain undisclosed. These may be reasonable choices, although I do find it hard to picture the Daleks stacking melons with their plunger hands. But the trap also appears to feature stacks of toilet paper. Granted, toilet paper may be an appealing luxury for the Thals, who have been trekking through the jungle for a year, but the real question here is: why do Daleks even have toilet paper?

With three children, I have my own laboratory at home for performing psychological experiments. Before anyone calls social services, there is an ethical committee standing by (their mother).

This evening, I tried out one of my favourites: testing the perception of randomness. Here is the setup: I gave the boys two pieces of paper and a 20 cent coin. I was to leave the room, and they had to decide which of the two sheets of paper would be filled in by themselves and which by the coin. Having made their choice, they then had to write down on one of the sheets their best attempt at a “random” sequence of 100 heads (H) and tails (T). Having done that, they were then to toss the coin 100 times, writing down on the other page the sequence of heads and tails that came up. I would then return to the room and guess which one was determined by the toss of the coin, and which by the boys.

I identified which sequence was which in less than 30 seconds. How did I do it?

The trick is to look for the longer sequences. Much like the gambler at the roulette wheel, the kids assume that a run of heads cannot last too long. One of the sheets had three runs of 5 in a row and two runs of 4, while the other had only one run of 5 and one run of 4. I correctly picked that the sheet with more long runs was determined by the coin toss.

Try it yourself sometime. If you see a run of 6 or more (which is in fact quite probable in a sequence of 100 coin tosses), you can quite confidently pick that as the coin toss, unless your subject has been well schooled in probability.

Our intuition struggles with randomness. We tend to assume randomness is more regular than it is. On the other hand, we also try to find patterns where there is only randomness, whether it is the man in the moon, clouds that look like things, the face of Mary on a piece of toast or, perhaps, an explanation for the disappearance of MH 370.

Recently I travelled to China for the first time. My first glimpse of Beijing took in the Escher-like headquarters of Chinese TV station CCTV. It is an extraordinary building and to get a proper sense of it, you have to see it from a number of different angles.

Driving across the city, impressed by the scale of the place, I asked one of my hosts about the population of Beijing. He told me there were about 40 million, including non-residents. Almost double the entire population of Australia. Maybe it’s an exaggeration, but more than the figure itself it was the reference to “non-residents” that piqued my interest. Were there really so many people moving to China as to have a significant impact on the population of the capital?

Later, I learned that these non-residents were in fact people from other provinces. Under China’s Hukou system, restrictions are imposed on people’s ability to move from one part of the country to another. Many people from rural areas are drawn to cities to find work, but without residency rights for the city in which they work they cannot access public education or health care. So, Beijing is full of married men who have left their families at home in the provinces. Living in tiny apartments, they work all year and then travel back to their families for Chinese New Year, taking their earnings with them.

Used as I am to freedom of movement in Australia, I find it hard not to see this as a harsh system. But, reflecting on the numbers, China is a country of 1.3 billion people; if there are already 30 to 40 million people in Beijing, how would the city cope with a sudden influx of millions more? Only a few days ago, the central committee of China’s communist party released new targets to increase urbanisation from 53.7% of the population to 60% by 2020. This plan involves granting urban hukou status to an additional 100 million rural migrant workers. Even so, another 200 million migrants will remain non-residents. It is sobering to consider the potential consequences of granting full freedom of migration to the entire population rather than managing the process in this highly controlled fashion.

I’m not about to renounce my belief in democracy (however challenged it may be in many Western countries today), but, much like the CCTV building, it seems that to better understand China, you have to see it from a number of different angles.

*For some time now, our regular contributor James Glover has been promising me a post with some statistical analysis of historical global temperatures. To many, the science of climate change seems inaccessible and the “debate” about climate change can appear to come down to whether you believe a very large group of scientists or a much smaller group of ~~scientists~~ people. Now, with some help from James and a beer coaster, you can form your own view.*

How I wish that the title of this article was literally true and not just a play on words relating to the Harmonic Series. Sadly, the naysayers are unlikely to be swayed, but read this post and you too can disprove global warming denialism on the back of a beer coaster!

It is true, I have been promising the Mule a statistical analysis of Global Warming. Not only did I go back and look at the original temperature data, but I even downloaded the data and recreated the original “hockey stick” graph. For most people the maths is quite complicated, though it requires nothing beyond undergraduate statistics to understand. It all works out. As a sort of professional statistician, who believes in Global Warming and Climate Change, I can only reiterate my personal mantra: there is no joy in being found to be right on global warming.

But before I get onto the beer coaster let me give a very simple explanation for global warming and why the rise in CO_{2} causes it. Suppose I take two sealed glass boxes. They are identical apart from the fact that one has a higher concentration of CO_{2}. I place them in my garden (let’s call them “greenhouses”) and measure their temperature, under identical conditions of weather and sunshine, over a year. Then the one with more CO_{2} will have a higher temperature than the one with less. Every day. Why? Well it’s simple: while CO_{2} is, to us, an “odourless, colourless gas”, this is only true in the visible light spectrum. In the infra-red spectrum, the one with more CO_{2} will be darker. This means it absorbs more infrared radiation and hence has a higher temperature. CO_{2} is invisible to visible light but, on its own, would appear black to infrared radiation. The same phenomenon explains why a black car will heat up more in the sun than a white one. This is basic physics and thermodynamics that was understood in the 19th century when it was discovered that “heat” and “light” were part of the same phenomenon, i.e. electromagnetic radiation.

So why is global warming controversial? Well, while what I said is undeniably true in a pair of simple glass boxes, the earth is more complicated than these boxes. Radiation does not just pass through; it is absorbed, reflected and re-radiated. Still, if the system absorbs more radiation than it emits, then the temperature will increase. It is not so much the surface temperature itself which causes a problem, but the additional energy that is retained in the climate system. Average global temperatures are just a simple way of trying to measure the overall energy change in the system.

If I covered the glass box containing more CO_{2} with enough aluminium foil, much of the sunshine would be reflected and it would have a lower temperature than its lower CO_{2} twin. Something similar happens in the atmosphere. Increasing temperature leads to more water vapour and more clouds. Clouds reflect sunshine and hence there is less radiation to be absorbed by the lower atmosphere and oceans. It’s called a negative feedback system. Maybe that’s enough to prevent global warming? Maybe, but clouds are very difficult to model in climate models, and water vapour is itself a greenhouse gas. Increasing temperature also decreases ice at the poles. Less ice (observed) leads to less radiation reflected and more energy absorbed. A positive feedback. It would require a very fine tuning, though, for the radiation reflected back by increased clouds to exactly counteract the increased absorption of energy due to higher CO_{2}. Possible, but unlikely. Recent models show that CO_{2} wins out in the end. As I said, there is no joy in being found right on global warming.

So enough of all that. Make up your own mind. Almost time for the Harmony. Perusing the comments of a recent article on the alleged (and not actually real) “pause” in global warming I came across a comment to the effect that “if you measure enough temperature and rainfall records then somewhere there is bound to be a new record each year”. I am surprised they didn’t invoke the “Law of Large Numbers” which this sort of argument usually does. Actually The Law of Large Numbers is something entirely different, but whatever. So I asked myself, beer coaster and quill at hand, what is the probability that the latest temperature or rainfall is the highest since 1880, or any other year for that matter?

Firstly, you can’t prove anything using statistics. I can toss a coin 100 times and get 100 heads and it doesn’t prove it isn’t a fair coin. Basically we cannot know all the possible set ups for this experiment. Maybe it is a fair coin but a clever laser device adjusts its trajectory each time so it always lands on heads. Maybe aliens are freezing time and reversing the coin if it shows up tails so I only think it landed heads. Can you assign probabilities to these possibilities? I can’t.

All I can do is start with a hypothesis that the coin is fair (equal chance of heads or tails) and ask what is the probability that, despite this, I observed 100 heads in a row. The answer is not zero! It is actually about 10^{-30}. That’s 1 over a big number: 1 followed by 30 zeros. I am pretty sure, but not certain, that it is not a fair coin. But maybe I don’t need to be certain. I might want to put a bet on the next toss being a head. So I pick a small number, say 1%, and say if I think the chance of 100 head is less than 1% then I will put on the bet on the next toss being heads. After 100 tosses the hypothetical probability (if it was a fair coin) is much less than my go-make-a-bet threshold of 1%. I decide to put on the bet. It may then transpire that the aliens watching me bet and controlling the coin, decide to teach me a lesson in statistical hubris and make the next toss tails and I lose. Unlikely, but possible. Statistics doesn’t *prove* anything. In statistical parlance the “fair coin” hypothesis is called the “Null Hypothesis” and the go-make-a-bet threshold of 1% is called the “Confidence Level”.
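That figure is easy to check for yourself:

```python
print(0.5 ** 100)  # ≈ 7.9e-31, i.e. roughly 10**-30
```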

Harmony. Almost. What is the probability, if I have a time series (say, global temperature since 1880), that the latest temperature is a new record? For example, the average temperature in Australia in 2013 was a new record. The last average global temperature record was in 1998. I think it is trending upwards over time with some randomness attached. But there are all sorts of random processes which produce trends, some of which are equally likely to have produced a downward trending temperature graph. All I can really do, statistically speaking, is come up with a Null Hypothesis. In this case my Null Hypothesis is that the temperature doesn’t have a trend but is just the result of random chance. There are various technical measures to analyse this, but I have come up with one you can fit on the back of a beer coaster.

So my question is this: if the temperature readings are just i.i.d. random processes (i.i.d. stands for “independent and identically distributed”) and I have taken 134 of these (global temperature measurements 1880-2014) what is the probability the latest one is the maximum of them all? It turns out to be surprisingly easy to answer. If I have 134 random numbers then one of them must be the maximum. Obviously. Since they are iid I have no reason to believe it will be the first, second, third,…, or 134th. It is equally likely to be any one of those 134. So the probability that the 134th is the maximum is 1/134 = 0.75% (as it is equally likely that, say, the 42nd is the maximum). If I have T measurements then the probability that the latest is the maximum is 1/T. So when you hear that the latest global temperature is a maximum, and you don’t believe in global warming, then be surprised. As a corollary if someone says there hasn’t been a new maximum since 1998 then the probability of this still being true, 14 years later, is 1/14 = 7%.

So how many record years do we expect to have seen since 1880? Easy. Just add up the probability of the maximum (up to that point) having occurred in each year since 1880. So that would be H(T) = 1 + 1/2 + 1/3 + … + 1/T. This is known as the Harmonic Series. It is famous in mathematics because it almost converges, but doesn’t quite. For our purposes it can be well approximated by H(T) ≈ 0.5772 + ln(T), where ln is the natural logarithm and 0.5772 is known as the Euler–Mascheroni constant.

So for T = 134 we get from this simple beer-coaster sized formula: H(134) ≈ 0.5772 + ln(134) ≈ 5.47. (You can calculate this by typing “0.5772+ln(134)” into your Google search box if you don’t have a scientific calculator to hand.) In beer coaster terms 5.47 is approximately 6. So, given the Null Hypothesis (which is that there has been no statistically significant upward trend since 1880), how many record breaking years do we expect to have seen? Answer: less than 6. How many have we seen? 22.
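For anyone who wants to check the beer coaster, here is a sketch that computes the harmonic number exactly and also simulates the record count under the i.i.d. Null Hypothesis:

```python
import math
import random

T = 134  # annual observations, 1880 onwards

# Expected number of record years under the i.i.d. null: the harmonic number H(T)
harmonic = sum(1 / t for t in range(1, T + 1))
print(round(harmonic, 2), round(0.5772 + math.log(T), 2))  # ≈ 5.48 vs ≈ 5.47

# Quick simulation: count record highs in an i.i.d. series of length T
def record_count(T):
    best, count = float("-inf"), 0
    for _ in range(T):
        x = random.random()
        if x > best:
            best, count = x, count + 1
    return count

print(sum(record_count(T) for _ in range(100_000)) / 100_000)  # ≈ 5.5 on average
```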

**Global temperatures* – labelled with successive peaks**

If I was a betting man I would bet on global warming. But there will be no joy in being proven right.

*James rightly points out that the figure of 22 peak temperatures is well above the 6 you would expect to see under the Null Hypothesis. But just how unlikely is that high number? And, what would the numbers look like if we took a different Null Hypothesis such as a random walk? That will be the topic of another post, coming soon to the Stubborn Mule!*

* The global temperature “anomaly” represents the difference between observed temperatures and the average annual temperature between 1971 and 2000. Source: the National Climate Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA).
