Ever wonder what the rules were for when to use thou, thee, ye, or you in Shakespeare or the King James Bible?

For example, the inscription on the front of the Main Building at The University of Texas says

Ye shall know the truth and the truth shall make you free.

Why *ye* at the beginning and *you* at the end?

The latest episode of The History of English Podcast explains what the rules were and how they came to be. Regarding the UT inscription, *ye* was the subject form of the second person plural and *you* was the object form. Eventually *you* became used for subject and object, singular and plural.

The singular subject form was *thou* and the singular object form was *thee*. For example, the opening lines of Shakespeare’s Sonnet 18:

Shall I compare thee to a summer’s day?

Thou art more lovely and more temperate.

Originally the singular forms were intimate and the plural forms were formal. Only later did *thee* and *thou* take on an air of reverence or formality.

- Common Math Symbols in HTML, XML, TeX, and Unicode
- Accented letters in HTML, TeX, and Microsoft Word
- Greek letters in HTML, XML, TeX, and Unicode
- Unicode resources

See also blog posts tagged LaTeX, HTML, and Unicode and the Twitter account TeXtip.

**Last week**: C++ resources

**Next week**: Special functions

Unicode often counts the same symbol (glyph) as two or more different characters. For example, Ω is U+03A9 when it represents the Greek letter omega and U+2126 when it represents Ohms, the unit of electrical resistance. Similarly, M is U+004D when it’s used as a Latin letter but U+216F when it’s used as the Roman numeral for 1,000.
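To see the distinction concretely, here is a small sketch using Python's standard `unicodedata` module. It prints the official names of the four code points mentioned above and checks whether Unicode normalization (NFKC, my choice for the illustration) folds each look-alike back into its more common twin.

```python
import unicodedata

# Look-alike pairs from the text: (common character, specialized character)
pairs = [("\u03a9", "\u2126"),   # Greek omega vs. Ohm sign
         ("\u004d", "\u216f")]   # Latin M vs. Roman numeral one thousand

for common, special in pairs:
    for ch in (common, special):
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
    # The two characters look alike but are different code points.
    print("  equal as strings:", common == special)
    # NFKC normalization maps the specialized character to the common one.
    print("  equal after NFKC:", unicodedata.normalize("NFKC", special) == common)
```

Normalizing before comparing or searching is one way software can cope with look-alike characters, at the cost of discarding exactly the semantic distinction the separate code points were meant to carry.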

The purpose of such distinctions is to capture semantic differences. One example of how this could be useful is increased accessibility. A text-to-speech reader should pronounce things the same way people do. When such software sees “a 25 Ω resistor” it should say “a twenty five Ohm resistor” and not “a twenty five uppercase omega resistor,” just as a person would. [1]

Making text more accessible to the blind helps everyone else as well. For example, it makes the text more accessible to search engines. As Elliotte Rusty Harold points out in Refactoring HTML:

Wheelchair ramps are far more commonly used by parents with strollers, students with bicycles, and delivery people with hand trucks than they are by people in wheelchairs. When properly done, increasing accessibility for the disabled increases accessibility for everyone.

However, there are practical limits to how many semantic distinctions Unicode can make without becoming impossibly large, and so the standard is full of compromises. It can be quite difficult to decide when two uses of the same glyph should correspond to separate characters, and no standard could satisfy everyone.

***

[1] Someone may discover that when I wrote “a 25 Ω resistor” above, I actually used an Omega (Ω, U+03A9) rather than an Ohm character (Ω, U+2126). That’s because font support for Unicode is disappointing. If I had used the technically correct Ohm character, some people would not be able to see it. Ironically, this would make the text *less* accessible.

On my Android phone, I can see Ω (Ohm) but I cannot see Ⅿ (Roman numeral M) because the installed fonts have a glyph for the former but not the latter.

***

This post first appeared on Symbolism, a blog that I’ve now shut down.

- IEEE floating-point exceptions in C++
- Unraveling Strings in Visual C++
- C++ TR1 regular expressions
- Random number generation in C++

See also posts tagged C++

**Last week**: R resources

**Next week**: HTML, TeX, and Unicode

If you see any problems with a post, please let me know. You could send me an email, or leave a comment on the post. (For a while I had comments automatically turn off on older posts, but I’ve disabled that. Now you can comment on any post.)

For the first couple years, this blog didn’t have many readers, and so not many people pointed out my errors. Now that there are more readers, I find out about errors more quickly. But I’ve found some egregious errors in some of the older posts.

Thanks for your contribution to this blog. I’ve been writing here for almost seven years, and I’ve benefited greatly from your input.

- R language for programmers
- Default arguments and lazy evaluation in R
- Distributions in R
- Moving data between R and Excel via the clipboard
- Sweave: First steps toward reproducible analyses
- Troubleshooting Sweave
- Regular expressions in R

See also posts tagged Rstats.

I started the Twitter account RLangTip and handed it over to the folks at Revolution Analytics.

**Last week**: Emacs resources

**Next week**: C++ resources

They will follow a Poisson distribution with an average of two per day. (Times are truncated to multiples of 5 minutes because my scheduling software requires that.)


- Heads
- Tails
- Equal probability of heads or tails.

Each is reasonable in its own context. The last answer is correct assuming the flips are independent and heads and tails are equally likely.

But as I argued here, if you see nothing but heads, you have reason to question the assumption that the coin is fair. So there’s some justification for the first answer.

The reasoning behind the second answer is that tails are “due.” This isn’t true if you’re looking at independent flips of a fair coin, but it could be reasonable in other settings, such as sampling without replacement.

Say there are a number of coins on a table, covered by a cloth. A fixed number are on the table heads up, and a fixed number tails up. You reach under the cloth and slide a coin out. Every head you pull out increases the chances that the next coin will be tails. If there were an equal number of heads and tails under the cloth to begin with, then after pulling out 10 heads, tails are indeed more likely next time.
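Here is a minimal sketch of the arithmetic behind that claim. The function name and the starting counts (20 heads and 20 tails) are made up for illustration; the only assumption is that each coin still under the cloth is equally likely to be drawn.

```python
from fractions import Fraction

def prob_tails_next(heads, tails, heads_drawn):
    """Chance the next coin drawn is tails, given only heads have been drawn so far."""
    remaining_heads = heads - heads_drawn
    return Fraction(tails, remaining_heads + tails)

# Hypothetical setup: 20 heads-up and 20 tails-up coins under the cloth.
for drawn in (0, 5, 10):
    print(drawn, "heads drawn -> P(tails next) =", prob_tails_next(20, 20, drawn))
```

With these numbers the probability of tails rises from 1/2 before any draws to 2/3 after ten heads, which is the sense in which tails really are “due” when sampling without replacement.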

**Related post**: Long runs

- Emacs kill (cut) commands
- Emacs point (cursor) movement
- Getting started with Emacs on Windows
- Notes on Unicode in Emacs

See also the Twitter account UnixToolTip and blog posts tagged Emacs.

**Last week**: Miscellaneous math notes

**Next week**: R resources

- A big part of being a statistician is knowing what to do when your assumptions aren’t met, because they’re never exactly met.
- A lot of statisticians think time series analysis is voodoo, and he was inclined to agree with them.

It’s not too hard to create a table of sines at multiples of 3°. You can use the sum-angle formula for sines

sin(α+β) = sin α cos β + sin β cos α

to bootstrap your way from known values to other values. Elementary geometry gives you the sines of 45° and 30°, and the sum-angle formula will then give you the sine of 75°. From Euclid’s construction of a 5-pointed star you can find the sine of 72°. Then you can use the sum-angle formula to find the sine of 3° from the sines of 75° and 72°. Ptolemy figured this out in the 2nd century AD.
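The bootstrapping can be sketched in a few lines of Python. The closed forms for 72° used here are the standard pentagon values, sin 72° = √(10 + 2√5)/4 and cos 72° = (√5 – 1)/4; writing 3° as 75° – 72° means applying the sum-angle formula with β = –72°. This is only an illustration of the idea, not a reconstruction of Ptolemy's actual computation.

```python
from math import sqrt, sin, radians

# Known from elementary geometry
sin30, cos30 = 1/2, sqrt(3)/2
sin45 = cos45 = sqrt(2)/2

# Known from the regular pentagon / 5-pointed star
sin72 = sqrt(10 + 2*sqrt(5))/4
cos72 = (sqrt(5) - 1)/4

# 75° = 45° + 30°
sin75 = sin45*cos30 + cos45*sin30
cos75 = cos45*cos30 - sin45*sin30

# 3° = 75° - 72°, i.e. the sum-angle formula with β = -72°
sin3 = sin75*cos72 - cos75*sin72

# Compare against the library value
print(sin3, sin(radians(3)))
```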

But if you want a table of trig values at every degree, you need to find the sine of 1°. If you had that, you could bootstrap your way to every other integer number of degrees. Ptolemy had an approximate solution to this problem, but it wasn’t very accurate or elegant.

The Persian astronomer Jamshīd al-Kāshī had a remarkably clever solution to the problem of finding the sine of 1°. Using the sum-angle formula you can find that

sin 3θ = 3 sin θ – 4 sin³ θ.

Setting θ = 1° gives you a cubic equation for the unknown value of sin 1° involving the known value of sin 3°. However, the cubic formula wasn’t discovered until over a century after al-Kāshī. Instead, he used a numerical algorithm more widely useful than the cubic formula: finding a fixed point of an iteration!

Define *f*(*x*) = (sin 3° + 4*x*³)/3. Then sin 1° is a fixed point of *f*. Start with an approximate value for sin 1°, such as the natural choice (sin 3°)/3, and iterate. Al-Kāshī used this procedure to compute sin 1° to 16 decimal places.

Here’s a little Python code to play with this algorithm.

    from numpy import sin, deg2rad

    sin3deg = sin(deg2rad(3))

    def f(x):
        return (sin3deg + 4*x**3)/3

    x = sin3deg/3
    for i in range(4):
        x = f(x)
        print(x)

This shows that after only three iterations the method has converged to floating point precision, which coincidentally is about 16 decimal places, the same as al-Kāshī’s calculation.

Source: Heavenly Mathematics: The Forgotten Art of Spherical Trigonometry

This morning I ran across the etymology of the word “ergodic”:

In the late 1800s, the physicist Ludwig Boltzmann needed a word to express the idea that if you took an isolated system at constant energy and let it run, any one trajectory, continued long enough, would be representative of the system as a whole. Being a highly-educated nineteenth century German-speaker, Boltzmann knew far too much ancient Greek, so he called this the “ergodic property”, from *ergon* “energy, work” and *hodos* “way, path.” The name stuck.

Found here, footnote on page 479.

Other etymological footnotes:


There’s not a good way to find these pages except through search. So I plan to categorize them and write a short post each Wednesday for the next few weeks listing some related pages. This post starts the series with math notes that didn’t fall into any other category.

- Big-O and related notation
- Notes on Spherical Trigonometry
- Solving quadratic congruences
- The difference between an unbiased estimator and a consistent estimator
- General binomial coefficients
- How to calculate binomial coefficients

See also posts tagged math.

Next week: Emacs resources

… the famous *googol*, 10^100 (a 1 followed by 100 zeros), defined in 1929 by American mathematician Edward Kasner and named by his nine-year-old nephew, Milton Sirotta. Milton went even further and came up with the *googolplex*, now defined as 10^googol but initially defined by Milton as a 1, followed by writing zeros until you get tired.

**Related post:** There isn’t a googol of anything

The first from Princeton was The Best Writing on Mathematics 2014. My favorite chapters were *The Beauty of Bounded Gaps* by Jordan Ellenberg and *The Lesson of Grace in Teaching* by Francis Su. The former is a very high-level overview of recent results regarding gaps in prime numbers. The latter is taken from Francis Su’s Haimo Teaching Award lecture. A recording of the lecture and a transcript are available here.

The second book from Princeton was a new edition of Andrew Hodges’ book Alan Turing: The Enigma. This edition has a new cover and the new subtitle “The Book That Inspired the Film ‘The Imitation Game.’” Unfortunately I’m not up to reading a 768-page biography right now.

The first book from No Starch Press was a new edition of The Book of CSS3: A Developer’s Guide to the Future of Web Design by Peter Gasston. The book says from the beginning that it is intended for people who have a lot of experience with CSS, including some experience with CSS 3. I tend to ignore such warnings; many books are more accessible to beginners than they let on. But in this case I do think that someone with more CSS experience would get more out of the book. This looks like a good book, and I expect I’ll get more out of it later.

The final book was a new edition of How Linux Works: What Every Superuser Should Know by Brian Ward. I’ve skimmed through this book and would like to go back and read it carefully, a little at a time. Most Unix/Linux books I’ve seen either dwell on shell commands or dive into system APIs. This one, however, seems to live up to its title and give the reader an introduction to how Linux works.

Front:

Back:

Designed by my friend Scott Bronstad. Scott also designed the new look of the web site. (If something doesn’t look quite right, that’s probably my doing.)

The Pareto principle would say that importance is usually very unevenly distributed. The universe is essentially hydrogen and helium, with a few other elements sprinkled in. From an earthly perspective things aren’t quite so extreme, but still a handful of elements make up the large majority of the planet. The most common elements are orders of magnitude more abundant than the least.

The uniformitarian view is a sort of default, not often a view someone consciously chooses. It’s a lazy option. No need to think. Just trudge ahead with no particular priorities.

The uniformitarian view is common in academia. You’re given a list of things to learn, and they all count the same. For example, maybe you have 100 vocabulary words in your Spanish class. Each word contributes one point to your grade on a quiz. The quiz measures what portion of the *list* you’ve learned, not what portion of that *language* you’ve learned. A quiz designed to test the latter would weigh words according to their frequency.

It’s easy to slip into a uniformitarian mindset, or a milder version of the same, underestimating how unevenly things are distributed. I’ve often fallen into the latter. I expect things to be unevenly distributed, but then I’m surprised just how uneven they are once I look at some data.

**Related posts**:

    1/7 = 0.142857142857…
    2/7 = 0.285714285714…
    3/7 = 0.428571428571…
    4/7 = 0.571428571428…
    5/7 = 0.714285714285…
    6/7 = 0.857142857142…

We can make the pattern more clear by vertically aligning the sequences of digits:

    1/7 = 0.142857142857…
    2/7 =   0.2857142857…
    3/7 =  0.42857142857…
    4/7 =     0.57142857…
    5/7 =      0.7142857…
    6/7 =    0.857142857…

Are there more cyclic fractions like that? Indeed there are. Another example is 1/17. The following shows that 1/17 is cyclic:

     1/17 = 0.05882352941176470588235294117647…
     2/17 =           0.1176470588235294117647…
     3/17 =            0.176470588235294117647…
     4/17 =     0.2352941176470588235294117647…
     5/17 =        0.2941176470588235294117647…
     6/17 =      0.352941176470588235294117647…
     7/17 =          0.41176470588235294117647…
     8/17 =               0.470588235294117647…
     9/17 =       0.52941176470588235294117647…
    10/17 =  0.5882352941176470588235294117647…
    11/17 =              0.6470588235294117647…
    12/17 =                0.70588235294117647…
    13/17 =             0.76470588235294117647…
    14/17 =    0.82352941176470588235294117647…
    15/17 =   0.882352941176470588235294117647…
    16/17 =         0.941176470588235294117647…

The next denominator to exhibit this pattern is 19. After finding 17 and 19 by hand, I typed “7, 17, 19” into the Online Encyclopedia of Integer Sequences and found a list of denominators of cyclic fractions: OEIS A001913. These numbers are called “full reptend primes” and according to MathWorld “No general method is known for finding full reptend primes.”
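Although there is no known general formula, testing a particular prime is easy. A prime *p* other than 2 or 5 is full reptend exactly when the decimal expansion of 1/*p* has period *p* – 1, that is, when the smallest *k* with 10^*k* ≡ 1 (mod *p*) is *p* – 1. The following brute-force sketch (function names are my own) searches the primes below 100 that way.

```python
def is_prime(n):
    """Trial division; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def decimal_period(p):
    """Length of the repeating block of 1/p, for p not divisible by 2 or 5."""
    k, r = 1, 10 % p
    while r != 1:
        r = (r * 10) % p
        k += 1
    return k

full_reptend = [p for p in range(3, 100)
                if p != 5 and is_prime(p) and decimal_period(p) == p - 1]
print(full_reptend)
```

The list it prints should begin 7, 17, 19, matching the denominators found by hand above.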

Hello-world programs are often intimidating. People think “I must be a dufus because I find hello-world hard. At this rate I’ll never get to anything interesting.”

The problem is that we confuse the *first* task with the *easiest* task. Hello-world programs are almost completely arbitrary. You can’t deduce what a compiler is named, where files must be located, how they must be formatted, etc. You have to be told. The amount of arbitrary material you need to learn is greatest up-front and slowly decreases.

When I started programming I thought I’d quickly get past the hello-world stage and only write substantial programs from then on. Instead, it seems I’ve spent a good chunk of my career writing hello-world programs with no end in sight.

***

No discussion of hello-world programs would be complete without mentioning possibly the most intimidating hello-world program: the first Windows program in Charles Petzold’s Programming Windows book. I was only able to find the program from the Windows 98 edition of his book. I don’t recall how much it differs from the program in his first edition, but I vaguely remember the original being worse.

    /*------------------------------------------------------------
       HELLOWIN.C -- Displays "Hello, Windows 98!" in client area
                     (c) Charles Petzold, 1998
      ------------------------------------------------------------*/

    #include <windows.h>

    LRESULT CALLBACK WndProc (HWND, UINT, WPARAM, LPARAM) ;

    int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                        PSTR szCmdLine, int iCmdShow)
    {
         static TCHAR szAppName[] = TEXT ("HelloWin") ;
         HWND         hwnd ;
         MSG          msg ;
         WNDCLASS     wndclass ;

         wndclass.style         = CS_HREDRAW | CS_VREDRAW ;
         wndclass.lpfnWndProc   = WndProc ;
         wndclass.cbClsExtra    = 0 ;
         wndclass.cbWndExtra    = 0 ;
         wndclass.hInstance     = hInstance ;
         wndclass.hIcon         = LoadIcon (NULL, IDI_APPLICATION) ;
         wndclass.hCursor       = LoadCursor (NULL, IDC_ARROW) ;
         wndclass.hbrBackground = (HBRUSH) GetStockObject (WHITE_BRUSH) ;
         wndclass.lpszMenuName  = NULL ;
         wndclass.lpszClassName = szAppName ;

         if (!RegisterClass (&wndclass))
         {
              MessageBox (NULL, TEXT ("This program requires Windows NT!"),
                          szAppName, MB_ICONERROR) ;
              return 0 ;
         }

         hwnd = CreateWindow (szAppName,                  // window class name
                              TEXT ("The Hello Program"), // window caption
                              WS_OVERLAPPEDWINDOW,        // window style
                              CW_USEDEFAULT,              // initial x position
                              CW_USEDEFAULT,              // initial y position
                              CW_USEDEFAULT,              // initial x size
                              CW_USEDEFAULT,              // initial y size
                              NULL,                       // parent window handle
                              NULL,                       // window menu handle
                              hInstance,                  // program instance handle
                              NULL) ;                     // creation parameters

         ShowWindow (hwnd, iCmdShow) ;
         UpdateWindow (hwnd) ;

         while (GetMessage (&msg, NULL, 0, 0))
         {
              TranslateMessage (&msg) ;
              DispatchMessage (&msg) ;
         }
         return msg.wParam ;
    }

    LRESULT CALLBACK WndProc (HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam)
    {
         HDC         hdc ;
         PAINTSTRUCT ps ;
         RECT        rect ;

         switch (message)
         {
         case WM_CREATE:
              PlaySound (TEXT ("hellowin.wav"), NULL, SND_FILENAME | SND_ASYNC) ;
              return 0 ;

         case WM_PAINT:
              hdc = BeginPaint (hwnd, &ps) ;

              GetClientRect (hwnd, &rect) ;

              DrawText (hdc, TEXT ("Hello, Windows 98!"), -1, &rect,
                        DT_SINGLELINE | DT_CENTER | DT_VCENTER) ;

              EndPaint (hwnd, &ps) ;
              return 0 ;

         case WM_DESTROY:
              PostQuitMessage (0) ;
              return 0 ;
         }
         return DefWindowProc (hwnd, message, wParam, lParam) ;
    }

- CSS / responsive design
- WordPress customization
- Emacs customization
- Advanced LaTeX
- Data cleaning and visualization
- Python (miscellaneous automation scripts)

I don’t have an immediate project to outsource, but these tasks come up occasionally and I’d like to have someone to contact when they do. Mostly these would be small self-contained projects, though data cleaning and visualization could be larger.


Obviously the intended message is that scalpels are better than Swiss Army Knives. Certainly the scalpel looks simpler.

But most people would rather have a Swiss Army Knife than a scalpel. Many people, myself included, own a Swiss Army Knife but not a scalpel. (I also have a Leatherman multi-tool that the folks at Snow gave me and I like it even better than my Swiss Army Knife.)

People like simplicity, at least a certain kind of simplicity, more in theory than in practice. Minimalist products that end up in the MoMA generally don’t fly off the shelves at Walmart.

The simplicity of a scalpel is superficial. The realistic alternative to a Swiss Army Knife, for ordinary use, is a knife, two kinds of screwdriver, a bottle opener, etc. The Swiss Army Knife is the simpler alternative in that context.

A surgeon would rightfully prefer a scalpel, but not just a scalpel. A surgeon would have a tray full of specialized instruments, collectively more complicated than a Swiss Army Knife.

I basically agree with the Unix philosophy that tools should do one thing well, but even Unix doesn’t follow this principle strictly in practice. One reason is that “thing” and “well” depend on context. The “thing” that a toolmaker has in mind may not exactly be the “thing” the user has in mind, and the user may have a different idea of when a tool has served well enough.

In particular, the URL http://johndcook.com/blog may take you to the new home page rather than the latest blog post, at least temporarily.

If you subscribe via email or RSS, posts will come to you as usual; you shouldn’t notice any changes.


Perhaps in reaction to knee-jerk antipathy toward Bayesian methods, some statisticians have adopted knee-jerk enthusiasm for Bayesian methods. Everything’s better with Bayesian analysis on it. Bayes makes it better, like a little dab of margarine on a dry piece of bread.

There’s much that I prefer about the Bayesian approach to statistics. Sometimes it’s the only way to go. But Bayes-for-the-sake-of-Bayes can expend a great deal of effort, by human and computer, to arrive at a conclusion that could have been reached far more easily by other means.

**Related**: Bayes isn’t magic

Image via Gallery of Graphic Design

This is a variation on a problem I’ve blogged about before. As I pointed out there, we can assume without loss of generality that the samples come from the unit interval. Then the sample range has a beta(*n* – 1, 2) distribution. So the probability that the sample range is greater than a value *c* is

1 – *n* *c*^(*n* – 1) + (*n* – 1) *c*^*n*.

Setting *c* = 0.9, here’s a plot of the probability that the sample range contains at least 90% of the population range, as a function of sample size.

The answer to the question at the top of the post is 16 or 17. These two values of *n* yield probabilities 0.485 and 0.518 respectively. This means that a fairly small sample is likely to give you a fairly good estimate of the range.
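These numbers are easy to check. A beta(*n* – 1, 2) random variable has density *n*(*n* – 1) *x*^(*n* – 2) (1 – *x*) on the unit interval, and integrating that from *c* to 1 gives a closed form. The sketch below (the function name is mine) evaluates it for a few sample sizes around the crossover.

```python
def prob_range_exceeds(n, c):
    """P(sample range > c) for n independent uniform samples on [0, 1].

    Closed form of the beta(n - 1, 2) tail:
    1 - n*c**(n-1) + (n-1)*c**n
    """
    return 1 - n * c**(n - 1) + (n - 1) * c**n

for n in (15, 16, 17, 18):
    print(n, round(prob_range_exceeds(n, 0.9), 3))
```

The probability crosses 1/2 between *n* = 16 and *n* = 17, which is why either value is a reasonable answer.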

Since the range of integration is symmetric around zero, you might think to see whether the integrand is an odd function, in which case the integral would be zero. (More on such symmetry tricks here.) Unfortunately, the integrand is not odd, so that trick doesn’t work directly. However, it does help indirectly.

You can split any function *f*(*x*) into its even and odd parts.

The integral of a function over a symmetric interval is the integral of its even part because its odd part integrates to zero. The even part of the integrand above works out to be simply cos(*x*)/2 and so the integral evaluates to sin(1).
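Here is a small numerical sketch of the trick. The even part of *f* is (*f*(*x*) + *f*(–*x*))/2 and the odd part is (*f*(*x*) – *f*(–*x*))/2. The integrand `g` below is a stand-in that I chose because its even part is also cos(*x*)/2; it is not necessarily the integrand from the original problem, and the interval [–1, 1] is inferred from the stated answer sin(1).

```python
from math import cos, exp, sin

def even_part(f):
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    return lambda x: (f(x) - f(-x)) / 2

def integrate(f, a, b, n=10_000):
    """Composite midpoint rule with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Hypothetical stand-in integrand: neither even nor odd,
# but 1/(1 + e^x) + 1/(1 + e^-x) = 1, so its even part is cos(x)/2.
def g(x):
    return cos(x) / (1 + exp(x))

total = integrate(g, -1, 1)
even = integrate(even_part(g), -1, 1)
odd = integrate(odd_part(g), -1, 1)

print(total, even, odd, sin(1))
```

The odd part contributes essentially nothing, and the full integral agrees with the integral of the even part, which is sin(1) up to the accuracy of the midpoint rule.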

Strictly speaking, a professional in some area is simply someone who is paid to do it. But informally, we think of a professional as someone who is not only paid for their services but also good at what they do. The two ideas are not far apart. People who are paid to do something are usually good at it, and the fact that they are paid is evidence that they know what they’re doing.

Experts, however, are not always so pleasant to work with.

Anyone can call himself an expert, and there’s no objective way to test this claim. But it’s usually obvious whether someone is a professional. When you walk into a barber shop, for example, it’s safe to assume the people standing behind the chairs are professional barbers.

Often the categories of “professional” and “expert” overlap. But it is suspicious when someone is an expert and not a professional. It implies that their knowledge is theoretical and untested. If someone says she is an expert in the stock market but not an investor, I wouldn’t ask her to invest my money. When I need my house painted, I don’t want to hire an expert on paint, I want a professional painter.

Sometimes experts appear to be professionals though they are not. Their expertise is in one area but their profession is something else. Political pundits are not politicians but journalists and entertainers. Heads of scientific agencies are not scientists but administrators. University presidents are not educators or researchers but fundraisers. In each case they may have once been practitioners in their perceived areas of expertise, though not necessarily.

**Related posts**:

**You can’t divide 3 by 4** (inside the ring of integers, but you can inside the rational numbers).

**You can’t take the square root of a negative number** (in the real numbers, but in the complex numbers you can, once you pick a branch of the square root function).

**You can’t divide by zero** (in the field of real numbers, but you may be able to do something that could informally be referred to as dividing by zero, depending on the context, by reformulating your statement, often in terms of limits).

When people say a thing cannot be done, they may mean it cannot be done in some assumed context. They may mean that the thing is difficult, and assume that the listener is sophisticated enough to interpret their answer as hyperbole. Maybe they mean that they don’t know how to do it and presume it can’t be done.

When you hear that something can’t be done, it’s worth pressing to find out in what sense it can’t be done.

**Related post**: How to differentiate a non-differentiable function

“What will happen when you’re done with this project?”

“I don’t know. Maybe not much. Maybe great things.”

“How great? What’s the best outcome you could reasonably expect?”

“Hmm … Not that great. Maybe I should be doing something else.”

It’s a little paradoxical to think that asking an optimistic question — What’s the best thing that could happen? — could discourage us from continuing to work on a project, but it’s not too hard to see why this is so. As long as the outcome is unexamined, we can implicitly exaggerate the upside potential. When we look closer, reality may come shining through.

**Related posts**:

I bought Marshall Goldsmith’s book by that title shortly after it came out in 2007. As much as I liked the title, I was disappointed by the content and didn’t finish it. I don’t remember much about it, only that it wasn’t what I expected. Maybe it’s a good book — I’ve heard people say they like it — but it wasn’t a good book for me at the time.

***

I’ve written before about The Medici Effect, a promising title that didn’t live up to expectations.

***

“Standardized Minds” is a great book title. I haven’t read the book; I just caught a glimpse of the cover somewhere. Maybe it lives up to its title, but the title says so much.

There is a book by Peter Sacks, Standardized Minds: The High Price Of America’s Testing Culture And What We Can Do To Change It. Maybe that’s the book I saw, though it’s possible that someone else wrote a book by the same title. I can’t say whether I recommend the book or not since I haven’t read it, but I like the title.

***

I started to look for more examples of books that didn’t live up to their titles by browsing my bookshelves. But I quickly gave up on that when I realized these are exactly the kinds of books I get rid of.

What are some books with great titles but disappointing content?
