I do use software to schedule my tweets in advance. Most of the tweets from my personal account are live. Most of the tweets from my topic accounts are scheduled, though some are live. All replies are manual, not automated, and I don’t scrape content from anywhere.

Occasionally I read the responses to these accounts and sometimes I reply. But with over half a million followers (total, not unique) I don’t try to keep up with all the responses. If you’d like to contact me, you can do so here. That I do keep up with.


… I said that if science could come up with something like the Jump it could surely solve a problem like that. Severin seized hold of that word, “science.” Science, he said, is not some mysterious larger-than-life force, it’s just the name we give to bright ideas that individual guys have when they’re lying in bed at night, and that if the fuel thing bothered me so much, there was nothing stopping me from having a bright idea to solve it …

Although I completely agree that “algorithmic wizardry” is over-rated in general, my personal experience has been a little different. My role on projects has frequently been to supply a little bit of algorithmic wizardry. I’ve often been asked to look into a program that is taking too long to run and been able to speed it up by an order of magnitude or two by improving a numerical algorithm. (See an example here.)

James Hague says that “rarely is there some … algorithm that casts a looming shadow over everything else.” I believe he is right, though I’ve been called into projects precisely on those rare occasions when an algorithm *does* cast a shadow over everything else.

Here are some of the most popular posts on this site and some other things I’ve written.

If you’d like to subscribe to this site you can do so by RSS or email. I also have a monthly newsletter.

You can find out more about me and my background here.

You can also find me on Twitter and Google+.

If you have any questions or comments, here’s my contact info.

When it comes to writing code, the number one most important skill is how to keep a tangle of features from collapsing under the weight of its own complexity. I’ve worked on large telecommunications systems, console games, blogging software, a bunch of personal tools, and very rarely is there some tricky data structure or algorithm that casts a looming shadow over everything else. But there’s always lots of state to keep track of, rearranging of values, handling special cases, and carefully working out how all the pieces of a system interact. To a great extent the act of coding is one of organization. Refactoring. Simplifying. Figuring out how to remove extraneous manipulations here and there.

Algorithmic wizardry is easier to teach and easier to blog about than organizational skill, so we teach and blog about it instead. A one-hour class, or a blog post, can showcase a clever algorithm. But how do you present a clever bit of organization? If you jump to the solution, it’s unimpressive. “Here’s something simple I came up with. It may not look like much, but trust me, it was really hard to realize this was all I needed to do.” Or worse, “Here’s a moderately complicated pile of code, but you should have seen how much more complicated it was before. At least now someone stands a shot of understanding it.” Ho hum. I guess you had to be there.

You can’t appreciate a feat of organization until you experience the disorganization. But it’s hard to have the patience to wrap your head around a disorganized mess that you don’t care about. Only if the disorganized mess is your responsibility, something that means more to you than a case study, can you wrap your head around it and appreciate improvements. This means that while you can learn algorithmic wizardry through homework assignments, you’re unlikely to learn organization skills unless you work on a large project you care about, most likely because you’re paid to care about it.

**Related posts**:


AI: From “It’s so horrible how little progress has been made” to “It’s so horrible how much progress has been made” in one step.

When I read this I thought of Pandora (the mythical figure, not the music service).

“Are you still working on opening that box? Any progress?”

“No, the lid just … won’t … budge … Oh wait, I think I got it.”

**Related post**: Why the robots aren’t coming in the way you expect by Mark Burgess

I believe that reading only packaged microwavable fiction ruins the taste, destabilizes the moral blood pressure, and makes the mind obese.

I agree with that. That’s why I shop at Amazon.

If I liked to read best-selling junk food, I could find it at any bookstore. But I like to read less popular books, books I can only find from online retailers like Amazon. In fact, most of Amazon’s revenue comes from obscure books, not bestsellers.

Suppose I want to read something by, I don’t know, say, Ursula K. Le Guin. I doubt I could find a copy of any of her books, certainly not her less popular books, within 20 miles of my house, and I live in the 4th largest city in the US. There’s nothing by her in the closest Barnes and Noble. But I could easily find anything she’s ever written on Amazon.

If you’d like to support Amazon so they can continue to bring us fine authors like Ursula K. Le Guin, authors you can’t find in stores that mostly sell packaged microwavable fiction, you can buy one of the books mentioned on this blog from Amazon.

Equations are typically applied left to right. When you write *A* = *B* you imply that it may be useful to replace *A* with *B*. This is helpful to keep in mind when learning something new: the order in which an equation is written gives a hint as to how it may be applied. However, this way of thinking can also be a limitation. Clever applications often come from realizing that you can apply an equation in the opposite of the usual direction.

For example, Euler’s reflection formula says

Γ(*z*) Γ(1-*z*) = π / sin(π*z*).

Reading from left to right, this says that two unfamiliar/difficult things, values of the Gamma function, are related to a more familiar/simple thing, the sine function. It would be odd to look at this formula and say “Great! Now I can compute sines if I just know values of the Gamma function.” Instead, the usual reaction would be “Great! Now I can relate the value of Gamma at two different places by using sines.”
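The reflection formula is easy to check numerically. Here is a quick sketch in Python, using the standard library’s `math.gamma`, that verifies it at a few non-integer points:

```python
import math

# Check Euler's reflection formula
#     Gamma(z) * Gamma(1 - z) = pi / sin(pi * z)
# at a few non-integer values of z (it fails at integers,
# where both sides blow up).
for z in [0.3, 0.5, 1.7, -2.4]:
    lhs = math.gamma(z) * math.gamma(1 - z)
    rhs = math.pi / math.sin(math.pi * z)
    assert abs(lhs - rhs) < 1e-9 * abs(rhs)
```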

When we see Einstein’s equation

*E* = *mc*^{2}

the first time, we think about creating energy from matter, such as the mass lost in nuclear fission. This applies the formula from left to right, relating what we want to know, an amount of energy, to what we do know, an amount of mass. But you could also read the equation from right to left, calculating the amount of energy, say in an accelerator, necessary to create a particle of a given mass.

Calculus textbooks typically have a list of equations, either inside the covers or in an appendix, that relate an integral on the left to a function or number on the right. This makes sense because calculus students compute integrals. But mathematicians often apply these equations in the opposite direction, replacing a number or function with an integral. To a calculus student this is madness: why replace a familiar thing with a scary thing? But integrals aren’t scary to mathematicians. Expressing a function as an integral is often progress. Properties of a function may be easier to see in integral form. Also, the integral may lend itself to some computational technique, such as reversing the order of integration in a double integral, or reversing the order of taking a limit and an integral.

Calculus textbooks also have lists of equations involving infinite sums, the summation always being on the left. Calculus students want to replace the scary thing, the infinite sum, with the familiar thing, the expression on the right. Generating functions turn this around, wanting to replace things with infinite sums. Again this would seem crazy to a calculus student, but it’s a powerful problem solving technique.

Differential equation students solve differential equations. They want to replace what they find scary, a differential equation, with something more familiar, a function that satisfies the differential equation. But mathematicians sometimes want to replace a function with a differential equation that it satisfies. This is common, for example, in studying special functions. Classical orthogonal polynomials satisfy 2nd order differential equations, and the differential equation takes a different form for different families of orthogonal polynomials. Why would you want to take something as tangible and familiar as a polynomial, something you might study as a sophomore in high school, and replace it with something as abstract and mysterious as a differential equation, something you might study as a sophomore in college? Because some properties, properties that you would not have cared about in high school, are more clearly seen via the differential equations.

The difference between pure functional languages and traditional imperative languages is not quite that simple in practice.

Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call, and have it return the state of the world after the call. It’s unrealistic to pass a program’s entire state as an argument each time, so you’d like to pass just the state that you need, and have a convenient way of doing so. You’d also like the compiler to verify that you’re only passing around a limited slice of the world. That’s where **monads** come in.

Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two arguments: the square root, and the updated log. This is a pain, and it makes function composition difficult.

**Monads provide a sort of side-band** for passing state around, things like our function call log. You’re still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you’re passing around a log but not arbitrary state. A function that updates a log, for example, can affect the state of the log, but it can’t do anything else. It can’t launch missiles.
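Haskell is the natural home for this, but the bookkeeping can be sketched in Python. The first half below threads the log explicitly, as in the square root example above; the second hides the threading behind a Writer-monad-style `bind`. The function names (`logged_sqrt`, `m_sqrt`, `unit`, `bind`) are made up for illustration:

```python
import math

# Explicit state passing: each function takes a value and the log
# so far, and returns the result together with the updated log.
def logged_sqrt(x, log):
    return math.sqrt(x), log + ["sqrt(%g)" % x]

def logged_double(x, log):
    return 2 * x, log + ["double(%g)" % x]

# Composing by hand means threading the log through every call.
y, log = logged_sqrt(16, [])
z, log = logged_double(y, log)
assert z == 8
assert log == ["sqrt(16)", "double(4)"]

# A Writer-monad-style bind hides the threading: functions return
# (value, messages) pairs and bind concatenates the messages.
def unit(x):
    return (x, [])

def bind(pair, f):
    x, log = pair
    y, more = f(x)
    return (y, log + more)

def m_sqrt(x):
    return math.sqrt(x), ["sqrt(%g)" % x]

def m_double(x):
    return 2 * x, ["double(%g)" % x]

result, log = bind(bind(unit(16), m_sqrt), m_double)
assert result == 8
assert log == ["sqrt(16)", "double(4)"]
```

Unlike Haskell, Python won’t *enforce* that `m_sqrt` touches nothing but the log; the compiler check is the part this sketch can’t reproduce.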

Once monads get large and complicated, it’s hard to know what side effects they hide. **Maybe they can launch missiles after all**. You can only be sure by *studying the source code*. Now how do you know that calling a C function, for example, doesn’t launch missiles? You *study the source code*. In that sense Haskell and C aren’t entirely different.

The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.

**Related post**: Monads are hard because …

The curve is the plot of exp(*it*) – exp(6*it*)/2 + *i* exp(-14*it*)/3 with *t* running from 0 to 2π.

Here’s Python code to draw the curve.

    import matplotlib.pyplot as plt
    from numpy import pi, exp, real, imag, linspace

    def f(t):
        return exp(1j*t) - exp(6j*t)/2 + 1j*exp(-14j*t)/3

    t = linspace(0, 2*pi, 1000)
    plt.plot(real(f(t)), imag(f(t)))

    # These two lines make the aspect ratio square
    fig = plt.gcf()
    fig.gca().set_aspect('equal')

    plt.show()

Maybe there’s a more direct way to plot curves in the complex plane rather than taking real and imaginary parts.

Updated code for the aspect ratio per Janne’s suggestion in the comments.

**Related posts**:

Several people have been making fun visualizations that generalize the example above.

Brent Yorgey has written two posts, one choosing frequencies randomly and another that animates the path of a particle along the curve and shows how the frequency components each contribute to the motion.

Mike Croucher developed a Jupyter notebook that lets you vary the frequency components with sliders.

John Golden created visualizations in GeoGebra here and here.

Jennifer Silverman showed how these curves are related to decorative patterns that were popular in the 1960s. She also created a coloring book and a video.

Dan Anderson accused me of nerd sniping him and created this visualization.

- Business
- Clinical trials
- Computing
- Creativity
- Graphics
- Machine learning
- Math
- Music
- Python
- Science
- Software development
- Statistics
- Typography
- Misc

You can also subscribe to my Twitter feeds via RSS if you’d like.

Finding Windows ports of Unix utilities is easy. The harder part is finding a shell that behaves as expected. (Of course “as expected” depends on your expectations!)

There have been many projects to port Unix utilities to Windows, particularly GnuWin32 and Gow. Some of the command shells I’ve tried are:

- Cmd
- PowerShell
- Eshell
- Bash
- Clink

I’d recommend the combination of Gow and Clink for most people. If you’re an Emacs power user you might like Eshell.

The built-in command line on Windows is `cmd`. It’s sometimes called the “DOS prompt” though that’s misleading. DOS died two decades ago and the `cmd` shell has improved quite a bit since then.

`cmd` has some features you might not expect, such as `pushd` and `popd`. However, I don’t believe it has anything analogous to `dirs` to let you see the directory stack.

PowerShell is a very sophisticated scripting environment, but the interactive shell itself (e.g. command editing functionality) is basically `cmd`. (I haven’t kept up with PowerShell and that may have changed.) This means that writing a PowerShell script is completely different from writing a batch file, but the experience of navigating the command line is essentially the same as `cmd`.

You can run shells inside Emacs. By default, `M-x shell` brings up a `cmd` prompt inside an Emacs buffer. You can also use Emacs’ own shell with the command `M-x eshell`.

Eshell is a shell implemented in Emacs Lisp. Using Eshell is very similar across platforms. On a fresh Windows machine, with nothing like Gow installed, Eshell provides some of the most common Unix utilities. You can use the `which` command to see whether you’re using a native executable or Emacs Lisp code. For example, if you type `which ls` into Eshell, you get the response

    eshell/ls is a compiled Lisp function in `em-ls.el'

The primary benefit of Eshell is that it provides integration with Emacs. As the documentation says:

> Eshell is *not* a replacement for system shells such as `bash` or `zsh`. Use Eshell when you want to move text between Emacs and external processes …

Eshell does not provide some of the command editing features you might expect from `bash`. But the reason for this is clear: if you’re inside Emacs, you’d want to use the full power of Emacs editing, not the stripped-down editing features of a command line. For example, you cannot use `^foo^bar` to replace `foo` with `bar` in the previous command. Instead, you could retrieve the previous command and edit it just as you edit any other line inside Emacs.

In `bash` you can use `!^` to recall the first argument of the previous command and `!$` to recall the last. Eshell does not support these, but you can get the last argument using `$_` instead. Many of the other `bash` shortcuts that begin with `!` work as expected: `!foo`, `!!`, `!-3`, etc. Directory navigation commands like `cd -`, `pushd`, and `popd` work as in `bash`.

Gow comes with a `bash` shell, a Windows command line program that creates a `bash`-like environment. I haven’t had much experience with it, but it seems to be a faithful bash implementation with few compromises for Windows, for better and for worse. For example, it doesn’t understand backslashes as directory separators.

There are other implementations of `bash` on Windows, but I either haven’t tried them (e.g. win-bash) or have had a bad experience with them (e.g. Cygwin).

Clink is not a shell *per se* but an extension to `cmd`. It adds the functionality of the GNU `readline` library to the Windows command line, so you can use all the Emacs-like editing commands that you can with `bash`: Control-a to move to the beginning of a line, Control-k to delete the rest of a line, etc.

Clink also gives you Windows-like behavior that Windows itself doesn’t provide, such as being able to paste text onto the command line with Control-v.

I’ve heard that Clink will work with PowerShell, but I was not able to make it work.

The command editing and history shortcuts beginning with `!` mentioned above all work with Clink, as do substitutions like `^foo^bar`.

In my opinion, the combination of Gow and Clink gives a good compromise between a Windows and Unix work environment. And if you’re running Windows, a compromise is probably what you want. Otherwise, simply run a (possibly virtual) Linux machine. Attempts to make Windows too Unix-like run down an uncanny valley where it’s easy to waste a lot of time.

In some contexts software is regulated but data is not, or at least software comes under different regulations than data. For example, maybe you have to maintain test records for software but not for data.

Suppose as part of some project you need to search for files containing the word “apple” and you use the command line utility `grep`. The text “apple” is data, input to the `grep` program. Since `grep` is a widely used third-party tool, it doesn’t have to be validated, and you haven’t written any code.

Next you need to search for “apple” and “Apple” and so you search on the regular expression “[aA]pple” rather than a plain string. Now is the regular expression “[aA]pple” code? It’s at least a tiny step in the direction of code.

What about more complicated regular expressions? Regular expressions are equivalent to deterministic finite automata, which sure seem like code. And that’s only regular expressions as originally defined. The term “regular expression” has come to mean more expressive patterns. Perl regular expressions can even contain arbitrary Perl code.
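A small Python illustration of how thin the line is between the two cases (the three sample strings are made up):

```python
import re

# Searching for the plain string "apple" feels like pure data:
# the pattern is just input to a library routine.
lines = ["apple pie", "Apple Inc.", "banana"]
assert [s for s in lines if "apple" in s] == ["apple pie"]

# The regular expression "[aA]pple" is still just a string passed
# to a library, but it now encodes a small matching program --
# a step in the direction of code.
pattern = re.compile(r"[aA]pple")
assert [s for s in lines if pattern.search(s)] == ["apple pie", "Apple Inc."]
```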

In practice we can agree that certain things are “code” and others are “data,” but there are gray areas where people could sincerely disagree. And someone wanting to be argumentative could stretch this gray zone to include everything. One could argue, for example, that all software is data because it’s input to a compiler or interpreter.

You might say “data is what goes into a database and code is what goes into a compiler.” That’s a reasonable rule of thumb, but databases can store code and programs can store data. Programmers routinely have long discussions about what belongs in a database and what belongs in source code. Throw regulatory considerations into the mix and there could be incentives to push more code into the database or more data into the source code.

* * *

See Slava Akhmechet’s essay The Nature of Lisp for a longer discussion of the duality between code and data.

This is a thumbnail version of a large, high-resolution image by Ulysse Carion. Thanks to Aleksey Shipilëv (@shipilev) for pointing it out.

It’s hard to see in the thumbnail, but the map gives the change in velocity needed at each branch point. You can find the full 2239 x 2725 pixel image here or click on the thumbnail above.

This decomposition is unique if you impose the extra requirement that consecutive Fibonacci numbers are not allowed. [1] It’s easy to see that the rule against consecutive Fibonacci numbers is necessary for uniqueness. It’s not as easy to see that the rule is sufficient.

Every Fibonacci number is itself the sum of two consecutive Fibonacci numbers—that’s how they’re defined—so clearly there are at least two ways to write a Fibonacci number as the sum of Fibonacci numbers, either just itself or its two predecessors. In the example above, 8 = 5 + 3 and so you could write 10 as 5 + 3 + 2.

The *n*th Fibonacci number is approximately φ^{n}/√5 where φ = 1.618… is the golden ratio. So you could think of a Fibonacci sum representation for *x* as roughly a base φ representation for √5*x*.

You can find the Fibonacci representation of a number *x* using a greedy algorithm: Subtract the largest Fibonacci number from *x* that you can, then subtract the largest Fibonacci number you can from the remainder, etc.

Programming exercise: How would you implement a function that finds the largest Fibonacci number less than or equal to its input? Once you have this it’s easy to write a program to find Fibonacci representations.
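Here is one way the greedy algorithm might be implemented in Python. The helper `largest_fib_leq` answers the programming exercise by walking up the Fibonacci sequence; the names are mine, not standard:

```python
def largest_fib_leq(x):
    """Largest Fibonacci number less than or equal to x (x >= 1)."""
    a, b = 1, 2
    while b <= x:
        a, b = b, a + b
    return a

def fibonacci_representation(x):
    """Greedy (Zeckendorf) representation of a positive integer x
    as a sum of non-consecutive Fibonacci numbers."""
    terms = []
    while x > 0:
        f = largest_fib_leq(x)
        terms.append(f)
        x -= f
    return terms

assert fibonacci_representation(10) == [8, 2]
assert fibonacci_representation(100) == [89, 8, 3]
```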

* * *

[1] This is known as Zeckendorf’s theorem, published by E. Zeckendorf in 1972. However, C. G. Lekkerkerker had published the same result 20 years earlier.

Some say they enjoy the blog, but I post more often than they care to keep up with, particularly if they’re only interested in the non-technical posts.

Others have said they’d like to know more about my consulting business. There are some interesting things going on there, but I’d rather not write about them on the blog.

The newsletter will address both of these groups. I’ll highlight a few posts from the blog, some technical and some not, and I’ll say a little about what I’ve been up to.

If you’d like to receive the newsletter, you can sign up here.

I won’t share your email address with anyone and you can unsubscribe at any time.

… we should clarify. From what, or whom, are we hiding information?

> [T]raditional languages … bend over backwards to ensure that modules hide internal routines and data structures from other modules. The goal is to achieve module independence (a minimum coupling). The fear seems to be that modules strive to *attack each other like alien antibodies*. Or else, that *evil bands of marauding modules* are out to clobber the precious family data structures.
>
> This is not what we’re concerned about. The purpose of hiding information, as we mean it, is simply to minimize the effects of a possible design-change by localizing things that might change within each component.

Quote from Thinking Forth. Emphasis added.


The sample code uses PyPDF2. I’m using Conda for my Python environment, and PyPDF2 isn’t directly available for Conda. I searched Binstar with

`binstar search -t conda pypdf2`

The first hit was from JimInCO, so I installed PyPDF2 with

`conda install -c https://conda.binstar.org/JimInCO pypdf2`

I scanned a few pages from a book to PDF, turning the book around every other page, so half the pages in the PDF were upside down. I needed a script to rotate the even numbered pages. The script counts pages from 0, so it rotates the *odd* numbered pages from its perspective.

    import PyPDF2

    pdf_in = open('original.pdf', 'rb')
    pdf_reader = PyPDF2.PdfFileReader(pdf_in)
    pdf_writer = PyPDF2.PdfFileWriter()

    for pagenum in range(pdf_reader.numPages):
        page = pdf_reader.getPage(pagenum)
        if pagenum % 2:
            page.rotateClockwise(180)
        pdf_writer.addPage(page)

    pdf_out = open('rotated.pdf', 'wb')
    pdf_writer.write(pdf_out)
    pdf_out.close()
    pdf_in.close()

It worked as advertised on the first try.

- AlgebraFact
- AnalysisFact
- CompSciFact
- Diff_eq
- MedVocab
- NetworkFact
- PerlRegex
- ProbFact
- RegexTip
- ScienceTip
- SciPyTip
- StatFact
- TeXtip
- TopologyFact
- UnitFact
- UnixToolTip

If you would like to subscribe to more Twitter accounts via RSS, you could subscribe to the BazQux service and create a custom RSS feed for whatever Twitter, Google+, or Facebook accounts you’d like to follow.

Rosenzweig discusses experiments designed to study decision making. In order to make clean comparisons, subjects are presented with discrete choices over which they have no control. They cannot look for more options or exercise any other form of agency. The result is an experiment that is easy to analyze and easy to publish, but so unrealistic as to tell us little about real-world decision making.

In his book Left Brain, Right Stuff, Rosenzweig quotes Philip Tetlock’s summary:

> Much mischief can be wrought by transplanting this hypothesis-testing logic, which flourishes in controlled lab settings, into the hurly-burly of real-world settings where *ceteris paribus* never is, and never can be, satisfied.

In his new book How to Fly a Horse, Kevin Ashton says that the Mozart story above is a myth based on a forged letter. According to Ashton,

Mozart’s real letters—to his father, to his sister, and to others—reveal his true creative process. He was exceptionally talented, but he did not write by magic. He sketched his compositions, revised them, and sometimes got stuck. He could not work without a piano or harpsichord. He would set work aside and return to it later. … Masterpieces did not come to him complete in uninterrupted streams of imagination, nor without an instrument, nor did he write them whole and unchanged. The letter is not only forged, it is false.

**Related posts**:


Simplifying fractions sometimes makes things clearer, but not always. It depends on context, and context is something students don’t understand at first. So it makes sense to be pedantic at some stage, but then students need to learn that **clear communication trumps pedantic conventions**.

Along these lines, there is an old taboo against having radicals in the denominator of a fraction. For example, 3/√5 is not allowed and should be rewritten as 3√5/5. This is an arbitrary convention now, though there once was a practical reason for it, namely that in hand calculations it’s easier to multiply by a long fraction than to divide by it. So, for example, if you had to reduce 3/√5 to a decimal in the old days, you’d look up √5 in a table to find it equals 2.2360679775. It would be easier to compute 0.6 × 2.2360679775 by hand than to compute 3/2.2360679775.
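A quick Python check of both claims, using the same table value for √5:

```python
from math import sqrt, isclose

# 3/sqrt(5) and 3*sqrt(5)/5 are the same number; rationalizing the
# denominator changes the form, not the value.
assert isclose(3 / sqrt(5), 3 * sqrt(5) / 5)

# The old hand-calculation motivation: since 3/5 = 0.6, multiplying
# the table value 2.2360679775 by 0.6 gives the same decimal as
# dividing 3 by it, and multiplying is the easier operation by hand.
assert isclose(0.6 * 2.2360679775, 3 / 2.2360679775)
```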

As with unreduced fractions, radicals in the denominator might be not only mathematically equivalent but psychologically preferable. If there’s a 3 in some context, and a √5, then it may be clear that 3/√5 is their ratio. In that same context someone may look at 3√5/5 and ask “Where did that factor of 5 in the denominator come from?”

A possible justification for the rules above is that they provide standard forms that make grading easier. But this is only true for the simplest exercises. With moderately complicated exercises, following a student’s work is harder than determining whether two expressions represent the same number.

One final note on pedantic arithmetic rules: If the order of operations isn’t clear, make it clear. Add a pair of parentheses if you need to. Or write division operations as one thing above a horizontal bar and another below, not using the division symbol. Then you (and your reader) don’t have to worry whether, for example, multiplication has higher precedence than division or whether both have equal precedence and are carried out left to right.

Here’s a percolation problem for QR codes: What is the probability that there is a path from one side of a QR code to the opposite side? How far across a QR code would you expect to be able to go? For example, the QR code below was generated from my contact information. It’s not possible to go from one side to the other, and the red line shows what I believe is the deepest path into the code from a side.

This could make an interesting programming exercise. A simple version would be to start with a file of bits representing a particular QR code and find the deepest path into the corresponding image.
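Here is a sketch of that simple version in Python. It assumes a path moves between orthogonally adjacent cells of the same color and enters from the top edge; the 4×4 grid is a toy stand-in for a real QR code:

```python
from collections import deque

def deepest_reach(grid):
    """Breadth-first search from every top-edge cell through
    orthogonally adjacent cells of the same color; return the
    deepest row index reached. `grid` is a list of lists of 0s
    and 1s standing in for light and dark modules."""
    rows, cols = len(grid), len(grid[0])
    deepest = 0
    queue = deque((0, c) for c in range(cols))
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        deepest = max(deepest, r)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen
                    and grid[nr][nc] == grid[r][c]):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return deepest

grid = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
assert deepest_reach(grid) == 3  # reaches the bottom row
```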

The next step up would be to generate simplified QR codes, requiring certain bits to be set, such as the patterns in three of the four corners that allow a QR reader to orient itself.

The next step in sophistication would be to implement the actual QR encoding algorithm, including its error correction encoding, then use this to encode random data.

(Because of the error correction used by QR codes, you could scan the image above and your QR reader would ignore the red path. It would even work if a fairly large portion of the image were missing because the error correction introduces a lot of redundancy.)

You could say that an empty sum is 0 because 0 is the additive identity and an empty product is 1 because 1 is the multiplicative identity. If you’d like a simple answer, maybe you should stop reading here.

The problem with the answer above is that it doesn’t say why an operation on an empty set should be defined to be the identity for that operation. The identity is certainly a plausible candidate, but why should it make sense to even define an operation on an empty set, and why should the identity turn out so often to be the definition that makes things proceed smoothly?

The convention that the sum over an empty set should be defined as 0, and that a product over an empty set should be defined to be 1 works well in very general settings where “sum”, “product”, “0”, and “1” take on abstract meanings.

The **ultimate generalization** of products is the notion of products in category theory. Similarly, the ultimate generalization of sums is categorical co-products. (Co-products are sometimes called sums, but they’re usually called co-products due to a symmetry with products.) Category theory simultaneously addresses a wide variety of operations that could be called products or sums (co-products).

The particular advantage of bringing category theory into this discussion is that it has definitions of product and co-product that are the same for any number of objects, including zero objects; there is no special definition for empty products. Empty products and co-products are a **consequence** of a more general definition, **not special cases** defined by convention.

In the category of sets, products are Cartesian products. The product of a set with *n* elements and one with *m* elements is one with *nm* elements. Also in the category of sets, co-products are disjoint unions. The co-product of a set with *n* elements and one with *m* elements is one with *n+m* elements. These examples show a connection between products and sums in arithmetic and products and co-products in category theory.
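You can check this cardinality bookkeeping directly in Python. Tagging each element with its origin is one standard way to keep the union disjoint:

```python
from itertools import product

A = {"a", "b", "c"}   # n = 3 elements
B = {0, 1}            # m = 2 elements

# Product in the category of sets: the Cartesian product,
# with n * m elements.
P = set(product(A, B))
assert len(P) == len(A) * len(B)

# Co-product in the category of sets: the disjoint union,
# with n + m elements. Tagging keeps the two copies disjoint
# even if A and B happened to overlap.
S = {(0, a) for a in A} | {(1, b) for b in B}
assert len(S) == len(A) + len(B)
```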

You can find the full definition of a categorical product here. Below I give the definition leaving out details that go away when we look at empty products.

The product of a set of objects is an object *P* such that given any other object *X* … there exists a unique morphism from *X* to *P* such that ….

If you’ve never seen this before, you might rightfully wonder what in the world this has to do with products. You’ll have to trust me on this one. [1]

When the set of objects is empty, the missing parts of the definition above don’t matter, so we’re left with requiring that there is a unique morphism [2] from each object *X* to the product *P*. In other words, *P* is a terminal object, often denoted 1. **So in category theory, you can say empty products are 1**.

But that seems like a leap, since “1” now takes on a new meaning that isn’t obviously connected to the idea of 1 we learned as a child. How is an object such that every object has a unique arrow to it at all like, say, the number of noses on a human face?

We drew a connection between arithmetic and categories before by looking at the cardinality of sets. We could define the product of the numbers *n* and *m* as the number of elements in the product of a set with *n* elements and one with *m* elements. Similarly we could define 1 as the cardinality of the terminal element, also denoted 1. This is because there is a unique map from any set to the set with 1 element. Pick your favorite one-element set and call it 1. Any other choice is isomorphic to your choice.

Now for empty sums. The following is the definition of co-product (sum), leaving out details that go away when we look at empty co-products.

The co-product of a set of objects is an object *S* such that given any other object *X* … there exists a unique morphism from *S* to *X* such that ….

As before, when the set of objects is empty, the missing parts don’t matter. Notice that the direction of the arrow in the definition is reversed: there is a unique morphism from the co-product *S* to any object *X*. In other words, *S* is an initial object, denoted for good reasons as 0. [3]

In set theory, the initial object is the empty set. (If that hurts your head, you’re not alone. But if you think of functions in terms of sets of ordered pairs, it makes a little more sense. The function from the empty set to another set is an empty set of ordered pairs!) The cardinality of the initial object 0 is the integer 0, just as the cardinality of the terminal object 1 is the integer 1.
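The empty-function idea, and the dual convention that empty sums are 0, can both be made concrete in Python. A function from *A* to *B*, viewed as a set of ordered pairs, is just a dict; the unique function out of the empty set is the empty dict. (This sketch is mine, not from the post.)

```python
# The number of functions from A to B is len(B) ** len(A),
# since each element of A independently picks an image in B.
def count_functions(A, B):
    return len(B) ** len(A)

B = {1, 2, 3}
assert count_functions(set(), B) == 1   # exactly one: the empty function {}
assert count_functions(B, set()) == 0   # no functions from a nonempty set
                                        # into the empty set

# Dually, the empty sum is 0, matching the cardinality
# of the initial object
assert sum([]) == 0
```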

* * *

[1] Category theory has to define operations entirely in terms of objects and morphisms. It can’t look inside an object and describe things in terms of elements the way you’d usually do to define the product of two numbers or two sets, so the definition of product has to look very different. The benefit of this extra work is a definition that applies much more generally.

To understand the general definition of products, start by understanding the product of two objects. Then learn about categorical limits and how products relate to limits. (As with products, the categorical definition of limits will look entirely different from familiar limits, but they’re related.)

[2] Morphisms are a generalization of functions. In the category of sets, morphisms *are* functions.

[3] Sometimes initial objects are denoted by ∅, the symbol for the empty set, and sometimes by 0. To make things more confusing, a “zero,” spelled out as a word rather than a symbol, has a different but related meaning in category theory: an object that is both initial and terminal.

]]>Other methods, such as fuzzy logic, may be useful, though they must violate common sense (at least as defined by Cox’s theorem) under some circumstances. They can still be useful when they give approximately the results that probability would have given, with less effort, and stay away from edge cases that deviate too far from common sense.

There are various kinds of uncertainty, principally epistemic uncertainty (lack of knowledge) and aleatory uncertainty (randomness), and various philosophies for how to apply probability. One advantage to the Bayesian approach is that it handles epistemic and aleatory uncertainty in a unified way.
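A tiny beta-binomial example, a sketch of mine rather than anything from the post, shows the two kinds of uncertainty handled with one mechanism: a coin’s bias *p* is unknown (epistemic), and each flip is random (aleatory), yet a single distribution over *p* carries both.

```python
from fractions import Fraction

# Beta(a, b) prior on the coin's bias p; after observing h heads
# and t tails the posterior is Beta(a + h, b + t), whose mean is
# (a + h) / (a + b + h + t). Exact arithmetic via Fraction.
def posterior_mean(a, b, heads, tails):
    return Fraction(a + heads, a + b + heads + tails)

# Uniform prior Beta(1, 1), then observe 7 heads in 10 flips
assert posterior_mean(1, 1, 7, 3) == Fraction(2, 3)
```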

Blog posts related to quantifying uncertainty:

- How loud is the evidence?
- The law of small numbers
- Example of the law of small numbers
- Laws of large numbers and small numbers
- Plausible reasoning
- What is a confidence interval?
- Learning is not the same as gaining information
- What a probability means
- Irrelevant uncertainty
- Probability and information
- False positives for medical papers
- False positives for medical tests
- Most published research results are false
- Determining distribution parameters from quantiles
- Fitting a triangular distribution
- Musicians, drunks, and Oliver Cromwell

This exercise gave me confidence that mathematical definitions were created by ordinary mortals like myself. It also began my habit of examining definitions carefully to understand what motivated them.

One question that comes up frequently is why zero factorial equals 1. The pedantic answer is “Because it is defined that way.” This answer alone is not very helpful, but it does lead to the more refined question: Why is 0! defined to be 1?

The answer to the revised question is that many formulas are simpler if we define 0! to be 1. If we defined 0! to be 0, for example, countless formulas would have to add disqualifiers such as “except when *n* is zero.”

For example, the binomial coefficients are defined by

*C*(*n*, *k*) = *n*! / *k*!(*n* – *k*)!.

The binomial coefficient *C*(*n*, *k*) tells us how many ways one can take a set of *n* things and select *k* of them. For example, the number of ways to deal a hand of five cards from a deck of 52 is *C*(52, 5) = 52! / 5! 47! = 2,598,960.

How many ways are there to deal a hand of 52 cards from a deck of 52 cards? Obviously one: the deck is the hand. But our formula says the answer is

*C*(52, 52) = 52! / 52! 0!,

and the formula is only correct if 0! = 1. If 0! were defined to be anything else, we’d have to say “The number of ways to deal a hand of *k* cards from a deck of *n* cards is *C*(*n*, *k*), **except** when *k* = 0 or *k* = *n*, in which case the answer is 1.” (See [1] below for picky details.)
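As a quick check (my sketch, using Python’s standard library), the factorial formula agrees with the card counts above, and it works at the edge cases *k* = 0 and *k* = *n* precisely because `factorial(0)` is 1:

```python
from math import comb, factorial

def binom(n, k):
    # the formula from the post; correct at k = 0 and k = n
    # only because factorial(0) == 1
    return factorial(n) // (factorial(k) * factorial(n - k))

assert factorial(0) == 1
assert binom(52, 5) == 2_598_960      # five-card hands
assert binom(52, 52) == 1             # dealing the whole deck
assert binom(52, 0) == 1              # dealing no cards
assert binom(52, 5) == comb(52, 5)    # matches the standard library
```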

The example above is certainly not the only one where it is convenient to define 0! to be 1. Countless theorems would be more awkward to state if 0! were defined any other way.

Sometimes people appeal to the gamma function for justification that 0! should be defined to be 1. The gamma function extends factorial to real numbers, and the gamma function value associated with 0! is 1. (In detail, *n*! = Γ(*n*+1) for positive integers *n* and Γ(1) = 1.) This is reassuring, but it raises another question: Why should the gamma function be authoritative?

Indeed, there are many ways to extend factorial to non-integer values, and historically many ways were proposed. However, the gamma function won and its competitors have faded into obscurity. So why did it win? Analogous to the discussion above, we could say that the gamma function won because more formulas work out simply with this definition than with others. That is, you can very often replace *n*! with Γ(*n* + 1) in a formula true for positive integer values of *n* and get a new formula valid for real or even complex values of *n*.
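The relation *n*! = Γ(*n* + 1) is easy to verify numerically with Python’s `math.gamma` (a quick sketch of mine):

```python
import math

# n! = Gamma(n + 1) for n = 0, 1, 2, ...; in particular
# Gamma(1) = 1 corresponds to 0! = 1
for n in range(6):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# The extension fills in values between the integers,
# e.g. (1/2)! = Gamma(3/2) = sqrt(pi) / 2
assert math.isclose(math.gamma(1.5), math.sqrt(math.pi) / 2)
```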

There is another reason why gamma won, and that’s the Bohr–Mollerup theorem. It says that if you’re looking for a function *f*(*x*) defined for *x* > 0 that satisfies *f*(1) = 1 and *f*(*x*+1) = *x* *f*(*x*), then the gamma function is the only log-convex solution. Why should we look for log-convex functions? Because factorial is log-convex, and so this is a natural property to require of its extension.

**Update**: Occasionally I hear someone say that the gamma function (shifting its argument by 1) is the only analytic function that extends factorial to the complex plane, but this isn’t true. For example, if you add sin(πx) to the gamma function, you get another analytic function that takes on the same values as gamma for positive integer arguments.
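The sine trick above can be checked numerically. This sketch (mine, not from the post) confirms that adding sin(π*x*) leaves the values at positive integers unchanged, since sin(π*n*) = 0, while changing the function everywhere else:

```python
import math

def g(x):
    # an alternative analytic extension of factorial: agrees with
    # gamma at every positive integer because sin(pi * n) = 0
    return math.gamma(x) + math.sin(math.pi * x)

# Same values as gamma at the integers: g(n) = (n - 1)!
for n in range(1, 7):
    assert math.isclose(g(n), math.factorial(n - 1))

# ...but different away from the integers: sin(2.5 * pi) = 1
assert not math.isclose(g(2.5), math.gamma(2.5))
```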

**Related posts**:

- Why are empty products 1?
- Why are natural logarithms natural?
- Another reason natural logarithms are natural

* * *

[1] Theorems about binomial coefficients have to make some restrictions on the arguments. See these notes for full details. But in the case of dealing cards, the only necessary constraints are the **natural** ones: we assume the number of cards in the deck and the number we want in a hand are non-negative integers, and that we’re not trying to draw more cards for a hand than there are in a deck. Defining 0! as 1 keeps us from having to make any **unnatural** qualifications such as “unless you’re dealing the entire deck.”

]]>I tried. I tried to learn some statistics actually when I was younger and it’s a beautiful subject. But at the time I think I found the shakiness of the philosophical underpinnings was too scary for me. I felt a little nauseated all the time. Math is much more comfortable. You know where you stand. You know what’s proved and what’s not. It doesn’t have quite the same ethical and moral dimension that statistics has. I was never able to get comfortable with it the way my parents were.

In its simplest form the 80-20 rule says 80% of your outputs come from 20% of your inputs. You might find that 80% of your revenue comes from 20% of your customers, or 80% of your headaches come from 20% of your employees, or 80% of your sales come from 20% of your sales reps. The exact numbers 80 and 20 are not important, though they work surprisingly well as a rule of thumb.

The more general principle is that a large portion of your results come from a small portion of your inputs. Maybe it’s not 80-20 but something like 90-5, meaning 90% of your results come from 5% of your inputs. Or 90-13, or 95-10, or 80-25, etc. Whatever the proportion, it’s usually the case that some inputs are far more important than others. The alternative, assuming that everything is equally important, is usually absurd.

The 80-20 rule sounds too good to be true. If 20% of inputs are so much more important than the others, why don’t we just concentrate on those? In an earlier post, I gave four reasons. These were:

- We don’t look for 80/20 payoffs. We don’t see 80/20 rules because we don’t think to look for them.
- We’re not clear about criteria for success. You can’t concentrate your efforts on the 20% with the biggest returns until you’re clear on how you measure returns.
- We’re unclear how inputs relate to outputs. It may be hard to predict what the most productive activities will be.
- We enjoy less productive activities more than more productive ones. We concentrate on what’s fun rather than what’s effective.

I’d like to add another reason to this list, and that is that we may find it hard to believe just how unevenly distributed the returns on our efforts are. We may have an idea of how things are ordered in importance, but **we don’t appreciate just how much more important the most important things are**. We mentally compress the range of returns on our efforts.

Making a list of options suggests the items on the list are roughly equally effective, say within an order of magnitude of each other. But it may be that the best option would be 100 times as effective as the next best option. (I’ve often seen that, for example, in optimizing software. Several ideas would reduce runtime by a few percent, while one option could reduce it by a couple orders of magnitude.) If the best option also takes the most effort, it may not seem worthwhile because we underestimate just how much we get in return for that effort.
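You can get a feel for how compressed our intuition is by simulating heavy-tailed returns. This is a sketch of mine, not from the post: a Pareto shape parameter of about 1.16 is the textbook value that yields roughly an 80/20 split, and in any given sample the top 20% often captures even more than 80% because a handful of draws dominate everything else.

```python
import random

# Simulate returns from 10,000 "inputs" drawn from a heavy-tailed
# Pareto distribution. Shape alpha ~ 1.16 classically gives ~80/20.
random.seed(1)
returns = sorted((random.paretovariate(1.16) for _ in range(10_000)),
                 reverse=True)

top_20_pct = sum(returns[:2_000])
share = top_20_pct / sum(returns)

# The top 20% of inputs typically account for the large majority
# of the total -- often near 80%, sometimes much more.
assert share > 0.5
```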

]]>If you’d like to contribute an endorsement, please contact me.

]]>The appeal of magic is that it promises to render objects plastic to the will without one’s getting too entangled with them. Treated at arm’s length, the object can issue no challenge to the self. … The clearest contrast … that I can think of is the repairman, who must submit himself to the broken washing machine, listen to it with patience, notice its symptoms, and then act accordingly. He cannot treat it abstractly; the kind of agency he exhibits is not at all magical.

**Related post**: Programming languages and magic