Extra Cheese

PeepCode Play by Play Screencast

2011-01-05T00:00:00-08:00

Back in July, Geoffrey Grosenbach and I got together and programmed for a few hours, recording the session as a screencast. We hacked on a particular problem using Ruby, RSpec, Vim, Git, and the Unix shell, focusing the whole time on why I did things the way I did. Geoffrey recently published that screencast at PeepCode with his usual editing magic, adding annotations and additional explanations to our programming session.

Along the way, we hit many diverse topics ranging from the high level, like the four rules of simple design, all the way down to the lowest level, like why it's better to say Vjd in Vim than 2dd.

The screencast's page on PeepCode has a preview video and some other preview content. The screencast itself isn't free, but it's also fifteen times the length of my average screencast and professionally edited. As always, feedback is appreciated.

Rebase Is Safe

2010-12-09T00:00:00-08:00

A falsehood about Git is spreading: that git rebase isn't safe. Not the kind of unsafe where you rebase pushed changes and everyone gets a nasty merge bubble. That's a real, well-known danger, and it's accepted that you just don't do it.

This falsehood is that git rebase inherently destabilizes your history, potentially introducing changesets that don't compile or don't pass the tests, and that this is a serious problem. The main consequence is that git bisect stops working: if you have revisions with broken tests, your bisect will skip them if you're very careful, or give you false positives if you're not.

That certainly sounds bad. But if the problem is that you don't know whether the tests pass for the newly rebased commits, why not run the tests? Like this, for example:

(set -e;
  git rev-list --reverse origin/master..master |
    while read rev; do
      echo "Checking out: $(git log --oneline -1 $rev)";
      git checkout -q $rev;
      python runtests.py;
    done)

This isn't hard: it's just a while loop at the shell. It checks out each revision that's on master but not yet on origin/master, oldest first, and runs the tests for each one. Running it on a couple of revisions of Expecter Gadget gives this output:

Checking out: 4f56581 Split tests into many files
....................
-------------------------------------------------
Ran 20 tests in 0.019s

OK
Checking out: 44de2c2 pyflakes
....................
-------------------------------------------------
Ran 20 tests in 0.019s

OK

This only took 1.1 seconds. If it's slow, that's because your unit tests are slow (not Git's fault). Of course, test slowness will also slow down your bisects, your CI, your inner-loop development workflow, and lots of other stuff. Rebase isn't the bottleneck there.

The command will stop whenever a test fails—that's what the set -e at the top is for. So, if any revision doesn't pass the tests, you'll see the test output, the command will stop, and you'll be left with that revision checked out. You'll probably want to git checkout master and do an interactive rebase to fix that revision. This rarely happens to me in practice; the rebases almost always work without modification, but checking is still important because you really don't want to break the history.

You might worry that the checkouts will smash stuff in the working copy or otherwise cause chaos. Again, set -e saves us. If the working copy is dirty, the first git checkout will fail and the subshell will exit:

Checking out: 4f56581 Split tests into many files
error: You have local changes [...]; cannot switch branches.
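The same stop-at-the-first-failure logic can be sketched in Python. This is an illustration, not the scriptified version in the dotfiles repository: the function names (`git`, `revisions_to_check`, `check_revisions`) are hypothetical, and checkout and test running are passed in as callables so the loop can be exercised without a repository.

```python
import subprocess


def git(*args):
    # check=True raises CalledProcessError when git fails (for example, a
    # checkout refused because of local changes), which stops everything
    # much like `set -e` does in the shell version.
    return subprocess.run(("git",) + args, check=True,
                          capture_output=True, text=True).stdout


def revisions_to_check(upstream="origin/master", branch="master"):
    # Oldest first, like `git rev-list --reverse origin/master..master`.
    return git("rev-list", "--reverse", f"{upstream}..{branch}").split()


def check_revisions(revs, checkout, run_tests):
    """Check out each revision in order and run the tests.

    Returns the first revision whose tests fail, leaving it checked out as
    the shell loop does, or None if every revision passed.
    """
    for rev in revs:
        checkout(rev)
        if not run_tests():
            return rev
    return None
```

To run it against a real repository, pass `revisions_to_check()` along with `checkout=lambda rev: git("checkout", "-q", rev)` and `run_tests=lambda: subprocess.run(["python", "runtests.py"]).returncode == 0`.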

I've used this on multiple projects with great success, both by myself and on small teams of around six people. It originally came from a similar command for running the tests over every revision in a Mercurial patch queue. Mercurial users shouldn't think this issue is Git-specific: a patch queue is just another way to rebase commits.

In the past, I haven't even bothered turning the command into a script. I have a huge Zsh history size (100,000 commands) and would just hit ^R to do a reverse history search for "rev-list". However, I've just scriptified the command in my dotfiles repository, so you can use it easily in your own projects.

The lesson here is that you need to engage deeply with your tools: understand Git, but also understand Unix. Learn how to use them well (for Unix examples, see my screencasts "Python to Shell to Ruby" and "A Raw View into My Unix Hackery"), and learn the fundamental models of both (for a Unix example, see "The Tar Pipe"), and especially learn how to use them together. Don't learn only one, or learn them both halfway, and then blame the tools when you have problems. Programming is hard, but let's not go shopping.

(If you enjoyed this, you may also like "Why I Switched to Git From Mercurial".)

An Aside About Shell Conditionals

2010-11-21T00:00:00-08:00

The && shell operator is used in two common contexts. First, to chain commands like ; does, but stop on failure. For example:

$ mkdir foo && cd foo && echo 'done!' && cd ..
done!
$ mkdir foo && cd foo && echo 'done!' && cd ..
mkdir: foo: File exists

The directory existed on the second try, so mkdir set $?, the exit status, to 1. An exit status of 0 is success; anything non-zero is failure. The && operator saw that $? was non-zero, meaning that the mkdir failed, so it skipped the rest of the chain.
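A minimal Python sketch of the same short-circuit rule, assuming a Unix system with `true`, `false`, and `sh` on the PATH; `and_chain` is a hypothetical helper, not a standard function:

```python
import subprocess


def and_chain(*commands):
    """Emulate `cmd1 && cmd2 && ...`: run each argv list in turn and stop at
    the first non-zero exit status, which becomes the chain's status."""
    status = 0
    for argv in commands:
        status = subprocess.run(argv).returncode
        if status != 0:
            break  # like &&: the remaining commands never run
    return status
```

For example, `and_chain(["mkdir", "foo"], ["echo", "done!"])` run twice behaves like the two lines above: the second time, it returns mkdir's non-zero status without running `echo`.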

The second common application of && is within comparisons: you can use it in conditionals as a logical "and" operator. It acts as you'd expect it to act in any other programming language:

$ if [ 1 = 1 ] && true; then echo ok; fi  
ok
$ if [ 1 = 100 ] && true; then echo ok; fi
$ if [ 1 = 1 ] && false; then echo ok; fi

Above, we saw that the && operator continues executing only if the previous command sets $? to 0, the success value. Which means...

In shell conditionals, the true thing is 0 and the false things are everything else (usually 1)! I know – it's Wrong, but it also makes everything Just Work.
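Here's that convention expressed in Python: a hypothetical `shell_condition` helper that hands a command string to `sh -c` and treats exit status 0 as true. Any non-zero status counts as false, not just 1.

```python
import subprocess


def shell_condition(command):
    """Evaluate `command` with sh and report truth the way `if` does:
    exit status 0 is true, and every non-zero status is false."""
    return subprocess.run(["sh", "-c", command]).returncode == 0


print(shell_condition("[ 1 = 1 ] && true"))    # True
print(shell_condition("[ 1 = 100 ] && true"))  # False
print(shell_condition("[ 1 = 1 ] && false"))   # False
print(shell_condition("exit 3"))               # False: 3 is "false" too
```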

Oh... and one more thing. true, false, and [ are all programs:

$ which true false [
/usr/bin/true
/usr/bin/false
/bin/[

Noodle on that for a while! Which program knows what = means? Why is the ] there when using a conditional at the shell? (Isn't it just a meaningless argument passed to [?)

(Seriously – think about these questions before proceeding.)

...

I'm not going to answer them, but I will show you this:

$ /bin/[ 1 = 1; echo $?
0
$ /bin/[ 1 = 2; echo $?
1

(This post started as an aside in The Tar Pipe that didn't make the cut. See that post for more Unix bits.)

Screencast: Custom Vim Refactorings

2010-11-19T00:00:00-08:00

This lightning talk was given at Eden Development in Winchester, UK. I create an automated "inline variable" refactoring from scratch in Vim by recording interactive commands, then working them into a mostly-intelligible function. It shows off some of the deeper power of Vim: commands are text and text is commands. This parallels the Lisp philosophy of "code is data and data is code".

Screencast: Python to Shell to Ruby

2010-06-30T00:00:00-07:00

This is a quick screencast where I do some Unix shell hacking. The problem: I needed to get my Google Reader subscription list into a Rails app's database. The first solution that came to mind was to parse the Google OPML file with Python, then use shell scripting to generate Ruby code using the raw Python data structure.

The whole thing is less than seven minutes long and spans three languages. I also find a brand new bug in the terminal. Software is frustrating!
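For the parsing half, here's a rough Python sketch using the standard library's `xml.etree.ElementTree`. It isn't the code from the screencast, and it assumes the usual OPML layout: each feed is an `<outline>` element with an `xmlUrl` attribute, and folders are `<outline>` elements without one. Printing the returned list gives the kind of raw Python data structure that the shell step then turned into Ruby.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml):
    """Return (title, feed_url) pairs from OPML given as text or bytes.

    Feeds are <outline> elements with an xmlUrl attribute; folder outlines
    that only group other outlines have no xmlUrl and are skipped.
    """
    root = ET.fromstring(opml)
    return [(outline.get("title") or outline.get("text"), outline.get("xmlUrl"))
            for outline in root.iter("outline")
            if outline.get("xmlUrl")]
```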

Screencast: Python to Shell to Ruby from Gary Bernhardt on Vimeo.

(If you enjoyed this, you might enjoy "A Raw View into My Unix Hackery" and my "String Calculator Kata in Python".)

Specificity, Faking It, and Making It

2010-05-31T00:00:00-07:00

I recently asked this question on Twitter:

When I've faked it, but not yet made it, is it legit to push and pull specificity around?

Clearly, this makes sense to almost no one. In the spirit of The Tar Pipe, I present a step-by-step explanation of what this question means:

Faking It and Making It

Faking it is when, during TDD, you write something stupid to make a test pass. When you write the first Fibonacci sequence test, for fib(0) == 0, the body of your fib function will say return 0. That's faking it.

You continue to write tests for functionality, then fake it to make them pass. But after each red-green cycle comes the required refactoring stage. As soon as your tests pass, you must remove any duplication. Don't blame me for this; it's one of the four elements of simple design.

Often, removing duplication means combining two special cases into a general case. For example, two special-cased checks for fib(0) == 0 and fib(1) == 1 can be collapsed into a single statement that passes both tests: return n.

After enough write-test-then-fake-it-then-refactor iterations, the class will arrive at a true implementation of its desired behavior. That's making it: there's no longer any faking. Or, in some cases, what looked like faking turned out to be the base case of the class's behavior. In either case, the behavior has been made.
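Here's one plausible path through those steps for the Fibonacci example, written in Python. The intermediate names (`fib_v1`, `fib_v2`) are hypothetical, and the test that drives the final step (fib(2) == 1) is an assumption:

```python
# Faking it: the first test, fib(0) == 0, passes with a constant.
def fib_v1(n):
    return 0


# A second test, fib(1) == 1, could be passed with a special case, but
# removing that duplication collapses both cases into one statement.
def fib_v2(n):
    return n


# Making it: a test such as fib(2) == 1 forces the real behavior, and the
# old `return n` survives as the base case of the recursion.
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```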

Defining Specificity

Details about how a class does its job are specificity. For example, rule 1 in Conway's Game of Life is "any live cell with fewer than two live neighbors dies, as if caused by underpopulation." You could write your Game of Life evolution function with that in mind:

    def evolve(world):
        for cell in world.cells:
            if cell.neighborhood.live_cell_count < 2:
                cell.die()
        ...

That's some serious specificity, though: the evolution algorithm gets very specific about the rules of evolution. A lower-specificity version of the function wouldn't talk about numbers at all:

    def evolve(world, rules):
        for cell in world.cells:
            if rules.cell_should_die(cell):
                cell.die()

The responsibility of knowing the specific rules has been moved out into a rules object, allowing this function to focus on its core responsibility of applying the rules to the world.

However, there's still more specificity that can be pulled out. Right now, the function asks the rules whether each cell should die, then conditionally tells the cell to die based on the answer. It would be better to tell the rules engine to do its thing directly, rather than asking:

    def evolve(world, rules):
        for cell in world.cells:
            rules.apply(cell)

The specificity about the relationship between rules and cells has been removed, making the function more abstract. Equivalently, the fan-out of evolve has decreased by one. Or, the coupling between evolve and rules has decreased from data coupling to message coupling (message coupling is weaker). Or, mumble mumble connascence mumble.
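A runnable version of that final design, as a sketch: `Cell`, `World`, and `UnderpopulationRule` are hypothetical stand-ins, and each cell's live-neighbor count is precomputed so that killing one cell can't change another cell's count mid-pass. Only the rules object knows the number 2; `evolve` just tells it to apply itself.

```python
class Cell:
    def __init__(self, alive, live_neighbors):
        self.alive = alive
        # Simplified stand-in for cell.neighborhood.live_cell_count,
        # counted before the pass so deaths don't affect other cells.
        self.live_neighbors = live_neighbors

    def die(self):
        self.alive = False


class World:
    def __init__(self, cells):
        self.cells = cells


class UnderpopulationRule:
    """Rule 1 only: the specific numbers live here, not in evolve()."""

    def apply(self, cell):
        if cell.alive and cell.live_neighbors < 2:
            cell.die()


def evolve(world, rules):
    # Tell, don't ask: evolve no longer knows how the rules decide anything.
    for cell in world.cells:
        rules.apply(cell)
```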

Pushing and Pulling Specificity

During code retreats, Corey Haines likes to challenge attendees to move specificity out into the test. This is what he means: move low-level details out of your system and into your test, making the system more abstract.

In our example above, we moved specificity about how cells live and die out of the production code. Depending on how we made that change, that specificity might continue to exist, but get moved into the test code instead. For example, in Ruby, the test code could pass a block that implements the actual steps. Ultimately, those details will go in a lower-level class that the current one depends on.
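In Python, a test can push specificity the same way by handing `evolve` a test double instead of real rules. This is a sketch with hypothetical names (`RulesSpy`, `FakeWorld`); `evolve` is the same three-line function shown earlier, repeated so the block runs on its own.

```python
class RulesSpy:
    """Test double that records every cell it's told to apply rules to."""

    def __init__(self):
        self.applied_to = []

    def apply(self, cell):
        self.applied_to.append(cell)


class FakeWorld:
    def __init__(self, cells):
        self.cells = cells


def evolve(world, rules):
    for cell in world.cells:
        rules.apply(cell)


def test_evolve_applies_rules_to_every_cell():
    # evolve never inspects cells, so plain strings are enough here.
    cells = ["cell-a", "cell-b"]
    rules = RulesSpy()
    evolve(FakeWorld(cells), rules)
    assert rules.applied_to == cells
```

Nothing about neighbor counts appears in either `evolve` or the test; that knowledge is left for a lower-level rules class.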

Pushing and pulling specificity is about moving implementation details around. Many people, when told about isolated unit testing, complain that it will lead to rigid, unchangeable definitions of the implementation. That's not the case, and this is one of the reasons – a TDDer practicing isolationism will learn how to move specificity out of necessity.

(I've written and deleted several paragraphs about the differences between pushing and pulling specificity, but haven't come up with anything I'm comfortable saying in public. You'll have to come up with your own ideas about the important differences, if any, between pushing and pulling.)

The Question at Hand

Now, we can get back to my original question:

When I've faked it, but not yet made it, is it legit to push and pull specificity around?

In other words, when I've not yet TDDed the full system – when it still contains faked-out behavior – is it safe, desirable, morally acceptable, etc., etc. to push and pull specificity to and from other classes?

Well, probably. But that's really not the point, is it?

The Tar Pipe

2010-05-17T00:00:00-07:00

This is a tar pipe:

(cd src && tar -cf - .) | (cd dest && tar -xpf -)

It basically means "copy the contents of src into dest, preserving permissions and other special stuff." It does this by firing up two tars – one tarring up src, the other untarring into dest – and wiring them together.

You can learn a whole lot about Unix from that one little command.
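For comparison, here's a sketch of the same copy driven from Python's `subprocess` module, assuming a `tar` binary on the PATH. `tar_pipe` is a hypothetical helper. The `cwd=` arguments stand in for the subshells' `cd`, and the pipe between the two tars is created by `subprocess.PIPE` rather than by bash.

```python
import subprocess


def tar_pipe(src, dest):
    """Copy the contents of src into dest, like
    (cd src && tar -cf - .) | (cd dest && tar -xpf -)"""
    # cwd= changes only the child's working directory, like the subshell's cd.
    writer = subprocess.Popen(["tar", "-cf", "-", "."], cwd=src,
                              stdout=subprocess.PIPE)
    reader = subprocess.Popen(["tar", "-xpf", "-"], cwd=dest,
                              stdin=writer.stdout)
    # Drop the parent's copy of the pipe's reading end; only the reading tar
    # needs it, and the writer can then get SIGPIPE if the reader dies early.
    writer.stdout.close()
    reader_status = reader.wait()
    writer_status = writer.wait()
    if reader_status != 0 or writer_status != 0:
        raise RuntimeError(f"tar pipe failed: writer={writer_status}, "
                           f"reader={reader_status}")
```

Closing the parent's copy of `writer.stdout` follows the pipeline pattern recommended in the `subprocess` documentation.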

The Subshell

The ( starts a subshell. This is actually spawning a process using fork(2) – everything inside the parentheses is in a separate instance of bash. The subshell, by virtue of being a separate process, is a natural namespacing mechanism: the parent bash won't see the child's changes to variables and – important for our case – the working directory. This is why we used a subshell: it isolates the cd to only the subshell doing the tar.
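The namespacing effect of fork(2) can be seen directly with Python's `os.fork`, as a Unix-only sketch (the helper name `chdir_in_child` is hypothetical): the child changes its working directory, the parent waits for it, and the parent's working directory is untouched.

```python
import os


def chdir_in_child(path):
    """Fork; the child cds to `path` and exits 0 if that worked.
    Returns (child_exit_status, parent_cwd_unchanged)."""
    before = os.getcwd()
    pid = os.fork()
    if pid == 0:                       # child process
        try:
            os.chdir(path)             # affects only this process
            ok = os.getcwd() == os.path.realpath(path)
        except OSError:
            ok = False
        os._exit(0 if ok else 1)       # _exit: skip the parent's cleanup code
    _, status = os.waitpid(pid, 0)     # parent: block until the child exits
    return os.WEXITSTATUS(status), os.getcwd() == before
```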

The cd changes the subshell process's working directory to src. After that comes &&. Logically, this means "do the next thing only if the previous one worked; otherwise, fail." Under the hood, it's testing $?, the previous command's exit code variable, and failing if it's not 0. The weird thing about && is that it means "only continue if the return code was 0", which is the opposite of what you'd expect from a real programming language. But Unix commands (and most C functions, for that matter) return 0 on success, so it all works out.

The Tar

Now that we've cded to src, we start a tar. -c means "create a tarball", as opposed to extracting or listing. -f tells tar what file to create, and it's getting an argument of -, which is the Unix convention for stdout. The final argument, ., means "the current directory", so the whole command together means "tar up the current directory and dump the tar data to stdout."

Stdout is one of three special file descriptors in Unix: stdin, stdout, and stderr. At a terminal, the keys you type are going into the shell's stdin, and the output it shows you is coming from stdout. Stderr is basically another stdout, but used for errors. If our tar command failed, the error messages would show up on our terminal via stderr even though the stdout is being piped away.

stdin, stdout, and stderr are file descriptors 0, 1, and 2. Always. This is why you append 2>&1 to a command to say "combine the stdout and stderr streams": it means "send stderr (descriptor 2) to wherever stdout (descriptor 1) is going".
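Python's `subprocess` module exposes the same redirection. In this sketch (helper names are hypothetical), `stderr=subprocess.STDOUT` sends descriptor 2 wherever descriptor 1 is going, which is what `2>&1` does at the shell:

```python
import subprocess


def run_combined(command):
    """Run a shell command with stderr merged into stdout, like `command 2>&1`."""
    result = subprocess.run(["sh", "-c", command],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,  # fd 2 -> wherever fd 1 goes
                            text=True)
    return result.stdout


def run_split(command):
    """Run a shell command and keep stdout and stderr apart."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.stdout, result.stderr
```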

The Pipe

The original command contained two subshells. We'll get to the second later, but what we care about now is that they're joined with a |. This makes bash create a pipe using pipe(2). A pipe is a unidirectional file-like object: it has a reading end and a writing end.
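A pipe can be poked at from Python with `os.pipe()`, which wraps pipe(2) and returns the reading and writing descriptors. In this sketch (helper names are hypothetical), bytes written to one end come out the other, and writing to the reading end fails because the pipe only goes one way:

```python
import os


def pipe_round_trip(data):
    """Write bytes into a fresh pipe's writing end and read them back out."""
    read_fd, write_fd = os.pipe()      # pipe(2): (reading end, writing end)
    try:
        os.write(write_fd, data)       # small writes fit in the pipe's buffer
        return os.read(read_fd, len(data))
    finally:
        os.close(read_fd)
        os.close(write_fd)


def write_to_read_end():
    """Try to send data the wrong way through a pipe."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(read_fd, b"x")        # the reading end isn't open for writing
    except OSError:
        return "refused"
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return "accepted"
```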

Bash fork(2)s, making a copy of its own running process. Through a series of changes, this newly forked bash process will become the tar process that will feed the pipe.

When piping commands together, the stdin and stdout file descriptors are used, but they write to and read from the pipe instead of the terminal. Bash uses dup2(2) to duplicate the writing end of the pipe to stdout. This means that any data the newly-forked process writes will go into the pipe. Under the covers, dup2(2) is saying "forget about what used to be at file descriptor 1; make this other file descriptor the new 1."

At this point the process is still bash. It has to exec(3) the tar binary before it's ready to do real work. This replaces the running bash process with a copy of tar, but doesn't close the file descriptors. There's now a running tar process with its stdout glued to the writing end of the pipe.
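Here's the fork, dup2, exec sequence as a Unix-only Python sketch, using the real `os.fork`, `os.dup2`, and `os.execvp` wrappers around those calls. `run_into_pipe` is a hypothetical helper, and here the parent reads the pipe itself rather than handing it to a second program.

```python
import os


def run_into_pipe(argv):
    """Run argv with its stdout wired to a pipe, the way a shell sets up the
    left side of `argv | ...`. Returns (everything written, exit status)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child
        try:
            os.close(read_fd)             # the child only writes
            os.dup2(write_fd, 1)          # "make this descriptor the new 1"
            os.close(write_fd)            # fd 1 already refers to the pipe
            os.execvp(argv[0], argv)      # replace the process; fd 1 survives
        finally:
            os._exit(127)                 # reached only if something failed
    os.close(write_fd)                    # parent keeps only the reading end
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                     # 0 bytes: the writer has gone away
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)
```

Python's `os.dup2` makes the new descriptor 1 inheritable by default, which is why it survives the exec.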

Moving Data

After bash exec(3)s tar, the tar process looks at its arguments and sees that it's supposed to be tarring up the current directory (which is src because the subshell it came from cded to it), and that it's supposed to emit the tarred data on stdout (which is the writing end of the pipe that bash set up). It starts reading files, generating tar data, and spewing it to stdout.

But wait – the reading tar isn't even there yet! Doesn't matter. Both sides of the pipe existed from the time it was created, and pipes are buffered. The writing tar will be allowed to shove data into the pipe until it's full. Eventually, when the pipe is full, the write(2) will block. The kernel's CPU scheduler will kick in, notice that bash is waiting for the CPU, and context switch to it.
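The "until it's full" part can be observed from Python by making the writing end non-blocking: instead of blocking, a write to a full pipe fails with EAGAIN, which Python raises as `BlockingIOError`. A sketch, with a hypothetical helper name; on Linux, pipe(7) documents a default capacity of 16 pages (65,536 bytes with 4 KiB pages), and other systems differ.

```python
import os


def pipe_capacity():
    """Fill a pipe that nobody is reading and return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    total = 0
    try:
        while True:
            total += os.write(write_fd, b"x" * 1024)
    except BlockingIOError:
        pass  # full: a blocking writer would be put to sleep right here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total
```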

The Reading

Bash starts to execute the other side of the pipe. It cds into the dest directory and starts a separate tar process with -xpf -. -x means "extract", -p means "preserve permissions" – usually a good thing – and -f - now means "read from stdin". You know, I never really thought about that until just now: sometimes - means stdin; sometimes it means stdout. It was so natural that I never considered it. Anyway...

The second tar process is started in basically the same way as the first. It fork(2)s, sets its stdin to the reading side of the pipe (using the dup2(2) trick again), and execs tar. Both ends of the pipe are now connected. The writing tar has its stdout hooked to the writing end of the pipe, and the reading tar has its stdin hooked to the reading end.

The parent bash process executes wait(2)s on the subshell bash processes, which will block until the subprocesses finish. The subshell bash processes also execute wait(2)s on their forked tar processes. Because the bashes are all blocked, a context switch happens and the newly-spawned reading tar process gets the CPU.

The reading tar process, being freshly forked and execed, starts up, processes its arguments, and sees that it's supposed to read from stdin (which is the pipe, not that it cares). The blob of data that the writing tar wrote into the pipe's buffer is sitting there, so the reading tar pulls it out and starts to decode it. There may be enough data that it can actually reconstruct a file or two. But pretty soon, it's going to exhaust the buffer.

The Context Switch

The reading tar will never know when the pipe's buffer is empty; it just keeps calling read(2). At the beginning, read(2) will keep returning the data that the writer wrote. Eventually, it'll empty the pipe's buffer and the read(2) will block. The kernel's scheduler will kick in again and switch back to the writer. It gets woken up, the write(2) call that was blocked completes, and the writer continues filling the pipe until it's full again.

This repeats again and again: the writing tar writes until the pipe is full, the reading tar reads until it's empty, on and on.

The Exhausted Pipe

Eventually, the writing tar will finish tarring up everything and sending it over the pipe. When this happens, it'll clean up and exit. Exiting implicitly closes its stdout, which means the writing end of the pipe closes. The reading tar, which is blocking on the empty pipe, sees its call to read(2) return 0, which means it's reached the end of the file.
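A Python sketch of the reader's view, using a non-blocking reading end so the "still waiting" state shows up as `BlockingIOError` instead of a hang (helper names are hypothetical). read(2) only returns 0 once every copy of the writing end is closed, so a real shell also has to close its own copies of the writing end, or the reader would never see end-of-file.

```python
import os


def read_state(read_fd):
    """Report what read(2) does right now: 'data', 'eof', or 'would block'."""
    try:
        chunk = os.read(read_fd, 4096)
    except BlockingIOError:
        return "would block"            # empty, but a writing end is still open
    return "eof" if chunk == b"" else "data"


def eof_demo():
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)     # report "would block" instead of hanging
    states = []
    os.write(write_fd, b"last chunk of tar data")
    states.append(read_state(read_fd))  # the writer's data is waiting
    states.append(read_state(read_fd))  # drained, but the writer is still there
    os.close(write_fd)                  # the writer exits, closing its end
    states.append(read_state(read_fd))  # read(2) now returns 0: end of file
    os.close(read_fd)
    return states
```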

Since the tar stream has ended, the reader cleans up and exits as well. The subshell processes exit because they've finished their commands. Finally, the parent bash process's two wait(2) calls return. The prompt comes back. From beginning to end, about 10ms have elapsed.

And that's Unix!

Notes

I don't know how Bash actually implements any of this; I've just made up a conveniently simple implementation. The same goes for tar.
This is all from memory; the only things I looked up were man page numbers. Caveat emptor.
The -f arguments to tar aren't actually needed, but are illustrative.
I assume the tar file format doesn't indicate when the file ends, so the reader must wait for read(2) to return 0. I doubt this is actually the case.
I drastically simplify the CPU scheduling, process management, and execution order, approximating them for simplicity.
Tar pipes are mostly obsolete, but far too awesome to be forgotten!

Why I Switched to Git From Mercurial

2010-05-15T00:00:00-07:00

I used Mercurial for three years, but started switching to Git about a year ago. I now grudgingly recommend Git to anyone who intends to be a full-time programmer. Git's interface is bad in many ways, which is the main complaint about it, and it's a legitimate one. It's just an interface, though, and this is a tool you're going to use all day, every day, in a wide variety of situations.

Here are all of the ways that Mercurial has harmed me, or that I've seen it harm others, and the ways in which Git does good where Mercurial does evil:

One: Mercurial is bad at handling large amounts of data. A friend accidentally committed a couple GB of data into a Mercurial repository. It became completely broken, to the point where most commands would die because they ran out of memory. Git has no problem with large data. It's awesome to be able to put, say, an entire home directory or ports install under version control without fear. (I recently put a multi-gigabyte Mac Ports install under version control with Git without even thinking about it.)

Two: Mercurial's repository model is clunky and stays hidden in the background (this is a bad thing; don't let anyone tell you otherwise). If you have a Mercurial repository whose size is dominated by a single, 20 MB directory, and you then rename that directory, your repository just doubled to 40 MB. This has limited my ability to manage real-life Mercurial repositories. Git's repository model is so good that I only hesitate slightly when calling it perfect. It allows me to think about what's going on in the repository with an ease that I never had with Mercurial, despite using it much more than Git.

Three: Mercurial is not safe. Both systems ship with many commands that change history, but Git's data model is such that even a "delete" isn't really a delete. Destructive commands just create new nodes in the history graph, then adjust the branch to point at them. Whenever this happens, the old branch HEAD is still accessible using the reflog. That's awesome, and it alone would bring me to Git.

Mercurial's answer to this is weak: destructive commands shove a bundle file into a subdirectory of the Mercurial repository; you have to manually manipulate it if you want to get the data back. Except some of the destructive commands don't dump the bundle files, which has made me lose actual data in the past. Even for the commands that do dump the files, keeping track of them, and which applies where, becomes difficult fast.
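Point Two comes down to how Git names content. A blob's ID is a hash of its contents plus a small header, not of its path, so renaming a directory creates new tree objects that point at the very same blobs. This Python sketch computes the ID the way `git hash-object` does for a repository using the default SHA-1 object format; the function name is hypothetical.

```python
import hashlib


def git_blob_id(content):
    """ID Git assigns to a file's contents (SHA-1 object format).

    The hashed bytes are a "blob <size>" header, a NUL byte, and the
    contents. The file's name and directory play no part, so a renamed
    file keeps the same blob ID.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

For example, `git_blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the ID Git gives every empty file.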

tl;dr:

Mercurial made my repositories huge for no reason.
Mercurial broke when my friend put lots of data in it.
Mercurial lost my data when I did a destructive command.
In a year of Git, it's never done anything nearly as bad.

I'm sorry for recommending software with a confusing interface. But you'll be spending a lot of time with it; it's worth getting over the initial hurdle of confusion.

...until something better comes along, of course.

My Startup Talk From NWPD 2010

2010-05-06T00:00:00-07:00

A few months ago, I posted one of my talks from Northwest Python Day 2010: Python vs. Ruby: A Battle to The Death. This is the other talk I gave, "A Brief History of BitBacker, A Startup", which shares its name with a blog post from 2009. This new talk is mostly about the technical details, whereas the original blog post was a higher-level chronology about what we did from founding in 2006 until the doors closed in 2009.

Like the Python vs. Ruby talk, this one hits many different topics: a lot of testing (of course), a smattering of technical lessons learned, and a couple of business lessons. There's one very notable artifact in the talk: I show BitBacker's 510 unit tests executing in about a second. I also show one of the acceptance tests executing – firing up the full suite of back-end components as well as the app itself, then driving the whole system programmatically. I've talked about these things to various people, but never shown them in public. For this talk, I went through the trouble of recording videos of BitBacker's tests in action, so now you know that I'm not making things up!

A Brief History of BitBacker, A Startup from Gary Bernhardt on Vimeo.

Here are a few clarifications on things that I say in the talk:

Dingus is a test double library for Python. Technically, dinguses are both stubs and test spies. Many people would call it a mocking library, but that's not quite correct.
London-style TDD is a style that uses test doubles extensively, with mocks or spies verifying interactions. Classes are generally TDDed in isolation, interacting only with test doubles – never real collaborators with their own behavior. It sounds crazy; it's not.
Corey Haines gave me much of my early coaching on testing. We'd talk about it once every month or two; other than that, I read a lot and just kept hurting myself accidentally until I understood where the pain was coming from.
I do know that a third of 17,000 is not 4,000!

I hope you enjoy the talk. If you have feedback, you can find my contact info at the bottom of this page. I'd appreciate hearing it!

A Raw View into My Unix Hackery

2010-04-27T00:00:00-07:00

I just converted my blog from PyBlosxom to Jekyll. After the conversion, I wanted to make sure that no incoming links were broken, so I recorded the following screencast of myself checking.

This wasn't rehearsed; I didn't even spend time thinking about how I'd solve the problem ahead of time. I was also sort of run down while recording it; I'd been hacking away at this stuff for many hours by the time I recorded. Still, it gives you a glimpse of what it looks like when I puzzle through a problem at the Unix command line, something that people seem to be interested in.

A Raw View into My Unix Hackery from Gary Bernhardt on Vimeo.

At the end, I come up with a pretty long command. I did some further tweaking after the screencast ended; the final command is below. Although I've formatted it on multiple lines here, keep in mind that this was written at the console without newlines or regard for quality or brevity, so it's not pretty. See the video to learn how such things are born.

cat logs/apache/access_blog* |
grep ' 200 ' |
grep -v '"http://blog.extracheese.org/[^"]' |
grep -v 'Feedfetcher-Google' |
grep -v 'Googlebot' |
grep -v 'my6sense' |
grep -v 'search.msn.com' |
grep -v 'scoutjet' |
grep -v 'betaBot' |
grep -v 'Yahoo! Slurp' |
grep -v 'aggregator:Spinn3r' |
grep -v 'FeedBurner' |
grep -v 'Planet Python' |
grep -v 'FeedBurner/1.0' |
grep -v 80legs.com |
grep -v Yandex |
grep -v '.NET CLR' |
grep -v 'seoprofiler' |
grep -v 'urdland' |
grep -v 'Speedy Spider' |
grep -v 'Ask Jeeves' |
cut -d '"' -f 2 |
cut -d ' ' -f 2 |
awk '{urls[$1]++} END {for (url in urls) print urls[url], url}' |
sort -nr |
grep -v 'index.php' |
grep -v 'widgetType' |
cut -d ' ' -f 2 |
cut -d '?' -f 1 |
while read url; do
    curl "http://localhost:4000$url" 2>&1 |
    grep -i '<h1>not found</h1>' > /dev/null;
    if [ $? -eq 0 ]; then
        echo $url;
    fi;
done |
grep -v '\.css$' |
grep -v '\.js$'
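The counting stage in the middle of that pipeline (pull out the request line, take its path, count, sort by count) can be sketched in Python with `collections.Counter`. This assumes Apache's combined log format, where the request line is the first double-quoted field; the helper names are hypothetical.

```python
from collections import Counter


def request_path(log_line):
    """Mirror `cut -d '"' -f 2 | cut -d ' ' -f 2`: take the quoted request
    line ("GET /path HTTP/1.1") and return its second field, the path."""
    request = log_line.split('"')[1]
    return request.split(" ")[1]


def count_paths(log_lines):
    """Mirror the awk counting step plus `sort -nr`: (count, path) pairs,
    most requested first."""
    counts = Counter(request_path(line) for line in log_lines)
    return [(n, path) for path, n in counts.most_common()]
```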

Happily, I only had to redirect the RSS feed URLs. Everything else that I actually cared about worked the first time!

Python vs. Ruby: A Battle to The Death

2010-02-15T00:00:00-08:00

At Northwest Python Day 2010 on January 30th, I gave two talks; this is the second, which was the last talk of the day. I tuned it to its audience and time slot, making it biased toward an audience that already knows Python, as well as being lighter than originally intended. I still quite like the result.

During my talk, I mention Ruby.rewrite(Ruby), a talk by Reg Braithwaite. I recommend it. It also happens to contain my favorite quote from any talk, ever, which I will not repeat here. You'll just have to guess!

I've removed a few small bits from the audio – one grossly incorrect statement, one slide that could be misinterpreted in ways I want to avoid, and a couple of audience questions that I basically had no answer for. None of these edits affects the content of the talk in my opinion.

The talk is dense and necessarily glosses over a lot of subtleties. I talk about the Zen of Python, monkey patching (several times), the Ruby community's reckless hastiness, the syntax of RSpec and Cucumber, beauty and ugliness in languages and testing tools, the complexity of the languages' grammars, syntactic vs. semantic complexity, the relative taste of grasshoppers and tree bark, etc., etc. There's way too much here to give anything a fair treatment. I hope that you'll keep this in mind while watching, avoid interpreting the talk as a claim to absolute truth, and simply enjoy it for what it is.

Shortly after the event, Geoffrey Grosenbach posted his thoughts. You should take a look for another viewpoint.

I'll be at PyCon later this week, by the way, so feel free to commiserate with, or rant at, me. With that, here's the talk:

Python vs. Ruby: A Battle to The Death from Gary Bernhardt on Vimeo.

Functional and Non-Functional Testing

2010-01-21T00:00:00-08:00

Different types of complexity interact with test suites in different ways. Consider BitBacker, an online backup product that I worked on for three years. It had very few functional requirements. At its core, it only had to let the user choose files, back them up, and restore them. Almost all of the complexity was non-functional: it had to look good and be easy to use, of course, but it also had to be fast and secure. I spent most of my development effort on the "fast" and "secure" parts.

(Note that when I say "functional" here, I'm talking about requirements for a system's behavior. This has nothing to do with functional programming.)

In this type of app, where there are so few functional requirements, functional test fragility is less of a problem. In a recent discussion with Jonathan Penn, I mentioned that the backup and restore functionality were tested at the unit, subsystem, and full-stack levels: three different levels of tests, all testing the same thing. He asked me whether this made refactoring difficult. It didn't.

BitBacker's functional requirements were never going to change. When the user backed up and then restored files, they had to be identical to the originals. That's all. It took 17,000 lines of code to make that happen efficiently and securely, but the surface area of the user-facing problem is tiny.

I didn't know this at the time. If I'd been building a business app instead of a backup system, I probably would've ended up with a similar test suite, and in that situation it would've been a burden. Fortunately, I got lucky, and I've learned this lesson by retrospecting about my luck rather than retrospecting about some pain that I felt.

What about automated non-functional tests? The topic is murky in general, and I only know how to test small subsets of the non-functional requirement space. I don't know how to automate testing of user experience, for example.

I have done automated performance testing, however. At one point, I wrote tests for BitBacker that ran backups across a wide range of file counts and asserted that the backup time grew linearly with the number of files. That's clearly a non-functional test, but how fragile is it?

It's very fragile, of course, unless you run it on massive file counts that would've taken far too long for my patience at the time. I left the file counts small, so it ran fast but broke constantly, which eventually led me to remove it.

I replaced the test with a system that could kick off various predefined processes ("do an empty backup", "back up 1,000 files", etc.), graphing the runtimes and memory footprints across revisions in version control. One look at those performance graphs would show whether, and where, there was a problem. This gave me a different kind of feedback: instead of defining "success" and "failure", it would alert me to a change, which I could then investigate on my own.

I suspect that this is a fundamental property of non-functional testing. Trying to fully automate it and boil it down to a set of pass/fail assertions, while sometimes possible, seems prone to fragility. It may be that non-functional testing is best achieved by dashboard apps, like my performance-over-revisions graph, or an app that renders every page in a user flow automatically and highlights recent changes in appearance.

Test Double Injection Inversion

2010-01-19T00:00:00-08:00

In Dependency Injection Inversion, Uncle Bob wonderfully explains the difference between Dependency Injection and Dependency Injection Frameworks, a topic I've done a lot of thinking about recently. You should go read his post right now if you haven't yet.

At the end, he provides the test code below as an example of testing some dependency-injected Java code:

public class BillingServiceTest {
  private LogSpy log;

  @Before
  public void setup() {
    log = new LogSpy();
  }

  @Test
  public void approval() throws Exception {
    BillingService bs = new BillingService(new Approver(), log);
    bs.processCharge(9000, "Bob");
    assertEquals("Transaction by Bob for 9000 approved",
                 log.getLogged());
  }

  @Test
  public void denial() throws Exception {
    BillingService bs = new BillingService(new Denier(), log);
    bs.processCharge(9000, "Bob");
    assertEquals("Transaction by Bob for 9000 denied",
                 log.getLogged());
  }
}

class Approver implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return true;
  }
}

class Denier implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return false;
  }
}

class LogSpy implements TransactionLog {
  private String logged;

  public void log(String s) {
    logged = s;
  }

  public String getLogged() {
    return logged;
  }
}

It's perfectly fine Java code, and it wonderfully demonstrates the power of injection. After the code, Uncle Bob says:

It would have been tragic to use a mocking framework for such a simple set of tests.

In Java, I agree completely. In a more modern language, I disagree completely! I've translated his example to Python using my Dingus test double library to illustrate the simplicity that doubles can provide:

class BillingServiceTest:
    def setup(self):
        self.log = Dingus()

    def test_approval(self):
        approver = Dingus(approve__returns=True)
        bs = BillingService(approver, self.log)
        bs.process_charge(9000, 'Bob')
        assert self.log.calls(
            'log',
            'Transaction by Bob for 9000 approved').once()

    def test_denial(self):
        denier = Dingus(approve__returns=False)
        bs = BillingService(denier, self.log)
        bs.process_charge(9000, 'Bob')
        assert self.log.calls(
            'log',
            'Transaction by Bob for 9000 denied').once()

In a real system, I'd factor these tests slightly differently; I've left them as close to Bob's as possible. This is 13 ELOC vs. Bob's 38 – only about a third as much code! Some of the difference is in his testing library's ceremony, but most of it is in his test doubles. For example, he says:

class Approver implements CreditCardProcessor {
  public boolean approve(int amount, String id) {
    return true;
  }
}

That is a lot of code! All it really says is "the approve method always returns true", with the rest being a complex dance around Java's rigidity. This is a liability for programmers working in such languages, as well as a learning barrier for new testers. In my Python version, the following takes the place of the Approver class, as well as its instantiation:

approver = Dingus(approve__returns=True)

That line of code is so close to "the approve method always returns true" that I can't imagine it being any clearer. Of course, if the magic double underscores turn you off, you can also say:

approver = Dingus(approve=returner(True))

Digression

I'd love to hear what you think about those two alternate forms. I want to deprecate one, but I don't know which.

I fear that statements like Uncle Bob's about test doubles may lead newer programmers, and static-only programmers, astray. His advice is wonderful, but only in certain domains. Like so many things in software, doubling is far easier when the shackles of Javaesque type systems are removed. And, if you worry that the complexity is simply moved into the test double library, fear not: Dingus is currently 193 ELOC long, including plenty of features not mentioned here!
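If you don't have Dingus installed, here's a comparable sketch using the standard library's `unittest.mock`. The `BillingService` class is a hypothetical Python port of Bob's Java class (the post never shows one), written only so the tests have something to exercise. `Mock(**{"approve.return_value": True})` plays the same role as `Dingus(approve__returns=True)`.

```python
from unittest.mock import Mock


class BillingService:
    """Hypothetical port of Bob's BillingService, for illustration only."""

    def __init__(self, processor, log):
        self.processor = processor
        self.log = log

    def process_charge(self, amount, customer):
        approved = self.processor.approve(amount, customer)
        outcome = "approved" if approved else "denied"
        self.log.log("Transaction by %s for %d %s" % (customer, amount, outcome))


def test_approval():
    # The dotted key configures the child mock: approve() returns True.
    approver = Mock(**{"approve.return_value": True})
    log = Mock()
    BillingService(approver, log).process_charge(9000, "Bob")
    log.log.assert_called_once_with("Transaction by Bob for 9000 approved")


def test_denial():
    denier = Mock(**{"approve.return_value": False})
    log = Mock()
    BillingService(denier, log).process_charge(9000, "Bob")
    log.log.assert_called_once_with("Transaction by Bob for 9000 denied")
```

As with the Dingus version, each double is a single expression stating what the collaborator does, with no interface or class declaration around it.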

String Calculator Kata in Python

2010-01-06T00:00:00-08:00

For those who aren't familiar, katas are those impressive sequences of movements that you've surely seen martial arts guys perform. Code katas are the same idea applied to writing code: you solve a problem many times, mastering the movements, and then perform it for others.

My friend Corey Haines has worked with some other people to run a Katacast site dedicated to code katas, posting them roughly once per week. Recently, there's been a string of solutions to the same problem, with my Python version as this week's entry.

Briefly, here's the problem I solve. I have numbers coming in as a string, separated by commas or newlines. My job is to add those numbers and return the sum. There are two complications:

If the first line of the string is of the form "//*", then * is also a possible delimiter. This works for any delimiter string, not just *.
Negative numbers must be rejected.
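For anyone who wants to compare notes before watching, here's one possible solution to the problem as stated. It's a sketch, not the solution from the screencast. Two details are assumptions the problem statement doesn't settle: an empty string sums to 0, and an empty custom delimiter is ignored.

```python
import re


def add(numbers):
    """Sum integers separated by commas or newlines.

    A first line of the form "//<delimiter>" adds <delimiter> (any
    string, not just one character) as an extra separator. Negative
    numbers are rejected with a ValueError.
    """
    delimiters = [",", "\n"]
    if numbers.startswith("//"):
        header, _, numbers = numbers[2:].partition("\n")
        if header:
            delimiters.append(header)
    if not numbers:
        return 0  # assumption: empty input sums to zero
    # Try longer delimiters first so a custom ",," isn't split as two commas.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    values = [int(token) for token in re.split(pattern, numbers)]
    negatives = [v for v in values if v < 0]
    if negatives:
        raise ValueError("negatives not allowed: %s" % negatives)
    return sum(values)


print(add("//*\n1*2,3"))  # 6
```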

You can see the kata and read my brief commentary on the Katacasts site, or go straight to Vimeo to watch it. It's only 4:32 long, so it's not a big commitment.

String Calculator Kata in Python from Gary Bernhardt on Vimeo.

My screencasts always prompt questions about my Vim configuration, so take a look at my dotfiles repo if you're interested. For this kata, I'm slowing down intentionally – typing slower and inserting small, regular pauses so the viewer has time to look around a bit. I may post a "hard mode" version at full speed if people show interest.

Comments are encouraged, of course – the purpose of a kata is improvement!

If you like this, by the way, you may also enjoy the refactoring screencast I posted recently.

On Abstraction

2009-12-16T00:00:00-08:00

Some people seem to consider abstraction a bad word. I think that this is misguided and impedes progress – all software is abstraction. Understanding what our abstractions mean, and what makes them good or bad, is the core of design.

For now, let's define abstractions as concepts; nothing more. If it's a concept in your head, it's an abstraction. (I've tried to define the word more fully about ten times, deleting each definition in turn.)

The interesting part of abstractions is their violation. First, the textual definition of an abstraction – a class, for example – can violate itself. This happens when a class presents information at more than one level of abstraction. Here's Grady Booch, from "Object Oriented Analysis and Design":

[The] class Dog is functionally cohesive if its semantics embrace the behavior of a dog, the whole dog, and nothing but the dog.

It's a wonderfully terse explanation, but doesn't go far enough for our purposes because it doesn't address relationships.

Example

A Person class can have a first_name field. But should Person also have a set of address fields like street and zip_code? Probably not. These fields are part of an Address, which is a concept that exists independent of Person. Moving them into an Address class reifies this natural abstraction in our code, making it mirror the way the ideas are structured in our brains.
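In Python, the extraction might look like this sketch. The field names come from the example; `FlatPerson` is a hypothetical name for the "before" version, and `dataclasses` is just a convenient way to keep it short.

```python
from dataclasses import dataclass


# Before: Person also carries the fields of a concept it merely has.
@dataclass
class FlatPerson:
    first_name: str
    street: str
    zip_code: str


# After: Address stands on its own, and Person has one.
@dataclass
class Address:
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    address: Address


alice = Person("Alice", Address("123 Main St", "44101"))
```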

This is sort of a restatement of the Single Responsibility Principle (SRP), which is sort of a restatement of the principle of cohesion. We have many tools for thinking about this idea because it's important.

Abstractions can also be violated from outside. If an object exposes a set of fields to me, I should avoid looking into those fields' structure. In other words, I must respect the abstraction provided by the object. If I feel the need to violate the abstraction, I need to reconsider how to modify the boundaries to match that need, rather than violating the boundaries by crossing them.

This is the moment when design happens: I can take the path of short-term gain by reaching into my collaborators' collaborators, or I can take the path of long-term gain by refactoring my design to match the conceptual model.

Example

Suppose I have a Person and need to tell the SnailMailer to send him mail. The SnailMailer, as currently designed, takes a street, a zip_code, etc. I could pull the data out of the address fields, like person.address.zip_code, then pass them to the SnailMailer. But in doing that, I would violate the Person abstraction.

Instead, I should step back and think about the contract of the SnailMailer. It would be better to pass in the Person's Address instead of its components. That way, I rely only on the Person abstraction (it has an Address) and the SnailMailer abstraction (it sends to addresses). I remove my dependency on the structure of a Person's Address (street, zip_code, etc.), and I remove my dependency on the SnailMailer's expectations about address fields (street, zip_code, etc.). The SnailMailer can decide how to deal with those.
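Here's a sketch of that revised contract. `SnailMailer.send`, its label format, and the `notify` helper are hypothetical; the point is that the caller hands over the whole `Address` and only the mailer decides which fields it needs.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    zip_code: str


@dataclass
class Person:
    first_name: str
    address: Address


class SnailMailer:
    """Hypothetical mailer whose contract is 'send a letter to an Address'."""

    def __init__(self):
        self.sent = []  # records (label, letter) pairs instead of mailing

    def send(self, address, letter):
        # Knowledge of Address's fields lives here, not in every caller.
        label = "%s\n%s" % (address.street, address.zip_code)
        self.sent.append((label, letter))


def notify(mailer, person, letter):
    # Depends only on "a Person has an Address" and "a SnailMailer sends
    # to an Address"; it never reaches into person.address.zip_code.
    mailer.send(person.address, letter)


mailer = SnailMailer()
notify(mailer, Person("Alice", Address("123 Main St", "44101")), "Hello")
```

If the mailer later needs another field, only `SnailMailer.send` changes; `notify` and its callers don't.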

This is sort of a restatement of the Law of Demeter, which is sort of a restatement of part of the principle of coupling. These are symmetric with the definition side of abstraction in a pleasing way:

My abstraction vs. your use of it;
Single Responsibility Principle vs. Law of Demeter;
Cohesion vs. Coupling.

Most of the design principles we talk about regularly, like those listed above, are syntactic – they are properties of the text of the code. But syntax is only a means; the thing that really matters is that the semantic model of the code mirror the semantic model in our brains. Thinking about (or being preached to about) the design principles in isolation can make them feel arbitrary; it's much better to view them in the light of abstraction integrity.

Abstraction is important! The result of programming isn't simply a computation; it's also a set of ideas made concrete in a programming language. Nothing can beat the long-term business value of ideas expressed clearly in code.

Refactoring A Cyclomatic Complexity Script

2009-11-16T00:00:00-08:00

I recently blogged about the cyclomatic complexity script I wrote that highlights Python code based on its complexity. The script itself is written in Python, but the code was rushed together. I decided to do some cleanup, recording the process as an example of refactoring.

You can see a lossless copy of the video by logging into Vimeo and clicking "Download Quicktime version" at the bottom right of the page. I recommend it; MPEG artifacting is no fun.

Refactoring A Cyclomatic Complexity Module from Gary Bernhardt on Vimeo.

I make one notable mistake – I leave an unneeded reference to "code_or_node" lying around. Just imagine that I delete it at the very end. :)

The Limits of TDD

2009-11-09T00:00:00-08:00

My last post about TDD generated some great responses, some of which were skeptical. A few common complaints about TDD were brought up, and posed with civility, so I'd like to address them.

Complaint: You weren't stupid enough

When TDDing Fibonacci, we could get to a point where we have this function (and I did write exactly this code in my last post):

def fib(n):
    if n == 0:
        return 0
    else:
        return 1

But why should we write that? Why not this instead?

def fib(n):
    return [0, 1, 1][n]

This comes down to how we define "simple". In TDD, we make tests pass by making the simplest possible change. So, which of the above two is simpler?

Defining that word is our job; TDD as a process says nothing about it. The definition is a huge variable and, in my experience, it's the primary axis along which our skill as TDDers grows once we reach minimal competence. Note that we still have to define "simple" even if we're not doing TDD, but we won't have the test-driving pressure forcing the definition to be refined.

Regardless of how "simple" is defined, we must eventually accept that an arbitrarily long list is not the simplest thing. At that point, we refactor. Depending on the definition of simple, it may take seven tests to get to the final refactor instead of five. So what? Two TDDers need not generate the same tests, and this isn't a problem at all.

Complaint: TDDed tests are prescriptive

This is a complaint that TDDed code does exactly what the tests say it should do, so there might be bugs. If I write the wrong test, the reasoning goes, then it will drive me to write the wrong code.

When would we write the wrong test? Only when we misunderstand the problem. If we misunderstand the problem and go straight to the code, then we'll be encoding our incorrect understanding directly in the code. That's bad. By writing the tests first, we have some extra protection against misunderstanding: every assumption about what the system should do is encoded as a test, and each test has a good name.

Often, this will point out our confusion during the TDD process – we'll find that we want to write a test whose name contradicts another test's name. Even if we translate our misunderstanding into a bug, however, good test names make it easy to revisit our assumptions later. A subtle, five-character change to the code may have been driven by a sixty-character test name, which will be easier to understand.

Complaint: Choosing tests is hard

When TDDing Fibonacci, I tested fib(0) first. Why did I test fib(1) next instead of fib(37) or fib(51)?

Because it was obvious! The problem domain of a unit test is necessarily small, so it's usually clear what the next step is. If the next step isn't clear, it probably means that the unit under test is too large (making it hard to think about extending it for another case), or that we don't understand the problem well enough (making it hard to think about what the code should do at all). In either case, TDD has just helped us: it's either pointed out a bad design, which we should fix, or it's pointed out a gap in our knowledge about the problem, in which case we should put the keyboard away and fill that gap.

Complaint: The code you TDDed was bad

The particular code I came up with in my last blog post was a slow, recursive Fibonacci solution. Two people mentioned this in the comments.

TDD doesn't solve problems like "my run time is superlinear" or "my database loads aren't eager enough." It's not supposed to solve those problems! TDD frees us to solve those hard problems well by (1) pushing us toward a good, decoupled design and (2) providing us with large, fast test suites.
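As a small illustration of point (2): the Fibonacci tests pin down behavior, not algorithm, so the slow recursive version can be swapped for an iterative one and checked against the same cases. This is a sketch, not code from either post; `test_fib` just restates the fib(0) through fib(5) tests compactly.

```python
def fib(n):
    """Iterative Fibonacci: linear time instead of exponential recursion."""
    current, following = 0, 1
    for _ in range(n):
        current, following = following, current + following
    return current


def test_fib():
    # The cases from the TDD session: fib(0) through fib(5).
    expected = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5}
    for n, value in expected.items():
        assert fib(n) == value, (n, fib(n))


test_fib()
```

Choosing the faster algorithm was still up to me; the tests only made the swap safe.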

Complaint: TDD requires too much typing

This one has the easiest answer of all: typing is not the bottleneck. Just think about it for a minute. Go back and look at how many lines of code you actually generated yesterday. How long would it take you to type it all in one long burst? A few minutes? Seriously, typing is not the bottleneck.

TDD is not magic

Let's recap:

Complaint: You weren't stupid enough.
Response: There's more than one legitimate definition of "stupid".

Complaint: TDDed tests are prescriptive.
Response: This is a feature. Stating our assumptions up front exposes misunderstandings.

Complaint: Choosing tests is hard.
Response: This is also a feature. It tells us that our design is bad or that we don't understand the problem.

Complaint: The code you TDDed was bad!
Response: TDD does not free us from thinking. TDD is not magic.

Complaint: It's too much typing.
Response: Typing is not the bottleneck.

Many complaints about TDD are complaints that it doesn't solve some problem. These are not problems with TDD – it's not supposed to solve every problem!

Dynamic languages don't make coffee, continuous integration doesn't shine shoes, and TDD doesn't make code scale. It's simply the basis of a solid, disciplined process for building software – a beginning, not an end.

How I Started TDD

2009-11-04T00:00:00-08:00

This story is about the first code I ever wrote with proper TDD. I'd been doing test-first for several months, but I didn't understand the design aspect. Fortunately, Corey Haines wanted to learn Python, and I wanted to learn TDD, so we paired up at a Coding Dojo. It went something like this. [1]

Corey: Let's write a test.

def test_fib_of_0_is_0():
    assert fib(0) == 0

1 test failed; 0 tests passed.

Corey: Now let's make it pass.

Me: Well, we could iterate...

Corey: Why?

Me: Because it's Fibonacci...

Corey: The test says it returns zero!

Me: Oh. Well, OK.

def fib(n):
    return 0

1 test passed.

Corey: Let's write another test.

def test_fib_of_1_is_1():
    assert fib(1) == 1

1 test failed; 1 test passed.

Corey: Now let's make it pass.

Me: OK, we need to recursively...

I stop myself. I know what this got me last time.

Me: We can check for which input we got.

Corey: We don't even need that.

def fib(n):
    return n

2 tests passed.

Corey: Let's write another test.

def test_fib_of_2_is_1():
    assert fib(2) == 1

1 test failed; 2 tests passed.

Corey: Now let's make it pass.

I pause while I find the correct answer.

Me: Only the zero case is different.

def fib(n):
    if n == 0:
        return 0
    else:
        return 1

3 tests passed.

(I consider the implications of this. "Only the zero case is different." This is an inductive system, so it needs a basis case. Zero is only half of the basis case of a Fibonacci sequence, but I never had to think about a basis case or recursion to write this code. The tests showed me what the code needed to do.)

Corey: Let's write another test.

def test_fib_of_3_is_2():
    assert fib(3) == 2

1 test failed; 3 tests passed.

Me: Another if?

Corey: Another if.

def fib(n):
    if n == 0:
        return 0
    elif n < 3:
        return 1
    else:
        return 2

4 tests passed.

Corey: Refactor!

Me: I don't know...

My brain hurts for a moment.

def fib(n):
    if n < 2:
        return n
    else:
        return n - 1

4 tests passed.

The full basis case is in place and we don't even need recursion yet. I'm surprised by how many cases we've written without needing recursion or iteration.

Corey: Another test.

def test_fib_of_4_is_3():
    assert fib(4) == 3

5 tests passed.

Me: It passed without changes. Is that OK?

Corey: Another test!

def test_fib_of_5_is_5():
    assert fib(5) == 5

1 test failed; 5 tests passed.

I think I can handle this now.

def fib(n):
    if n < 2:
        return n
    elif n == 5:
        return 5
    else:
        return n - 1

6 tests passed.

Corey: Refactor!

Me: Combine them into... recursion?

Corey: Combine them into recursion.

def fib(n):
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

6 tests passed.

This isn't a perfect example of TDD, but that's not the point. The first thing you need to understand is the rough process: write the smallest failing test you can; then write the smallest code to make it pass; then refactor without changing behavior.

After getting this lesson from Corey, I went off and TDDed a couple thousand lines of code with almost no outside feedback. I was doing it very poorly, and often became frustrated, but in retrospect it was still the best code I'd ever written.

It takes years to learn how to do this well, and consistently, across a wide variety of situations. I've been doing it for two years, and I still have non-trivial problems, but I can almost always move forward confidently.

Building software without TDD was crushingly stressful, but I couldn't see it at the time. It was only shown to me when I started working one test at a time, one line of code at a time, with verification that the entire system is working in less than two seconds.

([1] In reality, the Coding Dojo probably went only vaguely like this, and this isn't even the problem we solved, but that's not the point. This is what the first true TDD session always looks like.)

The Value of Continuous Processes

2009-10-29T00:00:00-07:00

Consider the construction of software:

Bad: create a big design, then turn it into code.
Better: create a small design, then turn it into code, then stop and ponder it.
Best [1]: design software in the shortest iterations possible – on the order of minutes – with failing tests as your guide.

Consider the construction of tests:

Bad: write a lot of code, then test it before a release.
Better: write some code, then test it at the end of the (1- or 2-week) iteration.
Best: write code and tests together, ensuring that all tests pass at all times.

Consider the construction of companies:

Bad: have an idea, write a business plan, get funded, hire a team of 10, build the product, release it.
Better: have an idea, build the simplest version that could possibly work, validate it in the market.
Best: have an idea, validate it in the market without writing any code.

Consider the construction of products:

Bad: collect requirements, generate specifications, build software, release it when it's "done".
Better: talk to customer, generate story cards, build software, release it every two weeks.
Best: ???

One of these things is not like the others. The first three processes above deliver value faster, more consistently, and with less risk as they become continuous. Why shouldn't this be true of the entire product cycle?

This is why I'm excited about (1) Kanban and (2) continuous deployment. I think you should be too. Consider it carefully, because so far the scorecard shows that processes become more effective as the cycles get tighter, and we surely have a lot farther to go.

[1]: "best" is defined here as "best that I know of."

My Personal Failures in Test Isolation

2009-10-28T00:00:00-07:00

My position paper for SDTConf was about test doubles and my problems with refactoring around fully isolated tests.

Digression: Isolation

There are many colloquial definitions of "unit test". When I use the term, I'm almost always talking about a test that executes code in exactly one production class. If it collaborates, it collaborates only with test doubles like mocks, stubs, and fakes. Every test's world contains 30 or so lines of production code. If you've not heard of this, it probably sounds crazy. It's not.

J. B. Rainsberger found my position paper and responded to it. He quotes me:

In my TDD practice with test doubles I’ve found that, now that all code is 100% isolated, it’s almost impossible to refactor across classes with confidence unless I totally rewrite them.

J. B. replies, in part:

I interpret his comment as though that disappoints him. I invite Gary, and you, to consider an alternative interpretation:

Rewriting classes, rather than refactoring them, shows good compliance with the Open/Closed Principle, which encourages me.

Needing to refactor across multiple classes, as opposed to re-implementing everything behind a given interface, probably indicates a layering problem, which I’d expect to notice with duplication in the isolated tests. That encourages me, because I like it when my tests expose design flaws to me.

Abstraction Errors

J. B. is absolutely correct: this is about layers, and my failure to layer my software correctly. I've only been doing TDD for two or three years, and I still make non-trivial, multiple-class-spanning abstraction errors. The layering errors creep in across many tests and I just don't see them early enough. I can feel myself slowly getting better at this, but it's taking a while.

The trouble is that I often notice the problem only after it already exists, and my dilemma is "how do I fix it?" If I try to replace the hard dependency with an abstraction while avoiding a rewrite of the class, I lose confidence in my tests.

My concern is that isolation doesn't work well unless you never make certain classes of errors. There's too much coupling in this skill set! It doesn't have a reasonable entry point; you have to be a relentless jerk – as I clearly am – to break in. I want to ease people into isolation without telling them "just wait five years and you'll be fine."

Vertical Changes in Semantics

I have a closely-related example that isn't a refactoring, but displays exactly this problem. I wrote Mote, a test runner for Python. The original design had the suite class collecting all of the tests, then handing them off to the result printer.

I wanted to replace this with a pure pull process, where the printer pulls one example at a time through the entire stack, with the goal being to output the cases as they're evaluated rather than all at once. But the "push" concept pervaded most of the core classes! A context, upon being created, would recursively create its child contexts and examples. A suite, upon being created, would create its contexts. The isolated tests made this change hard! I had to rewrite many core classes to maintain confidence, and I actually never finished because it sent me into a horrible death spiral of self-doubt about isolation!
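Here's roughly what "pull" means, in a sketch that is not Mote's actual code: the `Example`, `Context`, and `print_results` names and shapes are hypothetical. Each context is a generator, so an example runs only when the printer asks for the next result, and results appear as they're evaluated rather than all at once.

```python
class Example:
    def __init__(self, name, body):
        self.name = name
        self.body = body

    def run(self):
        try:
            self.body()
        except AssertionError:
            return self.name, "failed"
        return self.name, "passed"


class Context:
    def __init__(self, name, examples, children=()):
        self.name = name
        self.examples = examples
        self.children = children

    def results(self):
        # A generator: nothing runs until a consumer asks for a result.
        for example in self.examples:
            yield example.run()
        for child in self.children:
            yield from child.results()


def print_results(context, out=print):
    # The printer drives evaluation, pulling one result at a time.
    for name, status in context.results():
        out("%s: %s" % (name, status))


print_results(Context("outer", [Example("adds", lambda: None)]))  # adds: passed
```

No class here runs its children eagerly; each one yields them on demand, which is exactly the property the push design lacked.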

Digression: Mote

Mote has many more problems than this, and I consider its internals one of my greatest TDD failures. This is sort of depressing, since I failed to effectively TDD a tool that I was writing to help me do TDD. I think I know why the design ended up so bad, but that's a topic for another blog post entirely.

I have a vague notion of the answer to this push-pull problem. The suite was directly instantiating contexts, which were directly instantiating examples. While I was writing it, I could feel that it was wrong. Usually, when that feeling crops up, I know how to improve the design accordingly. In this case, for some reason, I didn't, and I pushed forward because I wanted Mote to be in a working state for my own personal use. Now I have the same problem I've had with refactorings: I've introduced a suboptimal design and I have to improve it, but I'll end up rewriting 100% of the code to change 10%.

I don't need to be sold on isolation or abstraction; I'm already sold, as evidenced by having written a mock library that forcefully isolates your system under test. I'm not even looking for answers to refactoring and vertical-change problems: I know that these classes of problems grow out of a lack of design skills, I know which design skills those are, and I know how to improve them.

What I'm now looking for is how to grow these skills in a person from the ground up. What if there is a way to do those nasty, vertical refactors with higher confidence? Maybe not full confidence, but enough to prevent the horrible death spiral of self-doubt? As things stand, it's very hard to help other people learn these techniques and it just bothers me.

Cyclomatic Complexity in Vim: First Steps

2009-10-25T00:00:00-07:00

I've been kicking around the idea of integrating complexity analysis with Vim. Tonight, I spent a couple hours getting a proof of concept working (for Python code, naturally). Each line gets highlighted based on the complexity of the function: green for low, yellow for medium, red for high.

A static image doesn't really do it justice – the highlighting updates whenever you write a file. There's a 15 second video of me demoing it that's much better than the picture above.

This is based on Dave Stanek's cyclomatic complexity code. It's hacked together and the complexity algorithm seems slightly wrong (unless I'm remembering it incorrectly). However, it does successfully compute the complexity and highlight the code, which is good enough for a blog post. ;)

If you're feeling adventurous, you can find the source code on BitBucket.
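For the curious, per-function complexity can be approximated with Python's `ast` module. This is a sketch, not Dave Stanek's code or the script in this post: it counts one common set of decision points (if/elif, loops, except clauses, conditional expressions, extra boolean operands, comprehension filters) and adds one. The green/yellow/red thresholds of 5 and 10 are made up for illustration, and tools differ on exactly what to count.

```python
import ast

# Nodes that each add one decision point; elif appears as a nested ast.If.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor,
                ast.While, ast.ExceptHandler)


def complexity(func):
    """Approximate cyclomatic complexity: 1 + decision points in `func`.

    Simplification: ast.walk also descends into nested functions, so
    their branches are counted toward the enclosing function.
    """
    score = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # `a and b and c` adds 2
        elif isinstance(node, ast.comprehension):
            score += len(node.ifs)
    return score


def color_for(score, low=5, high=10):
    # Hypothetical thresholds; the post doesn't say what it used.
    if score <= low:
        return "green"
    if score <= high:
        return "yellow"
    return "red"


def function_colors(source):
    """Map each function name in `source` to a highlight color."""
    tree = ast.parse(source)
    return {node.name: color_for(complexity(node))
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}
```

An editor integration would then highlight every line of each function with that function's color whenever the file is written.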

Dingus 0.2 Is Released

2009-10-24T00:00:00-07:00

Finally, Dingus 0.2 is released. Dingus is a super-permissive record-then-assert mock/stub library that started life as a fork of Michael Foord's mock library. See the original Dingus screencast for more information.

There aren't any huge changes in this release. The biggest changes are:

DingusTestCase has an "exclude" argument to allow you to avoid stubbing some module attributes.
The "one" method is now named "once". E.g., my_dingus.calls('foo').once()
There's a much better introductory-level example included.
Dinguses are now picklable.

You can download Dingus at the package index and the source code on BitBucket. Enjoy!

A Brief History of BitBacker, A Startup

2009-07-23T00:00:00-07:00

BitBacker was a startup that I founded with two other people in March, 2006. It died for many reasons, including disagreements between the founders, changes in the target market, and changes in the economy as a whole. I never talked about it directly on this blog, but here's a summary of what happened in its three year history:

March 2006

We (Nick Barendt, Michael Branicky, and I) decide to build an online backup product. We're unimpressed with the existing solutions – we want something that can be installed and forgotten about. I'll do most of the development, Nick will do contracting with other companies to get us going, and Mike, busy enough as a tenured professor, will be in an advisory role.

We begin building a "test app" to analyze data churn rates on people's hard drives. This is a sort of feasibility study, as well as a source of data to help us create the product. I'm still finishing undergrad, so I'm only working part-time.

June 2006 (3 months after founding)

I finish undergrad. We realize that building the test app isn't worth the time: once we get it working, we'd need to get it installed on a lot of machines to make it worth the effort. We aren't confident that we can convince random people to install an application that doesn't help them in any way. We'd rather just build a simple, first-draft backup app. The test app did have some value as a learning experience, though – it had to do a lot of the crawling and hashing that BitBacker would eventually need.

With the test app retired, we start work on BitBacker itself immediately.

August 2006 (5 months after founding)

Apple announces that Mac OS X Leopard will include Time Machine, a system for doing transparent backups to an external USB drive. This is scary, but we decide to postpone our worrying until it's released. Besides, Leopard is a year off, and we'll surely have a product out by then.

Late 2006 to early 2007 (about 9 months after founding)

I learn how to do backups well, mostly by trial and error. A lot of infrastructure shows up around the concept of a snapshot, which captures the state of the entire file system at a single point in time. BitBacker supports many snapshots, so you can step back in time and see what your folders and files looked like yesterday, or two weeks ago, or last January (much like Time Machine, but without the fancy UI).

At this point, BitBacker only has a web interface. You run a little app that has the web server in it, then point your browser at that local server. We realize that this isn't going to fly in the long term, but haven't decided what to do about it yet.

Spring 2007 (one year after founding)

We decide what we're going to do about the UI. I spend a couple weeks learning Cocoa and rebuild the whole client UI in it. At this point, we're committed to supporting only Macs, at least for the initial release. Cocoa makes me realize that, contrary to past experience, building a GUI doesn't have to be excruciating. This is a nice surprise. The early versions of BitBacker are rather silly, but that's because I'm still learning to think like a person who hasn't spent thousands of hours with this software already.

(Early BitBacker UI)

The first alpha is released. It runs live across the internet, but is only used by the founders. Everything is written in Python, including the entire Cocoa GUI.

July 2007 (one year, four months after founding)

We get a real, live office at 1677 E 40th St, Cleveland, OH. The BitBacker sign is still hanging outside the building today. We suddenly own a lot of Ikea furniture.

Sign outside the BitBacker office

Until now, we've been working from our homes and various coffee shops. Having an actual, physical office raises expectations, as well as our burn rate.

Summer and Fall 2007 (about 1.5 years after founding)

The private beta begins. More users are added and I do a lot of technical waffling – having external users has made me afraid of making serious changes. The few problems that are found by users cripple my progress as I agonize over whether I've actually fixed them. Commits happen just as often as ever, but they aren't making many user-visible changes.

At this point in the project, I'm unhappy with the state of the code. I'm more disciplined than I was when we started the company in 2006, and I spend much of this period bringing the code up to my new standards.

October 2007 (1 year, 7 months after founding)

Mac OS X Leopard is released, including Time Machine, and we still don't have a released product. Because Time Machine requires a physical external drive, it achieves limited market penetration. We are relieved. However, the online backup offerings available for Mac OS X are improving. Mozy is getting quite popular, which is scary. Surely we will have a release soon!

January 2008 (1 year, 10 months after founding)

I take over as the organizer of clepy, the Cleveland Python user group, and BitBacker hosts it for the next fifteen months consecutively. Beer is now allowed at clepy meetings, which seems to improve attendance.

February 2008 (1 year, 11 months after founding)

We remove support for multiple machines being backed up with one account. This is a major turning point for me – I'm learning to throw huge chunks of a system away if they don't help the user.

March 2008 (2 years after founding)

We finally add the ability to make asynchronous S3 requests. Before this, the server had to make one S3 request at a time. When a snapshot is completed, the server potentially needs to make thousands of S3 requests quickly, so this is very important for our performance.

Python doesn't have an asynchronous HTTP library (Twisted doesn't count, at least as of March 2008). I write my own by hacking subclasses of httplib.HTTPConnection and httplib.HTTPResponse to use fake sockets instead of real ones; the fake sockets are subclasses of StringIO (http://docs.python.org/library/stringio.html). I use asyncore (http://docs.python.org/library/asyncore.html) to do the asynchronous communication, with the hacked httplib subclasses doing their HTTP work before and after the actual communication.
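The idea of letting httplib do the HTTP formatting and parsing while something else moves the bytes can be sketched in modern Python. This is not BitBacker's code: it assumes Python 3, where httplib became http.client and io.BytesIO stands in for StringIO, and it leaves out the asyncore transport entirely (asyncore was removed in Python 3.12). FakeSocket, build_request, and parse_response are hypothetical names, and assigning conn.sock leans on an http.client implementation detail.

```python
import http.client
import io


class FakeSocket:
    """Stands in for a real socket so http.client only formats and parses HTTP."""

    def __init__(self, incoming=b""):
        self.outgoing = io.BytesIO()  # bytes http.client "sends"
        self.incoming = incoming      # raw response bytes to hand back

    def sendall(self, data):
        self.outgoing.write(data)

    def makefile(self, mode="rb", *args, **kwargs):
        # HTTPResponse reads the status line, headers, and body from this file.
        return io.BytesIO(self.incoming)

    def close(self):
        pass


def build_request(host, method, path, headers=None):
    """Return the raw bytes of an HTTP request without opening a connection."""
    conn = http.client.HTTPConnection(host)
    conn.sock = FakeSocket()  # with sock already set, http.client never calls connect()
    conn.request(method, path, headers=headers or {})
    return conn.sock.outgoing.getvalue()


def parse_response(raw):
    """Parse response bytes that were already read from the network into (status, body)."""
    response = http.client.HTTPResponse(FakeSocket(raw))
    response.begin()
    return response.status, response.read()
```

In an asynchronous client, an event loop would write the bytes from build_request to a real non-blocking socket and pass whatever it reads back to parse_response, so many requests can be in flight at once.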

We release version 1.3.0, doubling our user base.

May 2008 (2 years, 2 months after founding)

One of our users backs up around 100 GB, roughly five times more than we'd ever tested up to this point. The client application has some performance problems, and the server takes so long that the client times out (both fixed in later versions), but the backups do complete successfully.

August 2008 (2 years, 5 months after founding)

We finally port to Leopard, which has a totally different Python-to-Objective-C bridge. The previous version of PyObjC builds apps that work in Leopard, but those apps can only be built on a development machine running Tiger. This means that I'm still using Tiger in August 2008, a year after Leopard was released. Porting to the new version of PyObjC takes about a week and is quite painful. After we finish, I'm afraid that we might've missed some corner cases. I'm really hurting for a comprehensive suite of automated full-stack tests.

September 2008 (2 years, 6 months after founding)

Until this point, I've been doing huge manual tests before release. Unit and system tests have been automated since the beginning of the project, but there have never been automated tests that actually drive the UI. I begin writing end-to-end ("acceptance") tests using appscript, which is a Python interface to the event system underlying AppleScript. These tests start the servers, build the app, launch it, and use it as the user would – clicking buttons, selecting menu items, etc. This reduces my manual test process from eight hours to one. It takes about a week and a half to implement all of the tests.

Fall 2008 (about 2.5 years after founding)

We release versions 1.4.0 and 1.4.1, which move BitBacker into the OS X menu bar. Before this, it was a normal application, showing up in the Dock and the Cmd-Tab list. At this point, I realize that building the whole client as a single application was a terrible idea due to the way that OS X "UI Element" applications work, but I can't justify taking the time to fix it. These versions also add the ability to exclude folders from backups and run at startup. We're feeling pretty comfortable about charging money for the app now.

Final BitBacker UI (in menu bar)

November 2008 (2 years, 8 months after founding)

We start looking at subscription payment processing options, quickly discovering that they're much more complex than we expected. We waffle on it.

December 2008 (2 years, 9 months after founding)

A beta user's hard drive dies. He uses BitBacker to restore 17 GB of lost data. It's the first time that a large restore has been done in the wild. There is much rejoicing.

February 2009 (2 years, 11 months after founding)

The open beta begins – beta keys are no longer required for registration. Storage quotas are added (all existing users get 250 GB; new users get 2 GB). Email address confirmation is added. Legal agreements are added. A few people we don't know sign up and try the app.

March 2, 2009 (3 years after founding)

I get sick for several days and, when I get better, I find that I'm disappointed. This makes me realize that I'm not at all happy with my day-to-day life. I talk to Nick about it; we agree that we're both frustrated with the state of the company and disconcerted by the huge change in the competitive landscape since we started. We decide to stop working on BitBacker immediately. The last clepy meeting at our office is that night: March 2nd, 2009. We immediately begin selling our office furniture and hardware.

March 8, 2009 (3 years after founding)

We send an email to all existing beta users telling them that we're retiring the product and that the servers will go down on March 20th, 2009.

March 24, 2009 (3 years after founding)

The servers go down forever at 6:46 PM. Their uptime was 369 days. May they rest in peace... or, more likely, immediately be provisioned for some other Amazon EC2 customer. ;)

Lessons Learned

Here are the two biggest lessons I've learned from BitBacker:

Talk about equity early and often. Be frank about your expectations. Never say "we'll work it out later."
Release as early as you can, even if it doesn't do much.

This advice can be found in a thousand places, but it bears repeating. I failed to do these things even though I'd heard them so many times before. Seriously: release early, even if the thing doesn't do much. You'll always think it's not ready, but that's because you live with it every day. If it could possibly have any use to other people, release it!

Software Craftsmanship: Geographical Distribution

2009-06-23T00:00:00-07:00

I often compute statistics from publicly available data, tell myself I'll blog about them, and then never do. For once, I'm actually doing it!

The graph below shows the number of software craftsmanship manifesto signatories in each US state relative to the state's total population. Only states with at least five signatories are included.

I'm about to move from Cleveland to Seattle – #4 to #1. Not bad!

Have a look at the source code if you'd like. (It was hacked together; please judge it gently.)

UPDATE: In the comments, Joel Helbling suggested a Google Charts map. Here's one with states colored from red to green for lowest to highest signatories per million residents (source):
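The per-capita figure is simple arithmetic: signatories times one million, divided by population, with states below the five-signatory cutoff dropped. Here is a sketch of that calculation under those assumptions; it is not the original (linked) source, and rank_states and its dictionary inputs are hypothetical.

```python
def rank_states(signatories, population, minimum=5):
    """Return (state, signatories per million residents) pairs, highest rate first.

    signatories: {state: number of manifesto signatories}
    population:  {state: number of residents}
    States with fewer than `minimum` signatories are skipped, as in the graph.
    """
    rates = {
        state: count * 1_000_000 / population[state]
        for state, count in signatories.items()
        if count >= minimum
    }
    return sorted(rates.items(), key=lambda pair: pair[1], reverse=True)
```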

Dingus Screencast: A mock/stub library with automatic isolation

2009-04-01T00:00:00-07:00

Dingus is a mocking/stubbing library I've been working on for about a year. It grew out of a now-defunct project's test suite, and I've used it in about 3,500 lines of unit test code. It does two things that are pretty novel:

A dingus allows you to do almost anything to it, including nesting accesses arbitrarily deep. If you have a dingus d you can say 99 * (d.foo.bar.baz() ** 'hello')[15] and you'll just get another dingus out. This lets you use dinguses to replace dependencies in legacy code without thinking about what interface they must conform to.
If you want it to, Dingus can automatically replace your dependencies with dinguses. You just tell it what class you're testing, and it will replace everything else in that class's module with a Dingus. This fully isolates your code under test without requiring any work from you.
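The first point, an object that absorbs any operation and hands back another object just like itself, comes down to Python's special methods. The toy MiniDingus class below is a hypothetical illustration of the idea, not Dingus's actual implementation, and it only covers the operators used in the example expression above.

```python
class MiniDingus:
    """Toy stand-in: every operation returns a new MiniDingus and is recorded."""

    def __init__(self, name="dingus"):
        self._name = name
        self.calls = []  # (operation, args) tuples recorded on this object

    def _child(self, op, *args):
        self.calls.append((op, args))
        return MiniDingus(f"{self._name}.{op}")

    def __getattr__(self, attr):
        # Only called for attributes not found normally, so _name and calls are safe.
        if attr.startswith("__"):
            raise AttributeError(attr)  # don't pretend to support protocol hooks
        return self._child(attr)

    def __call__(self, *args, **kwargs):
        return self._child("()", args, kwargs)

    def __getitem__(self, key):
        return self._child("[]", key)

    def __mul__(self, other):
        return self._child("*", other)

    __rmul__ = __mul__  # handles 99 * dingus as well as dingus * 99

    def __pow__(self, other):
        return self._child("**", other)
```

With this, 99 * (d.foo.bar.baz() ** 'hello')[15] evaluates to yet another MiniDingus, and the calls list hints at how interaction assertions become possible.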

The second point above is probably a bit hard to think about from such a short description, so I've created a screencast to show it off. I TDD up a bit of code in the screencast, but it is not intended to be a good example of TDD or test design. I use a lot of stubbing and interaction assertions because I'm trying to show off Dingus' features. In real-world code, you should avoid assertions about object interaction as much as you can.

If you'd like to watch the screencast, it's available on Vimeo, but the version on the page is low quality. If you log in to Vimeo, there's a download link in the bottom right of the page to see it at full quality.
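The automatic isolation from the second point can be approximated with the standard library: replace every module-level name except the class under test with a stand-in, then put the originals back afterward. This sketch uses unittest.mock.MagicMock in place of a dingus; isolate is a hypothetical helper, not Dingus's API.

```python
from unittest import mock


def isolate(module, keep):
    """Replace every global in `module` except `keep` (and dunder names) with a MagicMock.

    Returns a function that restores the original values.
    """
    originals = {}
    for name, value in list(vars(module).items()):
        if name.startswith("__") or name == keep:
            continue  # leave module metadata and the class under test alone
        originals[name] = value
        setattr(module, name, mock.MagicMock(name=name))

    def restore():
        for name, value in originals.items():
            setattr(module, name, value)

    return restore
```

In a real test suite, restore() belongs in a finally block or tearDown so that other tests see the real module again.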

From Twitter: Version Control

2009-02-25T00:00:00-08:00

This blog has been quiet lately, due at least partly to Twitter. In an effort to get it going again, I'd like to start posting some summaries of my Twitter output. This will not be a daily Twitter bridge or anything like that. These are hand-selected from my entire tweet history. This first batch is on version control.

All links have been inserted after the fact, but everything else remains unchanged and in roughly chronological order. Indented entries are continuations of the thoughts in their parents.

For more of my ranting, you can follow me on Twitter.

To everyone who lost power due to the storm: you are now advocates of distributed version control and distributed ticketing. ;)
Recommending git as an intro to DSCM is like recommending C++ as an intro to OO. Just thought I'd throw that out there...
- Good OO can be done in C++, but it takes a lot of learning and is prone to error. The same goes for DSCM and Git. ;)
The way most people use version control is downright offensive. Trailing whitespace changes in diffs? Seriously? Grow some discipline.
Workflow using patch queues: Spike a feature (patch 1), then replace the spiked classes with TDDed ones (patches 2..n), then fold patches.
That workflow gives you: easy spike-to-TDD transition; everything nicely versioned; a single changeset at the end; no history rewriting.
Git was designed by insane space aliens. Whether this is a good or bad thing is a personal preference.
One day I will write an editor-VCS that stores all files as the list of vim commands originally used to create them. <0.95 ;)>
The funny thing about rebase, patch queues, and multiple heads: Once you truly understand one of them, you understand all of them.
Using a DVCS has made me worry a lot about repo size, which I shouldn't have to worry about. I never would've expected this problem.
Wish list: Fancy VCS: When I refactor a test, automatically check that the new one would've failed at the point where I originally TDDed it.
"This is more complex than OpenGL!" - @jleedev, about five words into my explanation of Mercurial patch queues.

Processes spawn faster than threads?

2008-05-30T00:00:00-07:00

In general, processes take longer to start than threads. This makes sense if you think about it – a thread lives within the memory space of its parent process, so it takes less work to set one up. (This is a gross oversimplification, but to be honest I find the details of process management incredibly uninteresting in 2008.) I assumed that this difference would hold for the Python processing module. Apparently it doesn't, at least on Mac OS X. Surprise!

Spawning 100 children with Thread took 1.04s
Spawning 100 children with Process took 0.60s

The above result is for starting and joining the children serially. I get the same results in all of these variations:

Starting them all at once, then joining them all at once.
Using 10 children or 1,000 children.
Having each child sleep for one second (to ensure that they're all actually alive at the same time).

I don't know whether this is due to goodness in OS X, or processing, or fork(), or just Unix in general. In any case, it's very good news. I'd dismissed processing for use on the client side of BitBacker because "process management is hard and they're too heavyweight." Clearly at least one of those complaints is invalid; maybe the other is as well. It would be a wonderful relief if I could use processes. I'm going to need parallelization of one form or another soon, and I'm definitely not going to start sprinkling threads around. Only madness lies down that path.

Here's the code that generated those results, in case you're interested:

import time, threading, processing
for cls in [threading.Thread, processing.Process]:
    start = time.time()
    for _ in range(100):
        child = cls(target=lambda: None)
        child.start()
        child.join()
    print 'Spawning 100 children with %s took %.2fs' % (
        cls.__name__, time.time() - start)
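The script above is Python 2 and uses the processing package, which later became the standard library's multiprocessing. A rough Python 3 port might look like the sketch below; timings will differ from the 2008 figures. The child target is a module-level function instead of a lambda because the "spawn" start method has to pickle it, and time_spawns is a hypothetical helper name.

```python
import multiprocessing
import threading
import time


def do_nothing():
    """Child target, defined at module level so 'spawn' can pickle it."""


def time_spawns(factory, n=100):
    """Start and join n children one at a time; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        child = factory(target=do_nothing)
        child.start()
        child.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # The __main__ guard is required when multiprocessing uses the 'spawn'
    # start method (the default on macOS since Python 3.8).
    for cls in (threading.Thread, multiprocessing.Process):
        elapsed = time_spawns(cls)
        print(f"Spawning 100 children with {cls.__name__} took {elapsed:.2f}s")
```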

Shell Meme Wins

2008-04-11T00:00:00-07:00

My most common shell commands:

171 hg
144 fg
77 rm
71 ls
38 cd
28 vi
24 nosetests
17 killall
15 tissue
15 python

I tend to keep Vim open for a long time, running many commands from within it. That's why I don't have lots of task switching like Mike. I learned Emacs before Vim, so blame it on that. I also usually run tests from within Vim; otherwise nosetests would definitely be #1.
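A tally like the list above comes from counting the first word of each line of shell history. Here is a small sketch with collections.Counter; it assumes the input lines are raw commands (as in a bash history file), not the numbered output of the history builtin, and top_commands is a hypothetical name.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the first word of each history line and return the n most common."""
    counts = Counter()
    for line in history_lines:
        words = line.split()
        if words:  # skip blank lines
            counts[words[0]] += 1
    return counts.most_common(n)
```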

Tissue is a ticketing system I've been working on. It's super simple – the whole ticket database is stored in a single plaintext file. The idea is to fit in with DVCSes like Mercurial better. Having a single monolithic Trac instance breaks down when you have dozens of repositories, each of which may have certain tickets fixed or not. By storing the ticket database in a plaintext file within the repository, you get (1) explicit ties between code fixes and ticket changes, and (2) free merging of modified tickets when the corresponding code merge happens. I'm about to switch BitBacker from Trac to this, so it will hopefully get released some time.
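The free merging depends on keeping the file line-oriented: if each ticket lives on its own line, edits to different tickets touch different lines, and the VCS merges them like any other text. The format below (id | status | title) is a hypothetical illustration of that principle, not Tissue's actual file format.

```python
def parse_tickets(text):
    """Parse 'id | status | title' lines into {id: {"status": ..., "title": ...}}."""
    tickets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        ticket_id, status, title = (part.strip() for part in line.split("|", 2))
        tickets[ticket_id] = {"status": status, "title": title}
    return tickets


def format_tickets(tickets):
    """Write one ticket per line in a stable (sorted) order so diffs stay small."""
    return "".join(
        f"{ticket_id} | {t['status']} | {t['title']}\n"
        for ticket_id, t in sorted(tickets.items())
    )
```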

Human-Readable Encryption Keys

2007-12-28T00:00:00-08:00

For BitBacker, we use 128-bit AES encryption, which means our keys are really long and annoying – 32 characters long when printed in hex. And not only do the users sometimes have to type them in, but they have to write them down on paper. (We can't store the key on our servers because then we'd be able to read the user's files; and we obviously can't trust it to their hard drive because that's what we're backing up.)

Somehow, we have to present these random 128-bit keys to the user, and I think I've found a pretty good way. We use RFC 1751, which defines a "Convention for Human-Readable 128-bit Keys" – basically just a mapping of blocks of bits to strings of English words. Here's an example in Python using the RFC 1751 module in PyCrypto:

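>>> import os
>>> from binascii import hexlify as bin_to_hex # assumed: any bytes-to-hex helper works
>>> from Crypto.Util import RFC1751
>>> key = os.urandom(16) # Generate 16 random bytes (128 bits)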
>>> bin_to_hex(key) # Show the key in hex (32 characters)
'61aa60e43a5e7fdb4b86a4897b52a0dc'
>>> y = RFC1751.key_to_english(key)
>>> y # Show the pass phrase version of the key
'BUSY BARN RUB DOLE TAUT TOOK ALTO PRY KIT WALL MUG CURT'
>>> # The transformation is always reversible
>>> bin_to_hex(RFC1751.english_to_key(y))
'61aa60e43a5e7fdb4b86a4897b52a0dc'
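The mapping itself is mechanical. As RFC 1751 specifies it, each 64-bit half of the key gets two parity bits (the sum of its 2-bit groups, modulo 4), and the resulting 66 bits are cut into six 11-bit numbers, each an index into the RFC's 2048-word dictionary; two halves give the twelve words shown above. The sketch below computes only the indices, since reproducing the dictionary is out of scope, and rfc1751_word_indices is a hypothetical name rather than PyCrypto's API.

```python
def rfc1751_word_indices(key: bytes) -> list[int]:
    """Return the dictionary indices (0-2047) RFC 1751 would use for `key`."""
    if len(key) % 8 != 0:
        raise ValueError("key length must be a multiple of 8 bytes")
    indices = []
    for start in range(0, len(key), 8):
        block = int.from_bytes(key[start:start + 8], "big")  # one 64-bit block
        # Parity: sum of the 32 two-bit groups, keeping only the low 2 bits.
        parity = sum((block >> shift) & 0b11 for shift in range(0, 64, 2)) & 0b11
        bits = (block << 2) | parity  # 66 bits: key block followed by parity
        for i in range(6):
            # Word i covers 11 bits, counting from the most significant end.
            indices.append((bits >> (66 - 11 * (i + 1))) & 0x7FF)
    return indices
```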

The keys are still very long, of course, and this is unavoidable for our application. But when translated to words, I think it's easier to write them down or type them in without making a mistake. The image below shows BitBacker giving me a pass phrase. (This feature hasn't even gone into beta yet – it's little more than a mockup. So please don't judge it too harshly!)

When the user clicks "Continue" here, BitBacker actually makes him re-enter the generated pass phrase he wrote down. To be honest, BitBacker's pass phrase handling is quite annoying. But that's a heck of a lot better than losing your pass phrase, which would make your backups inaccessible! This is the one place in all of BitBacker that isn't optimized for "least user annoyance". Encryption keys are just way too important to mess around with, and I think that most existing software is far too lax with them (including BitBacker's competitors).

(This was derived from a comment I left on Jeff Atwood's "Software Registration Keys" post.)

My blog woes have been soothed

2007-12-22T00:00:00-08:00

It seems I've mostly solved my blog woes. I got some quite helpful replies (still visible on Blogger, although the comments didn't come over to my new blog). I also got emails from Will Guaraldi about PyBlosxom, and from Lloyd Dalton about blog_my.

I took at least a brief look at each system mentioned in the comments and emails, but I decided on PyBlosxom. If you're reading this in a web browser, what you're seeing is PyBlosxom rendering a theme I ported from Tumblr, with all of my old Blogger blog's content imported. Quite Frankensteinian indeed, as far as blogs go.

It turns out that my impression of PyBlosxom's size when I wrote my "blog woes" post was a bit off – I didn't realize just how little functionality resides in the core. It's pretty slim, but with a decent selection of plugins. I only needed tags, wbgarchives, and metadate, but there are plenty more for those who want more features. With the tag and metadate plugins, I managed to keep my blog posts in almost exactly the format I've always used, so that was nice.

PyBlosxom nicely solves my biggest concern, which I didn't explicitly state in my original post: I want to keep all of the files related to my blog in a Mercurial repository. I've succeeded in that – my entire blog is in Mercurial now. That includes configuration files, the .htaccess file, the template, the entries, and even the queue of unfinished entries. If I ever need to, I should be able to move the blog to another host in a matter of minutes. Not that I ever intend to leave WebFaction (note: that's an affiliate link), which is where it's happily hosted now.

With that all out of the way, hopefully I can quit the detestable practice of metablogging, which I'd managed to avoid for my entire first year. Thanks to everyone who made a suggestion, and special thanks to the PyBlosxom developers.

Blog woes

2007-12-19T00:00:00-08:00

Last night, I switched my blog to tumblr; this morning, I switched it back to Blogger. Over the last two days, I've spent a ridiculous amount of time and effort trying to make the switch without breaking any links. I'll spare you the details, but it involved a lot of mod_rewrite, a PHP script written by Henrik Nyh that proxies all requests to tumblr, and a huge list of URLs mapping the old Blogger ones to the new tumblr ones.

I found the proxy-with-a-PHP-script thing distasteful, but the lack of decent tag support was the thing that ultimately made me give up. I have a Python-specific feed that gets aggregated by the unofficial Planet Python, so whatever I switch to needs to be able to generate tag-specific feeds. Tumblr does let you add tags to your posts, but it doesn't seem to actually do anything with them.

So, I'm in search of blogging software. I want something that:

Is written in Python or Ruby. Python because I know it very well; Ruby because I want to know it better.
Is simple. I'm not interested in anything built in Django or TurboGears or Rails. Preferably, it would be something that runs on my local machine, generating the static files that make up the live blog.
Allows custom URLs but has sane defaults. (I.e., no exposed serial integer keys.)
Supports tags and can generate subfeeds based on them, as well as human-readable post lists that are filtered by them. (For example, with my current blog you can look at only Python posts if you want to.)
Does not involve a database. (Not even SQLite.)
Reads the posts out of plain HTML files, which I will write by hand.

I've looked around for something like this, but everything seems to be either big (e.g., PyBlosxom and Typo) or someone's weekend project that never got touched again. I'm sure that the bigger ones are quite good at what they do, but I just want something that takes my plaintext files and generates an appropriate URL structure. It can do that offline or online; I don't care. But it's got to be simple and require no complicated installation or configuration.

So, any ideas, or am I starting a new project?

Tubes by proxy

2007-10-09T00:00:00-07:00

On Friday, I am bored and want video games. But no Internet service! How to download new games to Wii?!

Assets:

One VX8300 cellular telephone.
One Macintosh Book Professional.
One Nintendo Wii.

Solution:

Connect VX8300 to MacBook via Bluetooth.
Configure MacBook to use VX8300 as a modem for free. This is an undocumented Verizon feature!
Put MacBook in ad-hoc wireless mode.
Enable internet sharing on MacBook.
Inform Wii of wireless network.
Access Wii Store!

Full data path: Internet -> Verizon Wireless -> CDMA 2000 (1x) -> VX8300 -> Bluetooth -> MacBook -> 802.11g -> Wii.

Operation was a success. Breath of Fire II acquired.

Undeletable zombie files on OS X

2007-09-13T00:00:00-07:00

Somehow, one of BitBacker's system tests occasionally creates an undeletable file. It happens within a test that generates random filenames, then tries to back up and restore them. Unfortunately, there are apparently some character sequences that make HFS+ choke. I've only had this happen twice ever – once on my 17" MacBook Pro, and just now on my brand new 15" MacBook Pro. Here's what the file looks like:

It can't be deleted from Finder:

It can't be deleted from the console:

And it can't even be "ls"ed:

If I manually move the containing folder to the trash and try to empty it, I get this, which is even more ridiculous than the rest:

I've googled for undeletable files on OS X, as well as for the specific error messages Finder throws, and tried every suggested fix I found. And, of course, I've tried to delete the file from within Python:

>>> os.unlink(os.listdir('.')[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 2] No such file or directory: '\\xe4\\xad\\xa9\\xe1\\x84\\x84\\xe1\\x85\\xaf\\[...truncated]'

Dear Apple: please replace HFS+. Seriously, it sucks.

Globals and cargo culting

2007-09-07T00:00:00-07:00

Matt Wilson wants a module's functions to log to one logger, but he can't change their interface and he doesn't want to use a global variable. This is the kind of thing that decorators are very good at, for better or worse. Here's a decorator that will do the job:

import functools

def with_logger(fn):
    @functools.wraps(fn)  # keep the wrapped function's name and docstring
    def new_fn(*args, **kwargs):
        logger = get_singleton_logger()
        return fn(logger=logger, *args, **kwargs)
    return new_fn

And here's how to use it to define a function that gets a logger instance without the caller passing it in:

>>> @with_logger
... def add(x, y, logger):
...     logger.warning('x + y = %i' % (x + y))
...
>>> add(1, 2)
WARNING:foo:x + y = 3

This seems to be exactly what Matt was looking for: there are no globals, a logger gets injected, and the function's interface hasn't changed. But is it a good idea?

No, it's a ridiculous idea! It's just a reimplementation of global variables! All I've done is come up with a complicated scheme for injecting a single logger instance into every function in the module. But that's exactly what a global does! This is something that programmers have done over and over again in the name of OO: everyone wants globals, but they go to great lengths to hide them.

Here are three possible ways to solve the original logger problem:

Use a singleton, and have each function retrieve the instance that way.
Use a decorator that injects the instance into the function's argument list each time. In with_logger, I combined this with a singleton.
Just use a global.

If you choose (1) or (2), the joke's on you. You're still using a global, but now you have two problems: global state and a complicated method for managing it. There's no need for that, because we already have a simple method for injecting instances into a module's functions: globals!

Of course, sometimes singletons or decorators like with_logger make sense, but only when they actually do something. If all they do is allow multiple functions to access a single long-living instance, they're dangerous and needlessly complex. In almost all cases, singletons and related techniques are nothing more than cargo cult programming.

When JSON isn't JSON

2007-07-23T00:00:00-07:00

JSON is so simple that you can specify it on an index card, but we still can't get it right. For example, here's what happens when simplejson and python-cjson talk about slashes:

# simplejson correctly decodes cjson's data
>>> print simplejson.loads(cjson.encode('/'))
/
# cjson fails to decode simplejson's data
>>> print cjson.decode(simplejson.dumps('/'))
\/

In this case, the problem is that cjson doesn't handle backslashes correctly. There are two ways to say "/" in JSON: "/" and "\/". When encoding, simplejson always escapes slashes, but cjson never does:

>>> print simplejson.dumps('/')
"\/"
>>> print cjson.encode('/')
"/"

The reverse is also true: simplejson knows how to decode "\/", but cjson decodes it incorrectly:

>>> print simplejson.loads('"\/"')
/
>>> print cjson.decode('"\/"')
\/

So there you go: simplejson and cjson don't interoperate. This bit me when I tried to move BitBacker from simplejson to cjson for performance reasons. The live alpha server had a few thousand records encoded with simplejson, all of which included slashes. When I switched to cjson, everything broke because every "/foo/bar" entry in the database came back as "\/foo\/bar".
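Both spellings are legal JSON string escapes, so a conforming decoder has to turn them into the same string. A quick check against Python's standard json module, which grew out of simplejson, shows the expected behavior; it also happens not to escape slashes when encoding, and either output is valid JSON.

```python
import json

# "\/" and "/" are two spellings of the same JSON string.
assert json.loads(r'"\/"') == "/"
assert json.loads('"/"') == "/"
assert json.loads(r'"\/foo\/bar"') == "/foo/bar"

# The encoder leaves slashes unescaped.
print(json.dumps("/foo/bar"))  # prints: "/foo/bar"
```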

As far as I'm concerned, this problem with JSON is actually an argument for simple data formats like JSON. If we can't get full interoperability between implementations of something as stupidly simple as JSON, how did anyone ever expect WS-* to work?

Adobe, master of the painful install

2007-05-27T00:00:00-07:00

After using Macs for about two years, I've become accustomed to the painless installation process there: open the dmg, drag the app to the Applications folder, and you're done. Unfortunately, I recently had to install Acrobat Reader to electronically sign some insurance documents. They have managed to come up with a downright pathological installation process:

Download AdbeRdr80_DLM_en_US_i386.dmg.
Open the dmg.
DownloadReader.pkg is inside; open it.
Click through some stuff.
Wait for the "Adobe Reader Download Manager" to download another 23 MB dmg.
The "Download Manager" has silently quit (what?), but the dmg is now mounted. Open "Adobe Reader 8 Installer.app" and wait for it to finish.
The "Installer" has silently quit (what?), but it must be done, because Acrobat is now running.

This is amazing. When I saw the very first pkg file, I was annoyed: I hate installers and they're rare in the Mac world. As I kept installing and installing, it was almost surreal. In the above list, there are three times where something is being installed. Working back from Acrobat itself,

"Adobe Reader 8 Installer.app" installs "Adobe reader.app", so it's the installer.
The "Download Manager" installs that, so it's the installer installer.
DownloadReader.pkg installed that, so it's the installer installer installer.

So, while most Mac programs have shed the archaic notion of an installer altogether, Adobe Acrobat Reader has a program that installs a program that installs a program that installs the software you actually need.

Texturing and programming

2007-05-22T00:00:00-07:00

Flayra just posted a video of one of the Natural Selection 2 artists texturing a 3D model. I've done a bit of modeling and texturing (poorly, of course), so it's awesome to watch someone who's really good. It also reminded me of how I work on RESTdb: I always have my heavily modified nosy running in its own terminal window, giving me constant feedback on the effects of every little change I make.

In the video, it's obvious that constant feedback is critical to the process. If the artist tried to create the entire texture without seeing it applied to the mesh, the process would take much longer, yield poorer results, or probably both. As far as I'm concerned, the same applies to programming: if you use a bit of TDD, along with a continuous test runner like nosy, the programming process becomes more organic and approachable. It becomes easier to break the problem down into pieces that can be tackled easily, and you have a better sense of the scope of your changes.

If you haven't tried nosy yet, do it! It's very easy – just download nosy.py, stick it in your project's directory, and run it. It's only about 25 lines long, but it can have a huge impact on how you work. If you're still not convinced, check out Jeff Winkler's screencast.
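The core of a continuous test runner like nosy is a polling loop: remember the modification times of the source files and rerun the tests whenever they change. The sketch below is not nosy.py itself; it assumes polling once a second is good enough, snapshot and watch are hypothetical names, and the test command is whatever you pass in.

```python
import os
import subprocess
import time


def snapshot(root="."):
    """Map every .py file under root to its last-modified time."""
    stamps = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                stamps[path] = os.stat(path).st_mtime
    return stamps


def watch(command, root=".", interval=1.0):
    """Run `command` once, then again whenever a .py file is added, removed, or changed.

    Loops forever; stop it with Ctrl-C.
    """
    previous = None
    while True:
        current = snapshot(root)
        if current != previous:
            subprocess.call(command)  # the test command reports its own results
            previous = current
        time.sleep(interval)
```

For example, watch(["nosetests"]) or watch(["python", "-m", "unittest", "discover"]) reruns the suite after every save.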

Are your tests lying to you?

2007-04-07T00:00:00-07:00

If you've written a test for a module, and the module is changed in the future, there are three things that can happen:

The test keeps passing because nothing is broken. (Good.)
The test fails because something is wrong. (Great – this is the test's job!)
The test keeps passing, but it silently stops testing the thing it claims to (BAD, BAD, BAD!).

Scenario 3 above is very dangerous, and it's a major problem in testing. What you have in that situation is a lying test: it says "I'm testing feature x," but actually passes without doing so. In other words, you have a test that no longer warns you if you break something.

If you've not been bitten by this, it might not be an obvious problem. To make it a little more clear, let's look at a toy example (in Python, of course!) Here's a silly WebClient class and its test.

class WebClient:
    """An HTTP client that supports both SSL and plain connections"""
    def __init__(self):
        self.use_ssl = False

    def get(self, url):
        # Hand any request off to external functions
        if self.use_ssl:
            return get_with_ssl(url)
        else:
            return get_without_ssl(url)

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data #defined elsewhere

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data

This works fine – the test passes and it tests what it claims to. But what happens if someone renames the use_ssl attribute later?

class WebClient:
    def __init__(self):
        self.using_ssl = False

    def get(self, url):
        # Hand any request off to external functions
        if self.using_ssl:
            return get_with_ssl(url)
        else:
            return get_without_ssl(url)

Take a look back at the test. It's no longer testing what it claims to, because "use_ssl" no longer means anything to WebClient. The test still passes, though – it's just that neither of the two get() calls actually uses SSL.

This is a serious problem – you need to be able to trust your tests, but for all you know your tests are giving you false positives. The question, then, is how can we detect this type of mistake? Well, there is a simple method that will catch at least some of them. What you need is a meta-test: a test that ensures that the tests aren't lying to you. It's really not that bad; here's the pseudocode:

for each test in the suite:
    for each line of code that isn't an assertion:
        remove that line of code (but not the rest)
        run the test and make sure that it fails

Basically, this meta-test ensures that every non-assertion line in the test is required: removing any one of them should cause the test to fail. This sounds complicated, but it only has to be implemented once. Once it exists as a nose plugin, for example, you can use it without writing any extra code.

Let's look at how this would affect the example. Here's the testing code again:

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data #defined elsewhere

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data

The meta-test will step through, removing each relevant line and making sure that the test fails. The only executable lines that aren't assertions are 3 and 7. When it removes line 3, the test will fail because "client" won't be defined. So that iteration of the meta-test passes. When it removes line 7, the test will still pass. Because the test passes with a line removed, the meta-test will fail. The meta-test has detected the fact that line 7 isn't necessary, which is a red flag that says "this test might lie to you later!"

It's important to note that the meta-test will fail even when the test is working. It really is a meta-test: it's only testing the test. This is a good thing. It tells you when you've written a crappy test – a test that isn't paying enough attention.

Let's return to the example and try to fix it. To make the meta-test pass again, the test could be changed to be more sensitive to WebClient's state:

def test_web_client():
    # Make sure everything works with normal HTTP
    client = WebClient()
    assert client.get('/') == expected_data #defined elsewhere
    assert client.use_ssl == False

    # Make sure everything works with SSL as well
    client.use_ssl = True
    assert client.get('/') == expected_data
    assert client.use_ssl == True

Now the meta-test passes, and the original test_web_client is more resilient to silent failures. If someone renames WebClient's use_ssl attribute, the test won't silently stop testing like it did before. Instead, line 5 will raise an AttributeError and the test will fail.

Of course, this isn't foolproof. If you added line 10 but not line 5, you wouldn't be doing yourself any good (figuring out why is left as an exercise for the reader :). The meta-test would still pass, though, and you would still have a test that may lie to you in the future. So this meta-testing method isn't a magic bullet that will force you to write good tests. For a careful tester, though, it throws up a red flag for tests that might be susceptible to very subtle errors.

(Nitpicker's corner: Yes, the problem in this test was caused by questionable design in WebClient itself. Using an instance variable to control a class's behavior in this way is error-prone to begin with. This testing problem also arises in much more subtle situations, though; I have the scars to prove it.)

Zero to Slashdot in Three Days

2007-03-06T00:00:00-08:00

The Genesis

A few days before PyCon, Brian suggested that we build a web app in one night. It took a little longer than that to polish it up, but we launched sucks-rocks.com on Tuesday. Since then, it's had over 40,000 page views and been slashdotted (OK... it was the Japanese Slashdot, but it's still a Slashdot.)

Sucks/rocks rates the terms you enter by doing web searches and counting results. For example, if you search for "Windows sucks" using Google, you'll get many more results than for "Windows rocks". The opposite is true for FreeBSD. From this, we can infer that people probably like FreeBSD more than Windows. The actual searches that are done by sucks/rocks are more complex than this, but they follow a similar pattern.
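As a minimal sketch of the basic idea (not the site's actual searches, which are more complex), a term's score could be the share of "rocks" hits among all hits. The rate function, its 0 to 10 scale, and the hit counts below are all hypothetical.

```python
def rate(sucks_hits, rocks_hits, scale=10):
    """Score a term from 0 (sucks) to `scale` (rocks).

    Hypothetical formula: the fraction of all hits that say "rocks".
    Returns None when neither search found anything, since there is
    nothing to infer from zero results.
    """
    total = sucks_hits + rocks_hits
    if total == 0:
        return None
    return scale * rocks_hits / total


print(rate(sucks_hits=300, rocks_hits=100))  # → 2.5
print(rate(sucks_hits=0, rocks_hits=0))      # → None
```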

The Search Engine Arms Race

Once we started getting a lot of traffic, it was very hard for us to keep sucks/rocks going because we kept running out of searches. Here are the search APIs we used, in the order that we added them:

Search Engine	Queries/Day	Interface	Suckiness of Results
Google	1,000	SOAP	Low
Yahoo	5,000	REST	Pretty low
live.com	10,000	SOAP	IMMEASURABLY HIGH!

We started with Google, but ran out of queries before we even launched. We then used Yahoo, but ran out when 100shiki.com linked to us, forcing me to add support for live.com. Unfortunately, live.com's search results are terrible. Terrible! If you search sucks/rocks for "lord of the rings", you'll get a "?" back. This means that the engine whose results are cached (which is live.com, of course) reported that there were 0 "total results available". Great.

Now we have a cache of almost 60,000 searches, most of which are from live.com. Many of those are totally wrong, of course. My next task is to add a background thread that slowly replaces all of the cached live.com results with Yahoo results.
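One way that background job could look, as a sketch under assumptions: the cache is a plain dict guarded by a lock, search stands in for a real Yahoo API call, and the default delay spaces requests so a 5,000-queries-per-day quota (the Yahoo figure from the table above) isn't exceeded. refresh_stale and start_refresher are hypothetical names.

```python
import threading
import time

SECONDS_PER_DAY = 24 * 60 * 60


def refresh_stale(cache, lock, search, stale_engine='live.com',
                  fresh_engine='yahoo', delay=SECONDS_PER_DAY / 5000):
    """Replace every cached result that came from stale_engine.

    cache maps query -> (engine, result_count); search is a stand-in for a
    real search API call (query -> result_count). Sleeping `delay` seconds
    between calls (about 17 s by default) stays under 5,000 queries per day.
    """
    with lock:
        stale = [query for query, (engine, _) in cache.items()
                 if engine == stale_engine]
    for query in stale:
        count = search(query)  # slow network call, made outside the lock
        with lock:
            cache[query] = (fresh_engine, count)
        time.sleep(delay)


def start_refresher(cache, lock, search, **options):
    """Run refresh_stale on a daemon thread so the site keeps serving."""
    thread = threading.Thread(target=refresh_stale, args=(cache, lock, search),
                              kwargs=options, daemon=True)
    thread.start()
    return thread


# Usage with a fake search function and no delay:
cache = {'lord of the rings': ('live.com', 0), 'windows': ('yahoo', 1234)}
lock = threading.Lock()
start_refresher(cache, lock, lambda query: 42, delay=0).join()
print(cache['lord of the rings'])  # → ('yahoo', 42)
```

Holding the lock only around reads and writes of the cache, not around the search itself, keeps request handlers from blocking while the slow network call runs.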

The Code

Sucks/rocks runs on top of web.py, but only uses it for URL dispatching. Paste does the HTTP serving, with WebFaction's Apache instance on the front end (disclaimer: the WebFaction link is an affiliate link). This simple setup handled about a million HTTP requests in four days, using less than 5% of the CPU almost all of the time (except when it was at the top of slashdot.jp).

Easy Come, Easy Go

With our slashdotting over, we've gone from 10,000+ page views per day to about 1,000. Slashdot giveth, and Slashdot taketh away. That's OK, because I need some time to push all of the crappy live.com results out of the cache anyway.

(Brian has also posted about sucks/rocks: 1, 2.)

PyCon 2007: The Untold Stories

2007-02-27T00:00:00-08:00

Most of the PyCon posts are about the sessions, so here are some of the interesting things I did outside of the scheduled talks. I have pictures for many of them thanks to Mike Pirnat's diligent photography.

Pagoda CMS

Brian, Chris, and Ian demoed Pagoda, their upcoming open-source CMS. It's very user-centric, and they're spending a lot of effort on the user experience. Even though I don't use CMSes, I'm excited about this project because I'm so sick of crappy UIs. People's responses seemed positive, but I think some people were disappointed that Pagoda takes the easy-to-use approach rather than the kitchen-sink approach. That's OK; that's why we have Zope – the kitchen sink is there for the taking!

Python Is Basically DOS, Right?

I headed up to my room to grab my hoodie, and on the way back I was in the elevator with a 40ish couple. They asked me what this conference was about; I told them it was about Python, which is a programming language. The guy asked me whether "that's anything like DOS". It was kind of funny, but mostly just jarring. After being in close quarters with lots of smart programmers for 2 days, it was weird to suddenly talk to someone whose computer experience apparently began and ended around 1990.

The Mysterious Ellipsis

Dave, Mike, and I were at the hotel's bar, and the topic of Python's ellipsis ("...") came up somehow. From the grammar in the slicing docs, we could tell that the ellipsis could appear in slices, but we couldn't trick Python into accepting it without throwing an exception. I figured it out later: in a subscript, the "..." token is just translated into an "Ellipsis" object:

>>> class Foo:
...     def __getitem__(self, x):    
...         return x
... 
>>> f = Foo()
>>> f[1:2:3]
slice(1, 2, 3)
>>> f[...]
Ellipsis
>>> f[1, 2, ..., 100]
(1, 2, Ellipsis, 100)

Apparently, it's mostly used by numerical libraries like NumPy. I definitely understand Python's slicing much better after that confusing night.
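To see why "..." is handy for arrays, here is a simplified model (my own, not NumPy's code) of how NumPy treats it: the Ellipsis expands into as many full slices as needed to cover the remaining dimensions, so a[..., 0] on a 3-D array means a[:, :, 0]. expand_ellipsis and Grid are hypothetical, and the sketch ignores details like newaxis or multiple ellipses.

```python
def expand_ellipsis(index, ndim):
    """Rewrite an index so '...' becomes one full slice per dimension it
    stands for, padding with full slices when no Ellipsis is present.
    Simplified: assumes at most one Ellipsis and no newaxis entries."""
    if not isinstance(index, tuple):
        index = (index,)
    if Ellipsis not in index:
        return index + (slice(None),) * (ndim - len(index))
    pos = index.index(Ellipsis)
    fill = ndim - (len(index) - 1)  # dimensions the Ellipsis covers
    return index[:pos] + (slice(None),) * fill + index[pos + 1:]


class Grid:
    """Like Foo above, but reports the expanded index for a 3-D object."""
    ndim = 3

    def __getitem__(self, index):
        return expand_ellipsis(index, self.ndim)


g = Grid()
print(g[..., 0])  # → (slice(None, None, None), slice(None, None, None), 0)
print(g[1, ...])  # → (1, slice(None, None, None), slice(None, None, None))
```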

Mischief on The Open Space Board

Brian and Chris posted a "Python in The Adult Entertainment Industry" card on the open spaces board with my name on it. It was up for about 20 minutes before Brian pointed it out to me and I took it down. Hopefully, I escaped without too many prominent Python hackers associating me with pornography.

I have to wonder whether anyone saw that card and was actually interested in going to the session. Maybe Chris and Brian's silliness prompted an interesting discussion of Python and porn somewhere...

RESTDB

The open space I actually did lead was on "REST, Databases, and RESTful Databases" rather than pornography. Unfortunately, I dove into explaining RESTDB right at the start. It turned out that not everyone was familiar with REST, or convinced of its usefulness, or both. So, we ended up talking about REST for the second half. I think the session would've been more useful to everyone involved if we'd discussed REST first, then moved on to RESTful databases. I'm not sure how much everyone else got out of it, but I learned a lot about how to explain what RESTDB is and why we might want it.

Django vs. The World

I didn't fly back until Monday, so I was still around on Sunday night. Most of the people who remained were staying for the sprints, so the conference area was pretty quiet as everyone hacked away (with the exception of the Wii room).

I was heading back to the "quiet room", which was full of Django guys, when a big group of people appeared and asked where the Django guys were. I pointed them towards the quiet room and joined them on their way there. The group was made up of TurboGears guys, Pylons guys, Paste guys, and some that I didn't recognize. They busted into the Django room and caused some friendly commotion, with one notable result being this post on Ian Bicking's blog. I'm pretty sure that EWT's bathtub full of alcohol was a factor in this incident.

Magic URL Mapping

After the ruckus in the quiet room ended, I hacked up some crazy URL mapping code based on an experiment Brian did a while back. Here's a controller defined using it:

class UserController(_/'users'/User):
    def get(self, user):
        return dict(
            email=user.email,
            name=user.name)

The _/'users'/User part defines the controller's URL, and User is actually a RESTDB resource type. So, for example, if someone requests /users/Bob, this controller will be invoked and the Bob RESTDB resource will automatically be retrieved and passed in. This works for multiple records, so you could also have more complex controllers like:

class BlogController(_/'users'/User/'blogs'/Blog):
    def get(self, user, blog):
        assert blog.user == user # yep!

BlogController would be called for URLs like /users/Bob/blogs/TheBobBlog and, once again, both Bob and his blog would automatically be pulled out of the database. Of course, it's fully RESTful (hence the get method).

Keep in mind that this is just a silly experiment; please don't freak out because I'm overloading division to produce a URL mapping object that I then subclass. (Although, to be honest, the code isn't that bad; it's only about 60 lines long.)
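The post doesn't show those 60 lines, so the following is only a guess at how such a mapper could work in today's Python. It relies on `__mro_entries__` (PEP 560, Python 3.7+, which postdates this post by years) to let a Path instance appear as a base class. Path, Controller, dispatch, and the in-memory Resource.fetch lookup are all hypothetical stand-ins for the real RESTDB-backed code.

```python
from types import SimpleNamespace


class Resource:
    """Hypothetical stand-in for a RESTDB resource type."""
    records = {}

    @classmethod
    def fetch(cls, key):
        return cls.records[key]  # a real version would GET /<Type>/<key>


class User(Resource):
    records = {}


class Path:
    """An immutable URL pattern built up with '/'."""

    def __init__(self, parts=()):
        self.parts = tuple(parts)

    def __truediv__(self, part):
        return Path(self.parts + (part,))

    def __mro_entries__(self, bases):
        # PEP 560: lets a Path *instance* appear in a class statement's
        # base list; we swap in a generated Controller subclass instead.
        return (type('RoutedController', (Controller,),
                     {'path': self, '_intermediate': True}),)


class Controller:
    path = Path()
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if not cls.__dict__.get('_intermediate'):  # skip generated bases
            Controller.registry.append(cls)

    @classmethod
    def match(cls, url):
        """Return the fetched resources for url, or None if it doesn't match."""
        segments = url.strip('/').split('/')
        if len(segments) != len(cls.path.parts):
            return None
        args = []
        for part, segment in zip(cls.path.parts, segments):
            if isinstance(part, str):
                if part != segment:
                    return None
            else:
                args.append(part.fetch(segment))
        return args


_ = Path()


def dispatch(url, method='get'):
    for controller in Controller.registry:
        args = controller.match(url)
        if args is not None:
            return getattr(controller(), method)(*args)
    raise LookupError('no controller for %s' % url)


# Usage, mirroring the UserController example:
User.records['Bob'] = SimpleNamespace(email='bob@example.com', name='Bob')

class UserController(_/'users'/User):
    def get(self, user):
        return dict(email=user.email, name=user.name)

print(dispatch('/users/Bob'))  # → {'email': 'bob@example.com', 'name': 'Bob'}
```

In this sketch an unknown record (say, /users/Nobody) surfaces as a KeyError from fetch; a real server would presumably turn that into a 404.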

Overall, PyCon was awesome, and I'm really glad I went. It's going to be in Chicago next year, so I won't have to lose two full days to travel (awesome!)

Introducing Another Wildly Ambitious Database Project

2007-02-17T00:00:00-08:00

One week ago today, I started hacking on a new project: a database implemented as a RESTful HTTP service. Brian has been pestering me to post about it since before any code was even written, so here we are.

I've been calling the project RESTDB, but that's only because I haven't come up with a better name yet. It's sort of relational, but not quite. Depending on how you squint, you might think it is. Likewise, it's not completely RESTful: it lacks arbitrary POST semantics. Despite these caveats, it's quite similar to both RDBMSes and RESTful systems. Let's have a look.

Defining the Schema

We're going to build a multi-user TODO list (basically stolen from Brian's TurboGears tutorial). As with a traditional RDBMS, the first step is to define our schema:

class User(Resource):
    email = String(key=True)
    lists = List(Link('TodoList'))

class TodoList(Resource):
    id = Integer(key=True)
    title = String()
    items = List(Link('Item'))

class Item(Resource):
    id = Integer(key=True)
    value = String()

Obviously, this looks a lot like a SQLObject table definition. One thing is very different, though: the way resources are related to each other. In SQLObject, each TodoList would have a foreign key that points to its User. In RESTDB, this is inverted: the User contains a list of links that point to TodoLists. In SQL terms, you can think of this as a list of foreign keys. This has many serious implications, both for the database's implementation and for how clients interact with it. In the interest of brevity, I will valiantly gloss over every single one of them for now.

The Data

Now that we've defined our schema, let's see what's going on from an HTTP perspective. The definition above will lead to a URL structure like this (with arbitrary example records inserted):

/User
/User/me@example.com
/TodoList
/TodoList/1
/Item
/Item/1
/Item/2
/Item/3

These resources' structures are dictated by the resource classes we defined above. Resources are stored as simple JSON data, so they're human readable even in their raw form. For example, here's what "/TodoList/1" might look like:

{
    "id": 1,
    "title": "Groceries",
    "items": [
        "/Item/1",
        "/Item/2",
        "/Item/3"
    ]
}

It's just plain old JSON data, but it follows our schema: there's an ID, a title, and a list of item links. You don't even need a database client program to look at it; just point your web browser at the resource and you'll get back the JSON representation. You can download the whole database with wget if you want to.

The Client

Of course, this database isn't designed to be used by humans directly; human readability is just a nice bonus. wgetting your database is a neat gimmick, but what we really care about is manipulating it with code.

To illustrate how simple the client is, here's the entire client-side definition for our todo list database:

c = Client('127.0.0.1:17321')
User = Resource(c, 'User')
TodoList = Resource(c, 'TodoList')
Item = Resource(c, 'Item')

That's it: all you have to do is tell it the names of the resources. Note that this doesn't mean that there aren't constraints on the data – there are! Lots of constraints – all the constraints you care to define! It's just that they're only on the server side. If you step out of line, the server will slap you with an "HTTP 400: No Shenanigans Allowed".

We'll get to the shenanigans in a minute. First, let's try the client out by creating a user:

>>> me = User.post(email="me@example.com", lists=[])
>>> me.email
u'me@example.com'
>>> me.lists
[]

All we do is POST a new user resource with an email address and no todo lists. This is literally just an HTTP POST to /User. The database responds with an HTTP "Location" header to tell the client that the new resource is at "/User/me@example.com".
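To make the POST/Location exchange concrete, here is a toy sketch using only Python's standard library: a throwaway in-memory server that answers a POST to /User with 201 Created and a Location header, and a client that reads that header back. This is not RESTDB's code; the in-memory STORE and the rule that a User's URL comes from its email field are assumptions drawn from the example above.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # toy in-memory store: resource URL -> decoded JSON body


class ToyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers['Content-Length'])
        fields = json.loads(self.rfile.read(length))
        # Assumption: a User's key field is its email, so that forms its URL.
        location = '%s/%s' % (self.path.rstrip('/'), fields['email'])
        STORE[location] = fields
        self.send_response(201)  # 201 Created
        self.send_header('Location', location)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def post(host, port, path, fields):
    """POST fields as JSON and return (status, Location header)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request('POST', path, body=json.dumps(fields).encode('utf-8'),
                     headers={'Content-Type': 'application/json'})
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader('Location')
    finally:
        conn.close()


server = HTTPServer(('127.0.0.1', 0), ToyHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, location = post('127.0.0.1', server.server_address[1],
                        '/User', {'email': 'me@example.com', 'lists': []})
server.shutdown()
server.server_close()
print(status, location)  # → 201 /User/me@example.com
```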

Now that our user is securely fastened to the database, let's create a todo list with some items:

>>> # Create a todo list and assign it to the user
>>> my_list = TodoList.post(title="Groceries", items=[])
>>> me.lists.append(my_list)
>>> me.put()

>>> # Create some items and add them to the todo list
>>> i1 = Item.post(id=0, value="Milk")
>>> i2 = Item.post(id=1, value="Eggs")
>>> i3 = Item.post(id=2, value="Bread")
>>> my_list.items += [i1, i2, i3]
>>> my_list.put()

We POST a new TodoList, just like we POSTed a new user before. Then we have to update the user resource to point at the new list. me.lists is just a plain old Python list, so we append the new todo list, then PUT me to update the server's copy. We then repeat the same process to add items to the todo list.

Light's Green; Trap's Clean

Now that we've trapped a bunch of data in our database, let's start from scratch and pull the todo list's items back:

>>> me = User.get("me@example.com")
>>> print [item.value for item in me.lists[0].items]
[u'Milk', u'Eggs', u'Bread']

Awesome.

But wait, I've conveniently left a loose end untied! I claimed that shenanigans were strictly forbidden. So far, we've been acting nice and giving the server exactly what it wants. Now let's try to feed the server some crap:

>>> me = User.post(email="you@example.com", lists=123)
Traceback (most recent call last):
  ...
client.BadRequestError: "123" is not a list
>>> me = User.post(screw_you_server="EXPLODE!")
Traceback (most recent call last):
  ...
client.BadRequestError: Didn't expect field "screw_you_server"

It's having none of it! It will snub your stupidly-formed data all day long. And it's not just simple things like types that are enforced. You can define regex constraints for your strings, ranges for your numbers, and whatever else you can dream up. You can suffocate your precious data with constraints. Your links are guaranteed to reference valid resources; your URLs are guaranteed to match your data; your data is guaranteed to match your schema.
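As a rough sketch of that server-side checking, the classes below are my own minimal stand-ins named after the schema types above, not RESTDB's implementation. The schema is a plain dict instead of a class, links are checked only for their shape (not whether they point at an existing resource), and the error messages copy the ones in the session above.

```python
import re


class BadRequest(Exception):
    """Raised for invalid data; a server would map it to an HTTP 400."""


class String:
    def __init__(self, key=False, regex=None):
        self.key = key
        self.regex = re.compile(regex) if regex else None

    def validate(self, value):
        if not isinstance(value, str):
            raise BadRequest('"%s" is not a string' % (value,))
        if self.regex and not self.regex.fullmatch(value):
            raise BadRequest('"%s" does not match /%s/' % (value, self.regex.pattern))


class Integer:
    def __init__(self, key=False, low=None, high=None):
        self.key, self.low, self.high = key, low, high

    def validate(self, value):
        if not isinstance(value, int) or isinstance(value, bool):
            raise BadRequest('"%s" is not an integer' % (value,))
        if ((self.low is not None and value < self.low) or
                (self.high is not None and value > self.high)):
            raise BadRequest('%d is out of range' % value)


class List:
    def __init__(self, item_type):
        self.item_type = item_type

    def validate(self, value):
        if not isinstance(value, list):
            raise BadRequest('"%s" is not a list' % (value,))
        for item in value:
            self.item_type.validate(item)


def validate_resource(schema, data):
    """Check a decoded JSON object against a {field name: type} schema."""
    for name in data:
        if name not in schema:
            raise BadRequest("Didn't expect field \"%s\"" % name)
    for name, field_type in schema.items():
        if name not in data:
            raise BadRequest('Missing field "%s"' % name)
        field_type.validate(data[name])


# Links are modelled as strings that must look like /TodoList/<id>.
user_schema = {
    'email': String(key=True, regex=r'[^@\s]+@[^@\s]+'),
    'lists': List(String(regex=r'/TodoList/\d+')),
}
validate_resource(user_schema, {'email': 'me@example.com', 'lists': []})  # OK
try:
    validate_resource(user_schema, {'email': 'you@example.com', 'lists': 123})
except BadRequest as error:
    print(error)  # → "123" is not a list
```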

Everything I've shown here is real, working code. A few of the things I've mentioned, like link validity constraints, aren't done yet, but they're coming. Unfortunately, I can't point you at a Subversion repository just yet, because I don't have anywhere to host it. That will hopefully change soon, and you'll be able to prod it for yourself. For now, you'll have to make do with imagining how awesome it would be to speed up your database by sticking a plain old HTTP caching proxy in front of it.

Unicode Weirdness

2007-02-14T00:00:00-08:00

I'm writing some tests to verify that BitBacker doesn't explode if it sees unicode filenames. For a while, I thought that OS X's terminal wasn't unicode-aware, because non-ASCII unicode characters just showed up as "?":

grbmbp:~ grb$ ls z*
z???       z??????    z????????? z???       z???

Then I happened to pipe the ls through a grep, and the unicode characters printed correctly:

grbmbp:~ grb$ ls z* | grep '.*'
z໐
z두
z툃
z䌨
z冕

What? Well, I guess I'll take it...

While posting this, it got even more fun. All five of the characters above print normally in the terminal and Finder, but only two print normally in Opera's text edit control. I wonder how many will show up once this is published. Can you see them in your RSS reader and/or browser?

Update: After publishing, I viewed the page in Safari and the characters displayed exactly like they did in the terminal and Finder. So at least Opera didn't mangle the bytes. However, Firefox draws the first three incorrectly (the same three that Opera couldn't draw at all). Unsurprisingly, IE7 can't draw any of them.
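A test along the lines described at the start of this post might create files with these names and check that they come back intact. This is a hedged sketch, not BitBacker's test suite: listed_names is hypothetical, and it NFC-normalizes before comparing because some filesystems (HFS+, for example) store names in a decomposed form, which changes how Korean syllables like the ones above are represented.

```python
import os
import tempfile
import unicodedata


def listed_names(names):
    """Create an empty file for each name in a fresh temporary directory
    and return what os.listdir reports, NFC-normalized and sorted."""
    with tempfile.TemporaryDirectory() as directory:
        for name in names:
            with open(os.path.join(directory, name), 'w'):
                pass
        return sorted(unicodedata.normalize('NFC', n)
                      for n in os.listdir(directory))


NAMES = ['z໐', 'z두', 'z툃', 'z䌨', 'z冕']
expected = sorted(unicodedata.normalize('NFC', n) for n in NAMES)
print(listed_names(NAMES) == expected)  # True if every name survives the round trip
```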

Mutable State Is Manual Memory Management

2007-02-07T00:00:00-08:00

The Haskell Sequence linked to my post on C# (although they obviously don't share my opinions on it). I'd never read that site before, and I came across this "quote of the week" while glancing over it:

"Mutable state is actually another form of manual memory management: every time you over-write a value you are making a decision that the old value is now garbage, regardless of what other part of the program might have been using it."

(Paul Johnson via The Haskell Sequence)

My first thought upon reading this was "why the hell didn't I think of that?" It's obvious in retrospect, and is a wonderfully concise explanation of why side effects are a bad idea. I always try to limit my use of side effects, but I never had a simple way to explain to myself (or others) why that's a good thing. Thanks, Paul.
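A small Python illustration of the quote, with made-up data: when one part of the program mutates a list in place, every other part holding a reference loses the old value too, while building a new value leaves the old one alone for whoever still needs it.

```python
scores = [90, 77, 85]
leaderboard = scores     # another part of the program holds the same list

scores.sort()            # mutation: the old ordering is discarded for everyone
print(leaderboard)       # → [77, 85, 90]

scores = [90, 77, 85]
leaderboard = scores
scores = sorted(scores)  # new value; the old list lives on for its other users
print(leaderboard)       # → [90, 77, 85]
```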

Measure the Goodness of Your Code

2007-02-06T00:00:00-08:00

Because I want to keep my code short, I like to see statistics on my sandbox before committing to Subversion. At first, I just did things like svn diff | grep '^+' | wc -l, but that got old fast. So, I wrote a little script called gn (for "goodness") that computes some simple statistics. Here's what it says about my sandbox right now:

grbmbp:~/trunk grb$ gn
591 lines of diff
129 lines added
186 lines removed
-57 lines net change

I've added 129 lines and removed 186, which is a net change of -57. Any line that's simply replaced will result in one line removed and one added, for a net change of 0. The implication here is that the more negative your "net change" is, the better. (DISCLAIMER: Please don't take this literally and post angry comments.)

Sometimes I want to know the goodness of a bunch of related changes, so the script can also take any arguments that "svn diff" can take. It just passes them on, so you can compute statistics across revisions, etc.:

grbmbp:~/trunk grb$ gn -r420:451 
4246 lines of diff
1099 lines added
1627 lines removed
-528 lines net change

As you can see, I've been killing a lot of code recently. The script is below, in case you want to try it for yourself. I've only tested it on OS X; YMMV.

#!/usr/bin/python
import sys, os, re

svn_args = ' '.join(sys.argv[1:])
pipe = os.popen('svn diff %s' % svn_args)
diff_lines = pipe.readlines()

# Added lines start with '+' (but not '+++', because that prefix marks
# the diff's file header). The same goes for removed lines, except '-'
# instead of '+'.
added_lines = [line for line in diff_lines
    if line.startswith('+') and not line.startswith('+++')]
removed_lines = [line for line in diff_lines
    if line.startswith('-') and not line.startswith('---')]

print('%i lines of diff' % len(diff_lines))
print('%i lines added' % len(added_lines))
print('%i lines removed' % len(removed_lines))
print('%+i lines net change' % (len(added_lines) -
                                len(removed_lines)))
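One caveat with the prefix checks in that script: a removed line whose text itself starts with "--" (an SQL comment, say) shows up in the diff as "---..." and is skipped, and likewise for added lines starting with "++". A hunk-aware counter avoids that by using each "@@ -a,b +c,d @@" header to know how many lines belong to the hunk. This is a sketch of that alternative, not part of the original gn script; diff_stats is a hypothetical name and the sample diff is made up.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@"; a missing length means 1.
HUNK_HEADER = re.compile(r'^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@')


def diff_stats(diff_lines):
    """Count added and removed lines in a unified diff, using hunk headers
    to tell file headers ('---'/'+++') apart from changed lines."""
    added = removed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_lines:
        if old_left <= 0 and new_left <= 0:
            # Between hunks: file headers, "Index:" lines, separators, etc.
            match = HUNK_HEADER.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
            continue
        if line.startswith('+'):
            added += 1
            new_left -= 1
        elif line.startswith('-'):
            removed += 1
            old_left -= 1
        elif line.startswith('\\'):
            continue  # "\ No newline at end of file"
        else:
            old_left -= 1  # context line: present in both versions
            new_left -= 1
    return added, removed


SAMPLE_DIFF = [
    '--- schema.sql\t(revision 1)\n',
    '+++ schema.sql\t(working copy)\n',
    '@@ -1,3 +1,3 @@\n',
    ' CREATE TABLE t (id INTEGER);\n',
    '--- old comment\n',
    '+-- new comment\n',
    ' DROP TABLE u;\n',
]
print(diff_stats(SAMPLE_DIFF))  # → (1, 1); the prefix checks report 1 added, 0 removed
```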

C# 3.0 Looks Promising

2007-01-24T00:00:00-08:00

At CodeMash last week, I learned quite a bit about C#'s new features from Scott Guthrie. I'm no fan of explicitly typed languages, but C# 3.0 has me very excited. It's getting a host of new features inspired by other languages, including anonymous types, type inference, and lambda expressions. There's plenty of discussion of these around the web, so I'm not going to rehash them all. I want to highlight the two features that I'm most excited about: type inference and LINQ.

Type Inference

C# 3.0 adds basic type inference, but it's quite weak and can only infer types on assignment. Basically, you can do these types of things:

var i = 5;
var s = some_function_returning_a_string();
foreach (var item in list_of_strings) { ... }

Note that the "var" keyword does not make the variables dynamically typed – it just tells the compiler to infer their types based on what's being assigned to them. This is nice, but it's very limited. You can't, for example, declare a function with a return type of "var" and expect the compiler to figure it out. Haskell this definitely is not, but it's a huge step forward for C's verbosity-laden children.

LINQ

Let me get this out of the way: LINQ is awesome and I want it in my language.

In a nutshell, LINQ allows you to use declarative, SQLesque syntax in your C# code. I've been wishing for this feature for a long time, but Python's list comprehensions have had to hold me over so far. Here's a very simple example of what you can do with LINQ:

int[] numbers = {5, 4, 1, 9, 8};
var low_nums = from n in numbers
               where n < 5
               select n;

Here's an equivalent example in Python:

numbers = [5, 4, 1, 9, 8]
low_nums = [n for n in numbers if n < 5]

LINQ is far more powerful than Python's list comprehensions, though. It has many of the features of SQL: joins, grouping, count() and sum(), etc. And a single LINQ expression can query a collection of objects, a database, or XML data. This makes me drool.
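For a feel of what "joins, grouping, sum()" means in practice, here is a join followed by a grouped sum written in plain Python over made-up data. It takes a comprehension plus a separate grouping loop, which is the kind of multi-step code a single LINQ query expression is meant to replace.

```python
from collections import defaultdict

customers = [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': 'Bob'}]
orders = [{'customer_id': 1, 'total': 30},
          {'customer_id': 2, 'total': 5},
          {'customer_id': 1, 'total': 12}]

# Join: pair each order with the name of the customer who placed it.
joined = [(c['name'], o['total'])
          for c in customers
          for o in orders if o['customer_id'] == c['id']]

# Group by customer name and sum the order totals.
totals = defaultdict(int)
for name, total in joined:
    totals[name] += total

print(dict(totals))  # → {'Ann': 42, 'Bob': 5}
```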

Most of the discussion of LINQ at CodeMash boiled down to excitement over the idea of writing statically-typed, compiler-checked database queries. You can probably guess that this is not the reason I'm excited. For most interesting problems, the limiting factor is not whether you get your queries correct or not; it's whether you can coax your brain into (1) solving the problem and (2) translating the solution into code. Any feature that simplifies your code makes step (2) easier, and LINQ is definitely such a feature. LINQ is the only feature of any explicitly-typed language that I am covetous of. It really is that awesome, and Microsoft has managed to beat every other language to the punch with this.

Back to Reality

OK, enough gushing. I still dislike explicitly typed languages. These new features certainly aren't going to get me to leave dynamic languages behind – C# is still incredibly verbose when compared to Python, Ruby, etc. And, even though many of these features were borrowed from functional and dynamic languages, C# 3.0 is still as statically typed as ever. However, I have a newfound respect for Anders Hejlsberg and friends. If Microsoft manages to drag the hordes of C# programmers toward a more concise coding style, the world will be a much better place.

Why dynamic typing is useful

2006-12-23T00:00:00-08:00

Steve Yegge's Parabola has thrown a little more fuel on the static vs. dynamic fire. The comments so far have been mostly constructive, which is a nice surprise. However, there has been some back-and-forth about dynamic typing, mostly concerning why it's useful and what it actually entails. In a recent comment, "Matt" said:

"I don't see how software could possibly attempt to handle anything it did not anticipate, except to gracefully fail, which T.S. did."

I've seen this sentiment before, almost always from people whose programming experience is limited to statically typed languages. If that's your background, then you've not seen the wonderful flexibility that dynamic typing adds. Here's a very simple example written in Python. Don't worry: Python is called "executable pseudo code" for a reason!

def process_many(things):
    for thing in things:
        thing.process()

Looking at this, you might expect things to be a list. That's possible, but it could also be a set, or an iterator, or even a dictionary (a hash table), in which case the loop visits its keys. As the programmer, you don't have to worry about it. As long as Python can figure out how to iterate over your things argument, everything will just work.

So, to return to Matt's statement above: this is one way code can handle situations that the programmer never considered. Maybe the guy who wrote process_many never even thought that someone might pass in a dictionary. That doesn't matter; it will work anyway.
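Here's a quick demonstration, using a hypothetical Thing class whose process() just counts calls. The same unmodified process_many accepts a list, a set, a generator, and a dict (where iteration yields the dict's keys).

```python
def process_many(things):
    for thing in things:
        thing.process()


class Thing:
    """Hypothetical item type; process() just counts how often it ran."""
    def __init__(self):
        self.calls = 0

    def process(self):
        self.calls += 1


a, b, c = Thing(), Thing(), Thing()
process_many([a, b])                     # a list
process_many({a, c})                     # a set
process_many(t for t in (b, c))          # a generator (an iterator)
process_many({a: 'first', b: 'second'})  # a dict: iterating yields its keys
print(a.calls, b.calls, c.calls)         # → 3 3 2
```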

Of course, you can get this effect in Java using interfaces. The problem with interfaces is that you have to make decisions about them ahead of time. If you were writing process_many in Java, you might not consider the case of passing in anything but a list. Then, if someone using your code wanted to pass in a set, iterator, dictionary, etc., they'd be out of luck.

This is a recurring theme with static vs. dynamic languages. In explicitly typed static languages, you have to spend time figuring out what type every little thing should have. Then, when you're finally done, there will always be some case you didn't consider, like a hand-written note telling you to call the ticket counter.

Python's default arguments are tricky

2006-12-19T00:00:00-08:00

Default arguments are evaluated at function definition time, so they're persistent across calls. This has some interesting (and confusing) side effects. An example:

def foo(d=[]):
    d.append('a')
    return d

If you've not tried this before, you probably expect foo to always return ['a']: it should start with an empty list, append 'a' to it, and return. Here's what it actually does:

>>> foo()
['a']
>>> foo()
['a', 'a']
>>> foo()
['a', 'a', 'a']

This is because the default value for d is allocated when the function is created, not when it's called. Each time the function is called, the value is still hanging around from the last call. This gets even weirder if you throw threads into the mix. If two different threads are executing the function at the same time, and one of them changes a default argument, they both will see the change.
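You can see the shared list directly: Python stores default values on the function object, in foo.__defaults__. The usual way around the problem is to use None as the default and create the list inside the function, as in foo_fixed below (a name made up for this example).

```python
def foo(d=[]):
    d.append('a')
    return d

foo()
foo()
print(foo.__defaults__)  # → (['a', 'a'],)  the one list shared by every call


def foo_fixed(d=None):
    # None is immutable, so it's a safe default; build a fresh list per call.
    if d is None:
        d = []
    d.append('a')
    return d

print(foo_fixed(), foo_fixed())  # → ['a'] ['a']
```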

Of course, all of this is only true if the default argument's value is a mutable type. If we change foo to be defined as

def foo2(d=0):
    d += 1
    return d

then it will always return 1. (The difference here is that in foo2, the variable d is being reassigned, while in foo its value was being changed.)