<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Paul Smith</title>
    <link>https://pauladamsmith.com/</link>
    <description>
      Software engineer. Co-founder, Ad Hoc.
    </description>
    <language>en-us</language>
    <lastBuildDate>Wed, 18 Oct 2023 19:12:15 -0000</lastBuildDate>
    <item>
      <title>The 10 Year Anniversary of the HealthCare.gov Rescue</title>
      <link>https://pauladamsmith.com/blog/2023/10/the-10-year-anniversary-of-the-healthcare.gov-rescue.html</link>
      <guid>https://pauladamsmith.com/blog/2023/10/the-10-year-anniversary-of-the-healthcare.gov-rescue.html</guid>
      <pubDate>Wed, 18 Oct 2023 19:12:15 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>Ten years ago today, on Friday, October 18, 2013, the effort to <a href="/blog/2014/03/fixing-healthcare.gov.html">fix
HealthCare.gov</a> began in earnest. At
7:15 A.M. Eastern time, a small group assembled next to the entrance to the West
Wing of the White House. The group included Todd Park, Brian Holcomb, Gabriel Burt,
Ryan Panchadsaram, Greg Gershman, and myself. Later in the day we were joined by
Mikey Dickerson via a long-running speakerphone call. Some of us were from
outside of government (Gabe, Brian, Mikey, and me), and the others had jobs in
government at the time (Todd, Ryan, and Greg). What we all had in common was
that we were experienced technologists, having been at startups or at large
established technology organizations.</p>
<p>The members of our group were selected by Todd, working with Greg and Ryan and
others behind the scenes to identify people who could help because they had that
kind of technology experience. HealthCare.gov, having launched days earlier on
October 1, 2013, wasn't working. From the vantage point of the country's top
political leadership, it was clear outside help was needed. Todd, the CTO
of the United States at the time, which is a position in the White House, was
tapped to help fix it. His plan was to provide reinforcements to complement the
team of government employees and contractors that had built HealthCare.gov and
were in the midst of operating it. We were to be a small team, very discreet.
Todd was our leader. It was already a high-pressure, stressful situation, so
insertion into that context meant blending in, not blowing things up. It was to
be a low-key mission of information gathering and assessment, not the cavalry
storming in. Todd told us the next 30 days would be critical. The goal was to
enroll 7 million people by March 31, 2014, the end of the period known as open
enrollment. When the media was eventually informed of our existence by the White
House, we were referred to as &quot;the tech surge&quot;.</p>
<p>When Todd called me two days prior on October 16 to ask me if I would join the
effort, he didn't have to explain the stakes. I understood what it meant for the
website to work. I immediately agreed, and put on hold what I was doing, which
happened to be raising money for a startup I had founded. I took Todd's call
while walking the grounds of the Palace of Fine Arts in San Francisco, having
met with VCs earlier in the day. I was living in Baltimore at the time with my
wife and toddler daughter. I flew back home right away, and before I knew it I
was taking the earliest morning train I could from Camden Yards to DC's Union
Station. I thought I had timed it right, but still wound up running across
Pennsylvania Avenue so as not to be late.</p>
<figure>
    <img src="/images/hc.gov-10-years/whitehouse.jpg"
         alt="Photo of the White House">
    <figcaption>Photo by me as I hustled. Metadata says taken at 7:11 A.M., so must have just made it</figcaption>
</figure>
<p>We couldn't have started any sooner even if we had wanted to. The federal
government had shut down on October 1, the same day HealthCare.gov had launched.
The shutdown prevented anyone from outside the main team working on
HealthCare.gov from coming in to help. So while days passed with the news
dominated by the twin stories of the shutdown and the slow-moving catastrophe of
the launch, a vacuum of information formed, as well as a surplus of speculation
and worry. The White House couldn't figure out what was wrong with it, and the
implications of it failing were troubling. The Affordable Care Act, the
signature domestic policy achievement of President Obama's tenure, had gone into
effect, and the website was to be the main vehicle for delivering the benefits
of the law to millions of people. If they didn't know what was wrong with
HealthCare.gov, other than that it was manifestly not working, plain for
everyone to see, and therefore might not be able to fix it, what would that
mean for the fate of health care reform? Fortunately, the shutdown ended on
October 17, which meant we could get to work and finally understand what was
going on.</p>
<p>Some of us already knew each other, but everyone was new to someone else. We
introduced ourselves, headed inside, and after breakfast in the Navy Mess,
headed upstairs to the Chief of Staff's office. Denis McDonough shook our hands
as we somewhat awkwardly stood in a line. He asked us directly, &quot;Can you fix
it?&quot; In our nervous energy, I remember some of us blurting out, &quot;Yes.&quot; We had
confidence, but we also were eager to dive in, learn as much as we could, and
get going.</p>
<p>A van was procured, along with a driver. We piled in and headed across the Mall
to the Hubert H. Humphrey Building, headquarters of the US Department of Health
&amp; Human Services (HHS). Entering the lobby, we passed Secretary Kathleen
Sebelius. She wasn't there for us, she was welcoming federal employees back to
work after the shutdown. Our meeting there was with Marilyn Tavenner and her
staff. Tavenner was the Administrator of the Centers for Medicare &amp; Medicaid
Services (CMS), the largest organization in HHS, and the owner of
HealthCare.gov.</p>
<figure>
    <img src="/images/hc.gov-10-years/visitor-badge.jpg"
         alt="Photo of my faded visitor badge to HHS">
    <figcaption>My thermal printed visitor badge photo is pretty faded after 10 years</figcaption>
</figure>
<p>Our meeting with Tavenner and her team yielded our first details of the system
that was HealthCare.gov. We learned how it was structured and what its main
functional components were. We heard first hand what CMS leadership knew about
what was wrong, or at least where in the system they could see that things were
not working. It was our first sense of the size and complexity of the site,
both in terms of functionality, but also in terms of the number of contractors,
sub-components, and business rules such as for eligibility. But remember, these
were not technical people - they were health policy experts and administrators.
Most of the things they were reporting to us were business metrics about the
site, descriptions of high-level performance. It was helpful to hear this
perspective, and indeed many of these metrics would drive our later work. But
this would not yet be the time to learn about the technical challenges the team
was facing.</p>
<p>Our morning continued back in the van, leaving DC for CMS headquarters in
Woodlawn, Maryland, just outside Baltimore, a 45-minute drive. Here we met with
the leadership of HealthCare.gov itself, a group of people including Michelle
Snyder, CMS's COO, Henry Chao, one of its main architects, and Dave Nelson, a
director with a telecom background, who was being elevated and would oversee
much of the rescue work from CMS's perspective. The sketchy picture of
HealthCare.gov we had was coming into sharper relief. We learned about
deployment challenges and bottlenecks, more about where specifically users were
getting &quot;stuck&quot; using the site, and we started to hear bits and pieces about the
particular technologies being used, including something called MarkLogic, an XML
database, which was new to us. We even started to get some details about the
deployment architecture and the types of servers involved. Again, the theme of
complexity stood out. But we were also still talking at a fairly high level. To
really understand what was wrong with HealthCare.gov, we'd have to move on.</p>
<p>The afternoon was spent a few miles down the road back toward DC in Columbia,
Maryland, at something called the XOC, or Exchange Operations Center. This was
to be the mission-control-style hub of operations for HealthCare.gov. We found a
room staffed with a few contractors and some CMS employees. (The XOC eventually
would be the site of so much activity during the rescue that a scheme to keep
people out was needed.) Here was where, away from the leadership-filled rooms,
we finally heard from technologists who were directly working on the site. What
we heard was troubling. There was a lack of confidence in making changes to the
source code. A complex code generation process governed much of the site and
produced huge volumes of code, requiring careful coordination between teams.
Testing was largely a manual process. Releases and deployment required lengthy
overnight downtimes and often failed, requiring equally lengthy rollbacks. The
core data layer lacked a schema. Provisioning hosting resources was slow and not
automated. Critically, there was a pervasive lack of monitoring of the site. No
APM, no bog-standard metrics about CPU, memory, disk, network, etc., that could
be viewed by anyone on the team. By this point, the looks we exchanged amongst
ourselves betrayed our fears that the situation was much worse than we initially
expected. I carried with me that day a Moleskine notebook that I furiously
scribbled in so as not to forget what I was hearing. You can see my panic
start to rise, reflected in my writing, as the day wore on.</p>
<figure>
    <img src="/images/hc.gov-10-years/no-metrics.jpg"
         alt="Photo of notebook writing that says 'No metrics!!'">
    <figcaption>Incredulous that they didn't have monitoring</figcaption>
</figure>
<p>With that dose of reality and the evening setting in, we collected ourselves
back in the van and set off for our final field trip of the day, to suburban
Virginia for the offices of CGI, the prime contractor of HealthCare.gov. In
Herndon, we were greeted by many people on their leadership team and the
technical teams who had built the site, even though it was getting on past
business hours at this point. This was as much an interview of us by them as it
was a chance for us to ask questions. We had a brief opportunity to ingratiate
ourselves and win their trust. We did that in part by showing our eagerness to
dig in on some of the things we had learned earlier in the day, proving our
engineering bona fides (specifically with high-traffic websites), and
emphasizing that we were not there for any other reason than to help. This was a
team on edge, under the gun and exhausted. We needed them to succeed, and they
weren't going to work with us if they thought we were there to cast blame.</p>
<p>We asked many questions, but they mostly boiled down to, what's wrong with the
site, and how do you know what's wrong? Show us where in the system this or that
component isn't performing the way you expected. And they and CMS mostly
couldn't do that. They had daily reporting and analytics that produced those
high level business metrics. But again there was that lack of monitoring of the
system itself, real-time under load. So we focused on that.</p>
<p>It was getting late. Could we throw a Hail Mary? Walk out of that building
leaving behind something of tangible value, something promising they could build
on? We said, there's lots of APM-style monitoring services, but we're familiar
with New Relic. Could we install it on a portion of the servers? Yes, we know
there's a complicated and fragile release process, but if we bypassed that and
just directly connected to some of the machines, we could install the agent on
them and be receiving metrics almost immediately. Glances were exchanged. A CMS
leader in the room made a call on their cell phone - we actually have some
New Relic licenses already, we can use those. There was also hesitancy about
whether this kind of extraordinary, out-of-the-norm request would be approved -
clearly, even during this period of turmoil, all the stakeholders stuck to the
regular release script. The CMS leader nodded their approval. A small group
assembled to marshal the change through. Many folks had stuck around, even
though it was nearing midnight. Then on a flatscreen in the conference room, we
pulled up the New Relic admin, and within moments, the &quot;red shard&quot; (the subset of
servers we had chosen for this test) was reporting in. And there it was - we
could see clearly the spike in request latency, even at this late hour, that
indicated a struggling website, along with error rate, requests per minute, and
other critical details that had basically been invisible to the team to that
point. Imagine a hospital ICU ward without an EKG monitor. Now that they knew
exactly in what ways it was bad, they could start to correlate them with the
business metrics and other aspects of the site that needed improving. They could
then prioritize the fixes and actions that would yield the biggest improvements.</p>
<figure>
    <img src="/images/hc.gov-10-years/rooseveltroom.jpg"
         alt="Photo of the Roosevelt Room in the White House">
    <figcaption>Photo by me. Taken at 1:50 A.M. on Saturday, October 19, 2013.</figcaption>
</figure>
<p>That would come later. For now, exhausted, well past midnight, we left Herndon
and rode the van back to DC. At roughly 2 A.M. we reconvened in the Roosevelt
Room in a completely silent White House for a quick debrief. As we reflected on
what we had just experienced, another thought was settling in over the group -
that this was obviously not over by any stretch, that none of us were going home
any time soon, that the challenge was much larger in scope than we had imagined,
that any notion we may have had at the outset of possibly just offering some
suggested fixes and moving on was in retrospect hilariously naive, that this was
all we were going to be doing for the foreseeable future until the site was
turned around. Indeed, we all managed to find a few hours of sleep in a nearby
hotel and then were right back at it in the morning, heading straight out to
Herndon.</p>
<p>This was just day one, a roughly 18-hour day, and it certainly wasn't the last
such marathon. Over the next two-and-a-half months until the end of December,
the tech surge expanded and took on new team members, experienced many remarkable
events and surprises, and ultimately, successfully helped turn HealthCare.gov
around. Millions enrolled in health care coverage that year, many for the
first time. I hope to tell more stories of how that happened over the next
few weeks and months.</p>
<iframe src="https://www.google.com/maps/d/u/0/embed?mid=1fMkJP6sSr7p8hCDUXwNbX6cnCsNQuqg&ehbc=2E312F&noprof=1" width="770" height="480"></iframe>

      ]]></content:encoded>
    </item>
    <item>
      <title>Oppenheimer</title>
      <link>https://pauladamsmith.com/blog/2023/07/oppenheimer.html</link>
      <guid>https://pauladamsmith.com/blog/2023/07/oppenheimer.html</guid>
      <pubDate>Tue, 25 Jul 2023 04:00:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>I saw “<a href="https://letterboxd.com/film/oppenheimer-2023/">Oppenheimer</a>” this weekend at the <a href="https://musicboxtheatre.com">Music Box</a> in Chicago, in its theater capable of projecting 70mm film. The movie is a huge achievement, a complex piece of art that nonetheless tells an efficient story in spite of a 3 hour running time. Here are a few thoughts on the visual storytelling that was presented on screen.</p>
<p>The central question “Oppenheimer” asks is, to whom or what is J. Robert Oppenheimer bound? Director Christopher Nolan spares us a hoary answer like “the truth!”, but early on we are told it is to theory and mathematics, wherever they may lead. They first lead Oppenheimer out of the lab and into the arms of pre-war continental quantum physicists, who are forging a nascent field of inquiry that our hero immediately grasps and excels at. He becomes friends with physicist Isidor Rabi while in Germany, bonding over their shared New York Jewish backgrounds. We see Rabi nurture Oppenheimer several times, offering food from his pocket and quiet counsel, evincing an almost maternal protective quality. Oppenheimer, back in the States, forms allegiances with students and organizers of various labor movements, including the Communist Party, but weakly, always as or via a proxy (using the party as a channel to fund the Republicans of the Spanish Civil War; as a means to start his relationship with Jean Tatlock). Several scenes hint at deeper connections, but cut away before he does anything incriminating (“but that would be treason …”). Of course his “true” allegiance, or lack thereof, to the Communist Party and to the Soviet Union hangs over the balance of the post-test movie, which seems content to leave it unanswered, or perhaps, answered sufficiently by his other deeds, to which several characters give voice.</p>
<p>What of Oppenheimer’s non-professional bonds? He’s prepared to boil off his child like so many neutrons in a fission reaction, driving the colicky baby to a friend’s house in hopes of being relieved of parental duties. In spite of his affairs and scarce evidence of marital happiness, his relationship with his wife Kitty is shown to endure (“we’re adults, we’ve been through fire together”, their own form of fusion), surviving at least to his public image rehabilitation late in life. Frank Oppenheimer, kept to the periphery early on, becomes essential to the triumph at Los Alamos, reuniting the brothers on the same mesa where they forged a connection to the land, a feel for the weather of the desert. He pursues Jean Tatlock with flowers and is repelled; she later makes her own pursuit, reminding him of his off-hand oath, “you said you’d always answer” — a promise he’s now incapable of keeping, acted on by multiple forces much stronger than she.</p>
<p>Oppenheimer becomes an attractive force himself to build Los Alamos, cajoling scientists and convincing the US military, overcoming each group’s respective reservations, the former about the endeavor itself, the latter about him. He’s the nucleus of the most important thing that’s ever happened, to quote Gen. Groves, but we see him often distracted, gazing into the middle distance, drifting off to Chicago or San Francisco, giving misleading testimony to the army’s quietly menacing interrogator. Still, it’s Oppenheimer keeping the energetic particles of scientists around him from flying off or creating ruinous inter-personal explosions. They eke out just enough collaboration and luck to blow up the gadget before Potsdam — immediately, the military dissolves their bond to the man who secured their supremacy (“we got it from here”). We see Oppenheimer unmoored, isolated, radiating out his misgivings and the horror of his revelations.</p>
<p>In a pivotal scene, Oppenheimer dismisses a report of a nuclear chain reaction, stating that theory proves it can’t be so. When his neighbor colleague reproduces the experiment, he immediately forms a new theoretical understanding from it — the bond to pure theory is broken, and a new one connecting theory and practice is made. It takes him no time again to accelerate to the logical end of the implications, and this time, theory must wait for practice to catch up. When the new experiment is finished, first at Trinity and then at Hiroshima and Nagasaki, it’s no longer about what the physics demonstrates, but what it means for the notion of humanity and civilization: his revulsion at the reception of the bomb among his peers and the public is shown itself as a terrible blinding fire, one that now lives within him.</p>
<h2>Stray observations</h2>
<ul>
<li>“Oppenheimer” centers language throughout, and positions Oppenheimer as a language savant in technical ways, but deficient in others. He learns enough Dutch to teach physics in Europe. He reads Marx in the original German. He quotes Sanskrit to his lover. He’s also a translator, bringing the foreign language of quantum mechanics to the United States, and bridging the gap between academics and the military. He fails to learn the language of Washington, and his character is assassinated as a result. This bedevils Kitty, who can’t understand how a man can be so proficient in one domain and so passive when it comes to himself and his family.</li>
<li>There’s a rich symbolic history to the apple, and one figures prominently in early scenes. First as an impulsive attempt on his professor’s life. We see Oppenheimer stab the apple, which is a pre-quantum apple, Newton’s apple; the needle with which he injects the cyanide is a dagger to Newtonian physics, on which Einstein, whom we encounter multiple times including in despair in the final scene, inflicted the first mortal wound, and which quantum mechanics and the bomb finally killed. And then as the poison fruit of knowledge carried by Niels Bohr, who introduces him to quantum mechanics, which leads to our expulsion from Eden when the atomic weapon is used.</li>
<li>The circular badges the scientists wore at Los Alamos had labels like “K-16” and “C-43”, which I’m sure were for some organizational purpose the film doesn't explain (as far as I remember); they made me think of the scientists as personified isotopes from the periodic table of the elements.</li>
<li>The act III Strauss plot was less successful for me than the rest of the film. The revelation that he was humiliated and vindictive and then used the apparatus of official power to seek his revenge didn’t land in the way I think was intended, perhaps because, while certainly despicable, it’s neither shocking nor particularly novel, even discounting our recent experiences with vengeful politicians. Setting that aside, from a storytelling point of view, as an extended denouement after the wallop of the Trinity event, it has to work extra hard to sustain a clear narrative focus, and I felt the film suffered overall for it.</li>
<li>Kudos to Jennifer Lame, who edited the film, for making scene after scene of extensive dialogue so compelling and propulsive. And to Hoyte van Hoytema, the director of photography — it’s tonally gorgeous and lightly under-saturated in a way that serves the somber mood. What more can be said about 70mm film? It just glows. It’s very much worth seeing in a theater.</li>
</ul>

      ]]></content:encoded>
    </item>
    <item>
      <title>Fixing bufferbloat on your home network with OpenBSD 6.2 or newer</title>
      <link>https://pauladamsmith.com/blog/2018/07/fixing-bufferbloat-on-your-home-network-with-openbsd-6.2-or-newer.html</link>
      <guid>https://pauladamsmith.com/blog/2018/07/fixing-bufferbloat-on-your-home-network-with-openbsd-6.2-or-newer.html</guid>
      <pubDate>Wed, 04 Jul 2018 01:22:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>My home network (which is also my work network) is a standard-issue Comcast cable hookup. In spite of a tolerable 120 megabits down, my experience of daily Internet use is regularly frustrating. Video streams and video chats drop in quality inexplicably. SSH sessions become laggy. Web pages fail to load quickly, and then seem to appear all at once. Even though I should have plenty of bandwidth, the feeling is often one of slowness, waiting, data struggling to get through the pipes.</p>
<p>The reason for this is a phenomenon called &quot;bufferbloat&quot;. I'm not going to explain it in detail; there are plenty of good resources to read about it, including the eponymous <a href="https://www.bufferbloat.net/projects/bloat/wiki/Introduction/">Bufferbloat.net</a>. Bufferbloat is the result of complex interactions between the software and hardware systems routing traffic around on the Internet. It causes higher latency in networks, even ones with plenty of bandwidth. In a nutshell, software queues in our routers are not letting certain packets through fast enough to ensure that things feel interactive and responsive. Pings, TCP ACKs, and SSH connections are all being held up behind a long line of packets that may not need to be delivered with the same urgency. There's enough bandwidth to process the queue; the trick is to do it more quickly and more fairly.</p>
<p>Fortunately, because bufferbloat is in part a function of how we configure our routers, it's within our ability to solve the problem. But first, we have to diagnose it, and establish a concrete baseline to improve from. The <a href="http://www.dslreports.com/speedtest/">speed test at dslreports.com</a> tests for bufferbloat in addition to download and upload speeds, so we'll use that tool to see how we're doing.</p>
<p>First, I run the speed test, and get the following results:</p>
<p><img src="/images/bufferbloat/before.png" alt="speed test results - before fixes" /></p>
<p>Here you can see the issue starkly: 120 Mbps down and 12 Mbps up yields an &quot;A+&quot; grade (debatable), but we get an &quot;F&quot; for bufferbloat.</p>
<p>We define bufferbloat here as the increase in latency of a standard ping while downloading or uploading a large file, compared to ping times when the link is otherwise quiescent.</p>
<p>In our case, idle latency averages 12ms, download bloat is about 660ms, and upload bloat is about 280ms.</p>
<p>The fix is to apply a queue management strategy to our router. Ordinarily, I'd be wary of this. In my experience, QoS administration tends to be fussy and full of unintended consequences. I always felt as if I had cast too broad a net, inadvertently degrading overall network performance to get slightly better results from one application. And I wasn't sure around what fixed-point I was optimizing. In this case, bufferbloat gives us the measurable target. Administration is made much easier by the appearance of a new algorithm that's easy to apply to network interfaces. It doesn't require much tuning, and you don't need to futz with individual ports or percentages.</p>
<p>Details vary widely by router operating system and administrative UIs. In our case, the router is running <a href="http://openbsd.org/">OpenBSD</a>. (And if yours isn't, why not? Get a <a href="https://www.pcengines.ch/apu2.htm">PC Engines board</a>, throw obsd on it, and you have an inexpensive solution with world-class security, efficiency, and performance, that's simple to operate and well-documented.) The OpenBSD way of being a router is through its <a href="https://www.openbsd.org/faq/pf/"><code>pf</code></a> system, which is analogous to Linux's iptables, but much more capable and efficient. Since <a href="https://www.openbsd.org/62.html">6.2</a>, <code>pf</code> has implemented something called &quot;FQ-CoDel&quot;, which is an algorithm for scheduling packets fairly and is designed to prevent bufferbloat. It is exposed via the <code>flows</code> option on a <code>queue</code> rule. In principle, all we need to do is add two rules, one to fix uplink bufferbloat and one to fix downlink. Let's see how this goes.</p>
<p>In our <code>/etc/pf.conf</code>, we first add a single line to handle the uplink. This will apply a FQ-CoDel queue to the network interface attached to our WAN link, or the cable modem in our case. The way to think about it is, FQ-CoDel is a strategy applied to outbound packets only, as they exit the interface, so even though the WAN interface is duplex up and down, in order to handle the downlink part we'll apply it to the network interface connected to our LAN, which we'll do next.</p>
<p>An important detail. In order for the queue algorithm to do its thing, it needs to know the bandwidth of the outbound link. According to Mike Belopuhov, the implementor of FQ-CoDel in OpenBSD, we need to <a href="https://www.reddit.com/r/openbsd/comments/75ps6h/fqcodel_and_pf/doca4uv/">specify 90-95% of the available bandwidth</a>. Fortunately, we've just measured it.</p>
<p>The line to add to <code>pf.conf</code> to fix bufferbloat on the uplink is (assuming <code>em0</code> for the WAN interface):</p>
<pre><code>queue outq on em0 flows 1024 bandwidth 11M max 11M qlimit 1024 default
</code></pre>
<p>A couple of notes. <code>outq</code> is a label we give, but it's an opaque string to <code>pf</code>. <code>11M</code> means 11 megabits per second (92% of the uplink bandwidth). <code>qlimit</code> is also specified explicitly, because its default value of 50 is too low for FQ-CoDel. The <code>default</code> keyword is required.</p>
<p>And that's it: we don't need to alter our filtering rules to assign packets to a queue: all outbound packets on this interface are assigned to our new queue.</p>
<p>Now let's reload <code>pf</code> with the config change, and re-run the speed test.</p>
<pre><code>$ doas pfctl -n -f /etc/pf.conf &amp;&amp; doas pfctl -f /etc/pf.conf
</code></pre>
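<p>Before re-running the test, it's worth a quick sanity check that the new queue is installed and seeing traffic. A minimal way to do that (the <code>-v</code> flag adds packet counters to the queue listing):</p>
<pre><code>$ doas pfctl -s queue -v
</code></pre>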
<p><img src="/images/bufferbloat/after-uplink.png" alt="speed test results - after uplink fix" /></p>
<p>Uplink latency under load is now down to 17ms on average, from 280ms. That's a mere 5ms worse than idle.</p>
<p>(I discount the apparent decrease in uplink bandwidth from this test result. In my experience, dslreports.com could vary by 10-15% in reported bandwidth run-to-run, but over time it converged on 12 Mbps.)</p>
<p>The downlink fix is nearly the same, we just adjust for the name of the interface (the LAN NIC is called <code>em1</code>) and for its 90-95% bandwidth upper bound, which is 110 Mbps.</p>
<pre><code>queue inq on em1 flows 1024 bandwidth 110M max 110M qlimit 1024 default
</code></pre>
<p>Reload, re-run:</p>
<p><img src="/images/bufferbloat/after-downlink.png" alt="speed test results - after downlink fix" /></p>
<p>Always nice to get an A. Downlink latency under load is now 24ms, from 660ms.</p>
<p>I haven't elided much; I think that's a pretty decent result for two lines of config. If you want to go further, there's a <code>quantum</code> knob to turn (baseline is your NIC's MTU, but look at what OpenWRT does for guidance), but that's about it.</p>
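<p>For illustration only, a tuned version of our uplink rule might look like the following. This is a sketch, not a measured recommendation; the 300-byte quantum mirrors what OpenWRT's SQM scripts use for slower links.</p>
<pre><code>queue outq on em0 flows 1024 bandwidth 11M max 11M qlimit 1024 quantum 300 default
</code></pre>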
<p>Post-fix, my observation is that things feel much snappier. Aside from the ping time improvements, I don't have other measurements to cite. But so far, FQ-CoDel seems to have fixed bufferbloat on my network and made for a substantially better experience.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>2016, my year in review</title>
      <link>https://pauladamsmith.com/blog/2017/01/2016-my-year-in-review.html</link>
      <guid>https://pauladamsmith.com/blog/2017/01/2016-my-year-in-review.html</guid>
      <pubDate>Mon, 02 Jan 2017 08:12:15 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <h2>Sewing</h2>
<p>After years of looking at the sewing machine in the closet and saying I should
learn how to use it, I did something about it this year. Michelle got me a class
at a local crafts store as a gift, and I made a pillow cover. It turned out
pretty good, and I enjoyed doing it, so I kept at it. I didn't have a ton of
time this year to devote to it, but by the end of the year I had made some
blankets for friends and <a href="http://www.instructables.com/id/How-to-make-a-Quillow-blanketpillow/">quillows</a> for the girls, a pair of pajama
pants for Michelle, and repaired the lining of a friend's handbag. I'd like to
keep at it, maybe tackling Halloween costumes next year, and doing projects with
Maxine, like adding circuits to fabric à la <a href="https://www.adafruit.com/flora">FLORA</a>.</p>
<p><img src="/images/2016-yir/IMG_1899-COLLAGE.jpg" alt="collage of sewing" /></p>
<h2>Running</h2>
<p>I've run for exercise before, but never with any consistency. This year I set
out to run at least 2 times a week, at least 20 minutes per run. Turns out, I
liked it a lot. So much so that I was running almost every weekday by
April. Unfortunately, my knee didn't like it so much, and in the middle of a run
on the Bloomingdale Trail, it suddenly seized up and I had to take an Uber
home. Luckily, I hadn't done any damage, and an orthopedist said I was just
overdoing it. I took about a month off and ramped back up slowly. By October, I
peaked for the year, doing about 10 miles per week. In November, I ran in <a href="https://www.strava.com/activities/780925903">my
first 10k, the Lincolnwood Turkey Trot</a>. All told for 2016, I logged 156
miles. For 2017, I'd like to attempt a half-marathon, and double my yearly
mileage. <a href="https://www.strava.com/athletes/14254908">Here's my Strava profile</a>.</p>
<p><img src="/images/2016-yir/IMG_2195-COLLAGE.jpg" alt="collage of running" /></p>
<h2>Ad Hoc</h2>
<p><a href="https://adhocteam.us/">The company</a> I started with Greg in 2014 began the year with 7 people and ended
with 41. The growth was thanks to winning our first contracts: we had been
around long enough as a company and had enough &quot;past performance&quot;, as they say
in the industry, to compete and be awarded tasks on our own, instead of only
working as subcontractors. The contracts were with <a href="https://www.cms.gov/">CMS</a>, to continue our
work on HealthCare.gov, and with the Department of Veterans Affairs, to help
build Vets.gov. We also earned spots on two highly sought-after contract
vehicles, the ADELE BPA with CMS, and FLASH with the Department of Homeland
Security, that will have opportunities for us to bid on down the road. It
was a great year for Ad Hoc, and I'm proud of what we've built: a great team,
and useful software that is delivering actual benefits and services to real
people. For example, this year, we took over the core shopping part of
HealthCare.gov, known internally as Plan Compare. During open enrollment so far,
which is still going on until the end of January, over 3.5 million households
have enrolled for plans using our software. HealthCare.gov also saw its <a href="http://www.nytimes.com/2016/12/21/us/health-exchange-enrollment-jumps-even-as-gop-pledges-repeal.html">biggest
single-day enrollment tally ever</a>, on December 15th. I'm also proud that we're
proving out the model of providing modern software engineering and design
services to the government that are efficient, work well, cost less to build and
operate, and are just better than the status quo. There's a lot of uncertainty
ahead for the programs that we're working on. There's not much we can do about
that, other than continue to do the work we have in front of us until it
changes, and look for additional opportunities in state, local, and maybe
outside of government.</p>
<p><img src="/images/2016-yir/IMG_2396-COLLAGE.jpg" alt="collage of Ad Hoc" /></p>
<h2>House</h2>
<p>Michelle and I bought <a href="https://evanston.house/">a house</a> in south Evanston in January, and
we've been renovating it since. It's an old Victorian-style home from the 1890s
with a good stone foundation and timber frame. We're doing extensive changes to
the interior, with a new layout and all new flooring, doors and windows, and
systems like electrical and HVAC. The exterior is mostly unchanged from a
framing perspective, but we're updating the siding, and we dormered out part of
the sloping roof so we could have a master bedroom on the third floor. We're
also converting the garage into a two-story garage/coach house combo: the plan
is to have an office on the second floor for Michelle and me. We had hoped, at
the beginning of the year, to be moved in by fall, but these things go the way
these things go. As of this writing, we're about a month away from being able to
pull up stakes here in Logan Square. We've been working with our friend and
architect David Burns, which has been great. We spent time together with him
talking about what we wanted, and he came up with a vision, drew up detailed
plans, and has been managing the overall construction process. We hired Conrad
Szajna of <a href="http://formedspace.com/">FormedSpace</a> to be our general contractor, and he's hired a great
team of subcontractors and tradespersons to do the work.</p>
<p><img src="/images/2016-yir/23575874262_535fed7a16_o-COLLAGE.jpg" alt="collage of house" /></p>
<h2>Family</h2>
<p>The best part of my year was spending time with my family. Maxine started
Kindergarten at Lincoln Elementary School in Evanston, and Veronica has grown
into a full-fledged toddler. We didn't do as much travel as we would have liked
this year, but we enjoyed biking together (we got a trailer this year for the
girls), exploring our fair city, and working on projects, like the homemade
arcade Maxine and I have been building together.</p>
<p><img src="/images/2016-yir/IMG_1572-COLLAGE.jpg" alt="collage of family" /></p>
<p>A few other things of note from the year:</p>
<ul>
<li>We participated in the Volkswagen settlement and chose to have them buy back
our Jetta TDi. Good riddance. We bought a new Mazda CX-9 as a replacement.</li>
<li>We volunteered for the Hillary Clinton campaign, including taking Maxine up to
Kenosha, WI, to knock on doors for GOTV on election day. Well.</li>
<li>I continue to feel grateful in so many different ways, for dear old friends
and new ones we made this year, for our families immediate and extended, for
our relative health and wealth, for our general dumb luck to be this
fortunate and safe, recognizing just how contingent, random, and unlikely that is.</li>
</ul>

      ]]></content:encoded>
    </item>
    <item>
      <title>Looking at your program’s structure in Go 1.7</title>
      <link>https://pauladamsmith.com/blog/2016/08/go-1.7-ssa.html</link>
      <guid>https://pauladamsmith.com/blog/2016/08/go-1.7-ssa.html</guid>
      <pubDate>Tue, 16 Aug 2016 06:42:20 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
<p>Go 1.7—<a href="https://golang.org/dl/">out today!</a>—features a new
<a href="https://docs.google.com/document/d/1szwabPJJc4J-igUZU4ZKprOrNRNJug2JPD8OYi3i1K0/edit">SSA-based compiler backend</a>. SSA is a method of describing
low-level operations like loads and stores that roughly map to machine
instructions, with the special difference that SSA acts as though it has an
infinite number of registers. This is not especially interesting on its own,
except that it enables a class of well-understood optimization passes that make
the resulting binary smaller in code size and faster. The new release of Go is
an indication the implementation is maturing and starting to take advantage of
techniques and practices adopted in the <a href="http://llvm.org/">wider world of compiler technology</a>.</p>
<p>In addition to the performance benefits of the new SSA-based backend, there is a
suite of new tools that allow a developer to interact with the SSA
machinery. One such tool outputs the intermediate SSA statements, optimization
passes, and resulting Go-flavored assembly. This is done by setting the
environment variable <code>GOSSAFUNC</code> to the name of a function to disassemble when using the
<code>go</code> tool, for example:</p>
<pre><code class="language-shell">$ GOSSAFUNC=main go build
</code></pre>
<p>This invocation will output to the terminal, but the more interesting artifact
is an HTML file, named <code>ssa.html</code>, written out to the current directory. Open
the file in your web browser and you’ll see something like:</p>
<p><img src="/images/gossa/ssa.html.png" alt="screenshot of SSA" /></p>
<p>What you are looking at is a table with many columns extending to the right,
each one except for the first and last representing an optimization pass over
the preceding SSA form. (I counted 37 separate passes.) The first column is
the compiler's initial, unoptimized SSA output, and the last column is the
Go-flavored assembly that will be turned into machine code for the final
compiled binary executable or shared library.</p>
<p><img src="/images/gossa/ssahtmlscroll.gif" alt="anim gif of scrolling through SSA" /></p>
<p>While this can look intimidating to the uninitiated, SSA is relatively simple by
design -- each line represents either a value being assigned the result of an
instruction (i.e., one of the infinite number of registers), or a label of a
&quot;basic block&quot; (a set of statements, a.k.a. the things between curly braces in
source code), or the exit of a basic block which jumps execution to a different
basic block (e.g., control flow like an if-statement or returning from a function
call).</p>
<p>For example:</p>
<pre><code>v4 = Const64 &lt;int&gt; [42]
</code></pre>
<p>Means assign the 64-bit integer constant value 42 to the register labeled <code>v4</code>.</p>
<pre><code>b5: ← b4
  v15 = Copy &lt;mem&gt; v14
  v16 = StaticCall &lt;mem&gt; {runtime.printnl} v15
Call v16 → b6
</code></pre>
<p>Means <code>b5</code> is the label for a basic block with two statements. It concludes with
an exit <code>Call</code> instruction, taking program execution to another basic block,
<code>b6</code>, when returning from the function call that produces the <code>v16</code> value.</p>
<p>The tokens like <code>Const64</code>, <code>Copy</code>, and <code>StaticCall</code> are analogous to assembly
instructions like <code>MOV</code> and <code>LEA</code>.</p>
<p>One special operation is <code>Phi</code>, or a &quot;Phi node&quot;. Notice that a Phi node takes
two arguments, which are two values. Also notice that a basic block with a Phi
node has two basic block labels next to its own label, unlike every other basic
block:</p>
<pre><code>b3: ← b1 b2
   v20 = Phi &lt;int&gt; v4 v6
   ...
</code></pre>
<p>This is an interesting construct and it relates to program control flow. A basic
block is defined by having a single entry and a single exit point, and having a
set of statements that execute sequentially (i.e., no branching logic) in
between. And &quot;SSA&quot; stands for &quot;<a href="https://en.wikipedia.org/wiki/Static_single_assignment_form">single static assignment</a>&quot;, which means
that each value is assigned one and only one time. But what do you do if you
have a reference to a variable that could have different values depending on
which branch of an <code>if</code> statement the program took? A Phi node is a way of
resolving this apparent contradiction. Since each branch of an <code>if</code> statement by
definition assigns to a unique value, a Phi node coalesces them into the final
value depending on which branch was actually taken. So you can think of it as
the run-time retrieval of a value based on some condition. This is why the block
has two dependencies at the top rather than just one.</p>
<p>Let’s write a silly program to motivate some examples:</p>
<pre><code class="language-go">package main

func main() {
	x := 5

	if 1 &lt; 0 {
		x = -42
	}

	println(x)
}
</code></pre>
<p>Let’s start with the initial basic block, <code>b1</code>:</p>
<pre><code>b1:
  v1 = InitMem &lt;mem&gt;
  v2 = SP &lt;uintptr&gt;
  v3 = SB &lt;uintptr&gt;
  v4 = Const64 &lt;int&gt; [5]
  v5 = ConstBool &lt;bool&gt; [false]
  v6 = Const64 &lt;int&gt; [-42]
  v11 = OffPtr &lt;*int64&gt; [0] v2
If v5 → b2 b3
</code></pre>
<p>After some program initialization, <code>v4</code> is the assignment of the constant 5 to the
local var <code>x</code> in our code. Go knows at compile-time that <code>1 &lt; 0</code> is always
false so it just assigns false to <code>v5</code>. <code>v6</code> is the assignment of -42 to <code>x</code>
that will happen during program execution.</p>
<p>At the end we have the basic block exit, <code>If v5 → b2 b3</code>. This tests the truth
value of <code>v5</code> to decide whether to jump program execution to either <code>b2</code> (if
true) or <code>b3</code> (if false). This is similar to the following chunk of assembly:</p>
<pre><code class="language-asm">    JNZ b2
b3:
  ...
b2:
  ...
</code></pre>
<p>One nice thing about the Go SSA HTML view is you can click on any token in the
SSA form and it will highlight the references to and from that element.</p>
<p>
    <img alt="clicking on SSA elements" src="/images/gossa/ssabblocks.gif" class="no-100-pc-width">
</p>
<p>We can see from the different colors how the control flow will go. You can
visually connect the blocks of code that will execute and the assignments,
function calls, and additional branching that will result.</p>
<p>Clicking on the Phi node and its dependencies, you can see where the
possible values come from in previous control flow.</p>
<p><img src="/images/gossa/phinodehl.png" alt="highlighted Phi node" /></p>
<p>Moving on, the function call that prints out the integer value is in the
following basic block:</p>
<pre><code>b4: ← b3
  v9 = Copy &lt;int&gt; v20
  v10 = Copy &lt;int64&gt; v9
  v12 = Copy &lt;mem&gt; v8
  v13 = Store &lt;mem&gt; [8] v11 v10 v12
  v14 = StaticCall &lt;mem&gt; {runtime.printint} [8] v13
Call v14 → b5
</code></pre>
<p>The <code>StaticCall</code> instruction invokes the function from the Go runtime that is
specialized to format integer values and print them to the terminal. One
interesting thing to note is that the preamble to the call sets some things up in
memory, the location of which is fed to the <code>printint</code> function. If you notice,
<code>v11</code> refers back to the value set in <code>b1</code>, which is a pointer offset from <code>v2</code>,
which was set from the stack pointer <code>SP</code> near the top of the program
initialization. This makes sense, because the generated assembly needs
concrete memory locations to address when invoking functions taking pointers.</p>
<p>There’s much more to investigate here, including the particular optimization
passes, and tracing how individual instructions make their way through to the
final assembly or are eliminated. But hopefully this has given you an
introduction into SSA and how it maps to constructs in your applications.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Modifying a Go slice in-place during iteration</title>
      <link>https://pauladamsmith.com/blog/2016/07/go-modify-slice-iteration.html</link>
      <guid>https://pauladamsmith.com/blog/2016/07/go-modify-slice-iteration.html</guid>
      <pubDate>Tue, 26 Jul 2016 04:26:51 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p><strong>Update:</strong> See a better way of doing this below.</p>
<hr />
<p>I'll often have a slice that I want to filter down on, removing elements based on some test, and I would prefer to modify the slice in-place for whatever reason, either because I want to retain the reference to the original slice or I don't want to allocate a new slice as a destination for the desired values.</p>
<p>You might think that modifying a slice in-place during iteration should not be done, because while you can modify <em>elements</em> of the slice during iteration if they are pointers or if you index into the slice, changing the <em>slice itself</em> by removing elements during iteration would be dangerous.</p>
<p>Here's a straightforward way to accomplish it. The idea is that, when you encounter an element you want to remove from the slice, take the beginning portion of the slice that has values that have passed the test up to that point, and the remaining portion of the slice, i.e., after that element to the end, and copy them <em>over</em> the original slice. Then, assign a slice expression up to the number of values that passed the test to the original variable.</p>
<p>Here's an example. Let's say I have a slice of integers, and I only want to retain the even ones.</p>
<pre><code class="language-go">
var x = []int{90, 15, 81, 87, 47, 59, 81, 18, 25, 40, 56, 8}

i := 0
l := len(x)
for i &lt; l {
	if x[i] % 2 != 0 {
		x = append(x[:i], x[i+1:]...)
		l--
	} else {
		i++
	}
}
x = x[:i]
	
fmt.Println(x)
// [90 18 40 56 8]
</code></pre>
<p>The <code>i</code> variable is used to keep track of the number of even values found in the slice. When an element is odd, we create a temporary slice using <code>append</code> and two slice expressions on the original slice, skipping over the current element. The temporary smaller slice is copied over the existing, shifting down the remaining values. The <code>l</code> variable makes sure we make the right number of comparisons despite moving things around. It's important to note the memory location of the original slice is unchanged with the copy. No new heap allocations are performed, even with the temporary slice.</p>
<hr />
<p><strong>Update:</strong> A number of people, including here in comments and on <a href="https://www.reddit.com/r/golang/comments/4uoqr5/modifying_a_go_slice_inplace_while_iterating_over/">the golang reddit</a>, have pointed out that the method I outline here is pretty inefficient; it's doing a lot of extra work, due to the way I'm using <code>append</code>. A <em>much</em> better way to go about it is the following, which also happens to have already been pointed out in the <a href="https://github.com/golang/go/wiki/SliceTricks#filtering-without-allocating">official Go wiki</a>:</p>
<pre><code class="language-go">y := x[:0]
for _, n := range x {
    if n % 2 == 0 {
        y = append(y, n)
    }
}
</code></pre>
<p>This also has the benefit of being simpler and shorter. Use it instead!</p>
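<p>For completeness, a quick check of the result. Note that <code>y</code> reuses <code>x</code>'s backing array, so reassigning <code>x = y</code> is optional, but it keeps the original variable pointing at the filtered data:</p>
<pre><code class="language-go">x = y
fmt.Println(x)
// [90 18 40 56 8]
</code></pre>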

      ]]></content:encoded>
    </item>
    <item>
      <title>A simple way to limit the number of simultaneous clients of a Go net/http server</title>
      <link>https://pauladamsmith.com/blog/2016/04/max-clients-go-net-http.html</link>
      <guid>https://pauladamsmith.com/blog/2016/04/max-clients-go-net-http.html</guid>
      <pubDate>Wed, 13 Apr 2016 22:55:13 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
<p>This is a simple and easily generalizable way to put an upper bound on the
number of simultaneous clients to a Go <code>net/http</code> server or handler.</p>
<p>The idea is to use a counting semaphore, modeled with a buffered channel, to
cause new clients which arrive after the <code>n</code>th current client to queue,
where <code>n</code> is the size of the buffer.</p>
<p>Ideally, we wouldn't want to limit the amount of concurrency to our application,
but practically, there are limits on underlying resources, and forcing clients
to queue after a certain limit gives us control over that resource utilization.</p>
<p>Let's say we have a simple HTTP handler that requests access to some expensive
resource, like a database or complex computation:</p>
<pre><code class="language-go">package main

import (
    &quot;io&quot;
    &quot;log&quot;
    &quot;net/http&quot;
)    

func main() {
     http.Handle(&quot;/&quot;, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         // getExpensiveResource is a stand-in for a call to a database or
         // other costly computation.
         res := getExpensiveResource()
         io.WriteString(w, res.String())
     }))

     log.Fatal(http.ListenAndServe(&quot;:8080&quot;, nil))
}
</code></pre>
<p>The handler can be requested by an unbounded number of clients, potentially
exhausting our resources.</p>
<p>Let's add a counting semaphore that will gate entry into the handler:</p>
<pre><code class="language-go">func main() {
     const maxClients = 10
     sema := make(chan struct{}, maxClients)

     http.Handle(&quot;/&quot;, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         sema &lt;- struct{}{}
         defer func() { &lt;-sema }()

         res := getExpensiveResource()
         io.WriteString(w, res.String())
     }))
</code></pre>
<p>We make a channel of type <code>struct{}</code>, because we are only interested in the
send/receive semantics of the channel, not its value. The first statement of the
handler is a send on the channel, which will succeed up to <code>maxClients</code> number
of simultaneous requests. Think of the buffered channel as having empty slots,
and being able to send on it means that you can fill a slot and proceed. If
there are no empty slots (in other words, the length of the channel is equal to
the buffer size), then the send will block, and will have to wait to proceed
until a slot frees up. The next statement defers until after the handler has
returned or panicked, and frees a slot by receiving from the channel.</p>
<p>If we have more than one handler to limit access to, we can move the semaphore
into a middleware and wrap the original handler, leaving the body of it
unchanged:</p>
<pre><code class="language-go">package main

import (
    &quot;io&quot;
    &quot;log&quot;
    &quot;net/http&quot;
)    

func maxClients(h http.Handler, n int) http.Handler {
     sema := make(chan struct{}, n)

     return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         sema &lt;- struct{}{}
         defer func() { &lt;-sema }()

         h.ServeHTTP(w, r)
     })
}

func main() {
     handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         res := getExpensiveResource()
         io.WriteString(w, res.String())
     })

     http.Handle(&quot;/&quot;, maxClients(handler, 10))

     log.Fatal(http.ListenAndServe(&quot;:8080&quot;, nil))
}
</code></pre>
<p>Note that this implementation will cause clients beyond the maximum number to
queue without bound, until they hit the system limit of the <code>listen(2)</code> backlog.</p>
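<p>If unbounded queueing is undesirable, one variation (a sketch, not part of the original example) is to reject excess clients immediately instead of making them wait, using a non-blocking send via <code>select</code> with a <code>default</code> case:</p>
<pre><code class="language-go">func maxClients(h http.Handler, n int) http.Handler {
     sema := make(chan struct{}, n)

     return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
         select {
         case sema &lt;- struct{}{}:
             // acquired a slot; release it when the handler returns
             defer func() { &lt;-sema }()
             h.ServeHTTP(w, r)
         default:
             // all slots are full; fail fast rather than queue
             http.Error(w, &quot;too many clients&quot;, http.StatusServiceUnavailable)
         }
     })
}
</code></pre>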
<p>This pattern can be used to control the amount of concurrency to any resource,
not just <code>net/http</code> handlers.</p>
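<p>As a final sketch of that generalization, here is the same counting semaphore gating plain goroutines rather than HTTP handlers (the sleep is a stand-in for the expensive work):</p>
<pre><code class="language-go">package main

import (
    &quot;fmt&quot;
    &quot;sync&quot;
    &quot;time&quot;
)

func main() {
    const maxConcurrent = 3
    sema := make(chan struct{}, maxConcurrent)

    var wg sync.WaitGroup
    for i := 0; i &lt; 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            sema &lt;- struct{}{}        // acquire a slot (blocks while all are full)
            defer func() { &lt;-sema }() // release the slot
            fmt.Println(&quot;working on&quot;, i)
            time.Sleep(100 * time.Millisecond) // stand-in for expensive work
        }(i)
    }
    wg.Wait()
}
</code></pre>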

      ]]></content:encoded>
    </item>
    <item>
      <title>The Bloomingdale Trail</title>
      <link>https://pauladamsmith.com/blog/2015/06/bloomingdale_trail.html</link>
      <guid>https://pauladamsmith.com/blog/2015/06/bloomingdale_trail.html</guid>
      <pubDate>Fri, 05 Jun 2015 21:00:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>There was a moment last Friday while I was on top of the soon-to-open
<a href="http://www.bloomingdaletrail.org/">Bloomingdale Trail</a> with a tour group when I had a strange feeling. We had
been walking for more than a mile to that point, 17 feet up above Chicago
streets, passing houses, factories, and alleyways in Logan Square. I paused to
consider the feeling, and I realized it was that I had been walking continuously
for half an hour through a Chicago neighborhood and not once had to contend with
an intersection or motor vehicle in all that time. Unless you live in the city
and walk or bike often, it's hard to convey how pleasantly odd that feeling was.
It's not something you can get from a typical park or trail. Parks are usually
compact open spaces, polygons boxed in by streets. Most other trails are at
grade level, so whatever flow or momentum you build up is periodically
interrupted by an intersection. The Bloomingdale Trail, however, is both apart
from and woven through the neighborhoods it is situated in. If you go the length
from the west trailhead to the east trailhead or vice versa, you'll have
travelled 2.7 miles -- a massive span across 4 Chicago neighborhoods -- in an
entirely human-mediated fashion. And yet you never feel as though you've taken
yourself out of the fabric of the city, as you might when going into a park.
Thanks to periodically spaced adjacent parks and access ramps, you can dip in
and out of the Trail as casually or deliberately as you choose. You gain both
a new vista on the city, and a deeper connection to the neighborhoods you've
always known. It's remarkable what a mere 17 feet of elevation can do to both
take you out of the city and give you greater access to it.</p>
<img class="img-responsive" alt="Photo of people walking on elevated Bloomingdale Trail and street below" src="/images/fbt_elevated.jpg">
<p>It is this embeddedness that I believe will ultimately make The Bloomingdale
Trail and the entire <a href="http://the606.org/">606</a> system of parks a success. It's not a jewel,
a thing to be admired, with its aesthetics upfront. It's a relentlessly practical
bit of new human-scale infrastructure in a vibrant residential area. It will
materially improve the lives of its neighbors each day by enabling them to be
active, to commute, to play, and to discover in a new and unique way. It's worth
remembering that the project was funded largely by federal transportation
dollars, earmarked for reducing traffic congestion and air pollution. People
will wonder what this thing is, and the answer will be in its daily use.</p>
<img class="img-responsive" alt="Photo of rainbow over The Bloomingdale Trail" src="/images/fbt_rainbow.jpg">
<p>I remember walking around the old Bloomingdale Line, a disused elevated railroad
embankment, in 2002 with a group of work colleagues. We would sometimes take our
lunch up top, ducking under a fence at Milwaukee and Leavitt to gain access. The
germ of the Friends of the Bloomingdale Trail was planted there; the non-profit
community organization officially formed a year later. The circumstances at the
time were fortunate: the development of the High Line in New York provided
a template and a healthy competitive jolt; the railroad company was looking to
rid themselves of their responsibilities to the line; the City wanted to tear
down the embankment and spanning viaducts, providing further impetus; and
crucially, the rights-of-way were all contiguous and owned by the City: there
would be no time-consuming negotiating with private owners to acquire the trail's
property, as there was in New York. From there we held <a href="https://www.flickr.com/gp/psmith/32DCKi">community meetings</a>, <a href="http://www.bloomingdaletrail.org/img/Trailcleanup01.jpg">trash
pick-up days</a>, <a href="https://www.flickr.com/gp/psmith/0qWMDh">festivals</a>, goofy but earnest <a href="http://www.bloomingdaletrail.org/img/valentines.jpg">Valentine's Day events</a>, <a href="http://www.bloomingdaletrail.org/archive/#fbt-walking-tour-notes">led tours</a>,
<a href="https://www.flickr.com/gp/psmith/92r243">pitched aldermen and city planners</a>, <a href="http://www.bloomingdaletrail.org/archive/#bloomingdale-trail-mural-project">documented the Trail as it existed</a>, <a href="https://www.flickr.com/gp/psmith/71GZ13">helped open a new neighborhood park next to the Trail</a>, printed
<a href="http://www.bloomingdaletrail.org/archive/#walk-bike-run-poster">posters</a> and <a href="http://www.bloomingdaletrail.org/archive/#fbt-brochure">brochures</a>, <a href="http://www.bloomingdaletrail.org/archive/#chicago-public-art-group-albany-whipple-workshop-flyer">hosted arts events</a>, let <a href="http://www.bloomingdaletrail.org/reframing-ruin/david-schalliol/">David Schalliol do his magic</a>, connected with <a href="https://www.cityofchicago.org/city/en/depts/dcd/supp_info/logan_square_openspaceplan.html">open space plans</a>, and
started a partnership with the <a href="http://www.tpl.org/">Trust for Public Land</a> and the City of
Chicago to design and build the Trail. In 2007 and 2008, we <a href="https://www.flickr.com/photos/psmith/sets/72157600029547338">convened neighbors
in a series of meetings and
surveys</a> to listen
to, capture, and synthesize the community's vision for the project. The product
of this effort, the <a href="http://www.bloomingdaletrail.org/archive/#community-visioning-update">Community Visioning Update</a>, was perhaps our most important
practical work as an organization: this document was incorporated into the
City's official request for proposals for design and construction. To the best
of our ability, we made sure the future Trail would be reflective of the
community it came from and would serve.</p>
<img class="img-responsive" alt="Photo of ramp down from The Bloomingdale Trail" src="/images/fbt_ramp.jpg">
<p>It's time now to celebrate the opening of the Trail and begin a new phase in the
life of FBT. The original goals of the organization were to:</p>
<ul>
<li>Preserve the elevated right of way</li>
<li>Beautify the public space</li>
<li>Create a new, mixed-use trail/linear park</li>
<li>Establish a broad coalition that supports the proposed park</li>
<li>Connect with neighborhood schools and institutions</li>
</ul>
<p>Our <a href="http://www.bloomingdaletrail.org/about/">new mission</a> is to be the community stewards of the Trail, and to
that end, we recently applied and have been approved to be a Chicago Park
District Advisory Council, or PAC. As befits our unusual new park, we're breaking
new ground as a PAC. We're unique in that our bylaws state there will be board
representation from each of the 4 neighborhoods, and from each of the constituent
park PACs (Julia de Burgos, Walsh, Churchill Field, and Kimball). Because no
other park covers as much ground, cuts through as many neighborhoods, and links
up as many adjacent smaller parks, governance and community organizing around
The Bloomingdale Trail will be a new experiment for all involved.</p>
<p>One last thought. There are very few good west-east routes in Chicago: most
transportation infrastructure radiates from and to the Loop. The Bloomingdale
Trail is a stroke across the spokes, and the physical, economic, and cultural
circulation it promotes will be fascinating to watch. But there are bigger
things at stake. Even before this new park was built, the Trail conspicuously
ended at the north branch of the Chicago River. (Now it ends at Ashland, that
street's bridge having been born-again over Western.) It's always been a dream
and a goal of FBT and the 606 partners to extend the Trail across the river in
a future phase. From there, on-street bicycle paths can be knit together,
ultimately arriving at the lakefront. However, there's an even bigger dream to
be dreamt. A few miles west of the western terminus of the Trail, the Illinois
Prairie Path has its eastern endpoint. The IPP carries you out due west 60 miles
past the outer suburbs. A network of rural trails beyond can be followed all the
way to Iowa. So while we celebrate the opening of Chicago's next great park
tomorrow, the notion of a bicycle trip that begins at the Mississippi River and
ends at Lake Michigan, on bike paths the entire span, should stay in the back of
our minds as a not-too-distant possibility.</p>
<img class="img-responsive" alt="Map of measurement from Mississippi River to Lake Michigan" src="/images/fbt_miss_river_lake_mich_map.png">
<p>Look up! It's The Bloomingdale Trail</p>
<img class="img-responsive" src="/images/fbt/320210120_f84ffca2ff_o.jpg">
<img class="img-responsive" src="/images/fbt/320221183_3866448949_o.jpg">
<img class="img-responsive" src="/images/fbt/3056808206_4ce94f3638_o.jpg">
<img class="img-responsive" src="/images/fbt/320213168_7aefe30df9_o.jpg">

      ]]></content:encoded>
    </item>
    <item>
      <title>Chicago wards &amp; precincts shapefiles in 2015</title>
      <link>https://pauladamsmith.com/blog/2015/02/chicago-wards-precincts-shapefiles.html</link>
      <guid>https://pauladamsmith.com/blog/2015/02/chicago-wards-precincts-shapefiles.html</guid>
      <pubDate>Sat, 28 Feb 2015 01:53:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p><strong>Update:</strong> On April 6, 2015, the City of Chicago updated its Data Portal with
the official <a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2015-/sp34-6z76">wards</a> and <a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Precincts-current-/uvpq-qeeq">precincts</a> shapefiles.</p>
<hr />
<p><strong>tl;dr:</strong> I tried to make a map of Chicago election results, I found only
out-of-date wards &amp; precincts shapefiles, I had to FOIA the up-to-date versions,
I got them, I republished them so anyone can download them, and finally made
that map.</p>
<p>Read on for the full saga.</p>
<hr />
<p>After <a href="http://elections.chicagotribune.com/results/">this week’s municipal general elections in Chicago</a>, I was looking
for detailed results in the mayor’s race, which didn’t end Tuesday night but is
<a href="http://www.reuters.com/article/2015/02/25/us-usa-politics-chicago-idUSKBN0LS1B420150225">headed for a run-off</a> between Mayor <a href="http://www.chicagotogether.org/">Rahm Emanuel</a> and
challenger Cook County Commissioner <a href="http://www.chicagoforchuy.com/">Chuy Garcia</a> on April 7.
Specifically, I wanted to see where in the city the support for each candidate
was, and at as granular a level as possible.</p>
<p>The <a href="http://www.chicagoelections.com/en/home.html">Chicago Board of Elections</a> posts vote tallies by precinct (50 wards
in Chicago, with on average 40 precincts per ward). Precincts are the smallest
unit of political geography—in Chicago, they are roughly a few square city
blocks each. Given the neighborhoody nature of Chicago and the block-by-block
affinities that exist (which leads politicians to produce <a href="http://www.our2ndward.org/">carefully sculpted
gerrymanders like the 2nd Ward</a> in order to corral voters into favorable
pens), a map showing the relative intensity of voting percentages per candidate
by precinct would be a good tool for aiding detailed understanding of this
election or any election, and a building block for many possible similar
analyses in the future.</p>
<p>So I set out to make such a map. My plan was to gather the vote totals per
precinct, shapefiles of the city ward and precinct boundaries, and join them
together using tools like <a href="http://d3js.org/">d3</a> to draw a choropleth or thematic map in a web
browser. This is a straightforward plan and is well-trod ground. However,
I naïvely assumed the official source material I gathered would be accurate and
up-to-date.</p>
<p>After scraping the vote totals from the BOE site[<a href="#fn1-2015-02-27"
id="fnr1-2015-02-27" class="fn">1</a>], I downloaded the wards and precincts
shapefiles from the <a href="https://data.cityofchicago.org/">City of Chicago’s Data Portal site</a>, which is
a service that hosts many different types of data, from building permits to
restaurant inspections. I did this by typing “wards” and “precincts” into the
search box and downloading from the results pages the links titled
“<a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards/bhcv-wqkf">Boundaries - Wards</a>” and
“<a href="https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Ward-Precincts/sgsc-bb4n">Ward Precincts</a>”. There was
nothing to indicate that these files were out of date, nor anything else to
indicate that they were not the current, authoritative source of these data
sets.</p>
<p>I put together a first draft of the map and shared it with <a href="https://twitter.com/joegermuska">some</a>
<a href="http://www.chicagocarto.com/">colleagues</a> who are experts in mapping and Chicago data. They quickly
pointed out that the map appeared to be using the old wards and precincts.[<a
href="#fn2-2015-02-27" id="fnr2-2015-02-27" class="fn">2</a>] In 2012, the <a href="http://www.wbez.org/no-sidebar/approved-ward-map-95662">city
council approved a new set of ward boundaries</a>, redrawing the city’s
political map. They were to go into effect in 2015, and this week’s election,
which included all 50 aldermanic races, were to be contested on this new
geography. The conspicuously missing new 2nd Ward was the tip-off my map was
wrong.</p>
<p>I searched for the updated boundaries, but came up with only unofficial sources,
and only for wards at that. There was the WBEZ map from their <a href="http://www.wbez.org/no-sidebar/approved-ward-map-95662">original 2012
story</a>, and the Tribune had created <a href="http://media.apps.chicagotribune.com/ward-redistricting-2012/index.html">a side-by-side comparison of the old
and new wards</a>. But I couldn’t trust these for my own use, because of
their uncertain provenance. And without matching updated precincts, I couldn’t
join vote totals for use in a map in any case.</p>
<p>Taking a page from the <a href="http://www.derivativeworks.com/2013/02/on-everyblock-and-the-open-data-movement.html">people person at my old job</a>, I made a phone call
to the Board of Elections: maybe I could just ask for the data and they would
give it to me? I stated my request very plainly and without explanation of
motive, and was told to “hold please” a couple of times while I bounced between
departments. A few moments later, I heard “Districts and Boundaries” on the
line. Success! Here was, literally, the person who could help me, right then. Or
so I thought. I repeated my request, and without a moment’s hesitation, the
Districts and Boundaries voice said that I would need to contact the BOE’s
<a href="http://www.foia.gov/">FOIA</a> officer, and here was their email address.[<a href="#fn3-2015-02-27" id="fnr3-2015-02-27" class="fn">3</a>]</p>
<p>It was hard to tell how much of this was bluffing, as in, let’s see you actually
bother to make a FOIA request, but I went ahead and stubbornly wrote an email to
the FOIA officer anyway. I was under no illusions that my request would be
fulfilled quickly enough to make my post-election map still relevant.</p>
<p><img src="/images/foia-email-request.png" alt="Email request to FOIA officer" /></p>
<p>I then <a href="https://twitter.com/paulsmith/status/571024506560647168">took to Twitter</a> to register my displeasure for this state of affairs—we
just had a citywide election for our top local offices, operating on the
assumption of the new city council-vouched districts, and yet, despite nearly
a decade of the open data movement, despite official portaldom, the key base
layers of the political strata were still available only to the learned
monks—and moved on.</p>
<p>Lo, but was my request not answered but a few scant hours later! I can’t tell
you how surprised I was to see this in my inbox:</p>
<p><img src="/images/foia-email-response.png" alt="Email response from FOIA officer" /></p>
<p>I thanked the officer and downloaded the payload, which was a set of 50 folders,
each corresponding to a ward and containing a shapefile of that ward’s precincts
therein. I eyeballed the boundaries with <a href="http://www2.qgis.org/en/site/">QGIS</a> and was satisfied that
they appeared to be legit. (Again, the shape of the notorious 2nd Ward was the
main clue.)</p>
<p>In the absence of official publication, I was determined to at least not have
the next person who goes looking for wards and precincts to wind up in FOIA
land. As relatively pain-free as this episode was, the fact that I had to engage
with the FOIA plumbing in order to fulfill a minor data request is not good. And
there is every reason to think that a typical FOIA request will take orders of
magnitude longer to fulfill than my jackpot.</p>
<p>My approach was to self-publish the data, but to be clear about its source
and my methodology for any transformations. While I’d much prefer this
data appear on the Data Portal, I’d also prefer that our collective energies
not be wasted on pursuits such as these.</p>
<p>Regarding those transformations, I had a set of precincts, but I also wanted the
wards that derive from them (a ward is completely defined by its constituent
precincts). I imported the precincts into a PostgreSQL database with the
<a href="http://postgis.net/">PostGIS</a> extension. From there I created wards by grouping precincts
by their ward number, and unioning their geometries (i.e., merging a bunch of
small precinct polygons into one large ward polygon). Then I exported from the
database into various geospatial data formats—Shapefile, TopoJSON, GeoJSON, KML,
etc.</p>
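<p>For the curious, here is a minimal sketch of that dissolve step, driving
PostGIS from a small Go program; the connection string and the <code>precincts</code> and
<code>wards</code> table and column names are invented for illustration (my actual run used
the PostGIS tools directly):</p>
<pre><code class="language-go">package main

import (
    &quot;database/sql&quot;
    &quot;log&quot;

    _ &quot;github.com/lib/pq&quot; // PostgreSQL driver
)

func main() {
    db, err := sql.Open(&quot;postgres&quot;, &quot;dbname=chicago sslmode=disable&quot;)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Dissolve precincts into wards: group the precinct polygons by
    // ward number and merge each group into a single ward polygon.
    _, err = db.Exec(`
        CREATE TABLE wards AS
        SELECT ward, ST_Union(geom) AS geom
        FROM precincts
        GROUP BY ward`)
    if err != nil {
        log.Fatal(err)
    }
}
</code></pre>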
<p>I made these <a href="https://paulsmith.github.io/chicago_wards_and_precincts/"><strong>exports available for download by anyone</strong></a>, hosted on
<a href="https://github.com/paulsmith/chicago_wards_and_precincts">GitHub</a>.</p>
<p>I finally was able to make the map I wanted, at least, the first-order map,
a basic voter preference density map. I hope to build on this data
infrastructure with different overlays, result sets, future elections, and so
on.</p>
<p>You can <a href="http://bl.ocks.org/paulsmith/1564a99cc7b5d3f8e90c"><strong>view the map here</strong></a>; choose between mayoral candidates in the
drop-down selector to update the map with their vote percentages.</p>
<p>With several candidates, it can be useful to see them arrayed as <a href="http://en.wikipedia.org/wiki/Small_multiple">small
multiples</a> for easier comparison[<a href="#fn4-2015-02-27"
id="fnr4-2015-02-27" class="fn">4</a>]:</p>
<p><img src="/images/chi-2015-mayoral-small-multiples.png" alt="Side-by-side maps of Chicago mayoral election results" /></p>
<p>I’d like to see the left hand of the operators of the Chicago Data Portal talk
with the right hand of the Chicago Board of Elections, and simply take down the
pre-2015 ward and precinct boundaries (or better yet, rename them to something
that won’t be mistaken for the most recent version and leave them up for
historical research) and get the current shapefiles uploaded as soon as
possible. In the meantime, I hope that interested parties will avail themselves
of <a href="https://paulsmith.github.io/chicago_wards_and_precincts/">my hosted shapefiles</a>.</p>
<p>More generally I’d like for stakeholders in the world of government data to
reflect on the state of the open data movement, and consider examples such as
these as the tiny abrasions that impede all sorts of productivity, beyond my
modest map-making efforts. On one hand, we’ve made enormous progress; on the
other, we’re still fighting the same 10-year-old battles.</p>
<p>And to the FOIA officer at the BOE who responded so promptly, many thanks!</p>
<hr />
<ol class="footnotes">
    <li id="fn1-2015-02-27">
        It is 2015 and the third-largest U.S. city is still
        publishing official election results on a decade-old system that doesn’t lend
        itself to machine-readability without substantial friction, which violates #5 of
        the <a href="https://public.resource.org/8_principles.html">8 Principles of Open Government Data</a>.
        I wrote <a href="https://gist.githubusercontent.com/paulsmith/1564a99cc7b5d3f8e90c/raw/scrape.py">a
        Python program to extract the data</a> from the particular formatting of the BOE site.
        <a href="#fnr1-2015-02-27">↩</a>
    </li>
    <li id="fn2-2015-02-27">
        In my defense, while I’ve
        lived in Chicago for more than 10 years, I only recently moved back after
        a 5-year hiatus, so my map intuitions are a little stale.
        <a href="#fnr2-2015-02-27">↩</a>
    </li>
    <li id="fn3-2015-02-27">
        Thus arguably in violation of #1, #3, #4, and #6 of
        the <a href="https://public.resource.org/8_principles.html">8 Principles of Open Government Data</a>.
        <a href="#fnr3-2015-02-27">↩</a>
    </li>
    <li id="fn4-2015-02-27">
        For this I just screenshotted and collaged them in an image editor.
        <a href="#fnr4-2015-02-27">↩</a>
    </li>
</ol>

      ]]></content:encoded>
    </item>
    <item>
      <title>How to get started with the LLVM C API</title>
      <link>https://pauladamsmith.com/blog/2015/01/how-to-get-started-with-llvm-c-api.html</link>
      <guid>https://pauladamsmith.com/blog/2015/01/how-to-get-started-with-llvm-c-api.html</guid>
      <pubDate>Wed, 21 Jan 2015 01:53:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>I enjoy making toy programming languages to better understand how compilers
(and, ultimately, the underlying machine) work and to experiment with techniques
that aren’t in my repertoire. <a href="http://llvm.org/">LLVM</a> is great because I can tinker, and
then wire it up as the backend to have it generate fast code that runs on most
platforms. If I just wanted to see my code execute, I could get away with
a simple hand-rolled interpreter, but having access to LLVM’s JIT, suite of
optimizations, and platform support is like having a superpower — your little
toy can perform impressively well. Plus, LLVM is the foundation of things like
<a href="https://github.com/kripken/emscripten/wiki">Emscripten</a> and <a href="http://www.rust-lang.org/">Rust</a>, so I like developing intuition about how new
technologies I’m interested in are implemented.</p>
<p>I’m going to show how to use the LLVM API to programmatically
construct a function that you can invoke like any other and have it execute
directly in the machine language of your platform.</p>
<p>In this example, I’m going to use <a href="http://llvm.org/docs/doxygen/html/group__LLVMC.html">the C API</a>, because it is
available in the LLVM distribution alongside the C++ API, and so is the simplest
way to get started. There are bindings to the LLVM API in other languages
— Python, OCaml, Go, Rust — but the concepts behind using LLVM to generate code
are the same across the wrapper APIs.</p>
<p>This example sort of skips to the middle phase of compiler construction. Assume
the frontend (lexer, parser, type-checker) has built an <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a> and we’re now
walking it to emit the intermediate representation of the code for the backend
to take and optimize and spit out machine code.</p>
<p>In this case, we’ll just type out the straight-line procedural code for a simple
function that would normally be dynamically cobbled together in an AST walker
function, calling the LLVM API when it encounters certain nodes in the tree.</p>
<p>For the example, we’ll build a simple adder function, which takes two integers
as arguments and returns their sum, the equivalent of, in C:</p>
<pre><code class="language-c">int sum(int a, int b) {
    return a + b;
}
</code></pre>
<p>To be clear about what we are doing here: we are using LLVM to dynamically build
an in-memory representation of this function, using its API to set up things
like function entry and exit, return and parameter types, and the actual integer
add instruction. Once this in-memory representation is complete, we can instruct
LLVM to jump to it and execute it with arguments we supply, just as if it were
an executable we had compiled from a language like C.</p>
<p><a href="https://github.com/paulsmith/getting-started-llvm-c-api/blob/master/sum.c"><strong>Click here to view the final code.</strong></a></p>
<h2>Modules</h2>
<p>The first step is to create a module. A module is a collection of the global
variables, functions, external references, and other data in LLVM. Modules aren’t
quite like, say, modules in Python, in that they don’t provide separate
namespaces. But they are the top-level container for all things built in LLVM,
so we start by creating one.</p>
<pre><code class="language-c">LLVMModuleRef mod = LLVMModuleCreateWithName(&quot;my_module&quot;);
</code></pre>
<p>The string <code>&quot;my_module&quot;</code> passed to the module factory function is an identifier
of your choosing.</p>
<p>Note that as you’re navigating the <a href="http://llvm.org/docs/doxygen/html/group__LLVMC.html">LLVM C API documentation</a>, different
aspects are grouped together under different header includes. Most of what I’m
detailing here, such as modules and functions, is contained in the <code>Core.h</code>
header, but I’ll include others as we move along.</p>
<h2>Types</h2>
<p>Next, I create the <code>sum</code> function and add it to the module. A function consists of:</p>
<ul>
<li>its type (return type),</li>
<li>a vector of its parameter types, and</li>
<li>a set of basic blocks.</li>
</ul>
<p>I’ll get to basic blocks in a moment. First, we’ll handle the type and parameter
types of the function — its prototype, in C terms — and add it to the module.</p>
<pre><code class="language-c">LLVMTypeRef param_types[] = { LLVMInt32Type(), LLVMInt32Type() };
LLVMTypeRef ret_type = LLVMFunctionType(LLVMInt32Type(), param_types, 2, 0);
LLVMValueRef sum = LLVMAddFunction(mod, &quot;sum&quot;, ret_type);
</code></pre>
<p>LLVM types correspond to the types that are native to the platforms we’re
targeting, such as integers and floats of fixed bit width, pointers, structs,
and arrays. (There’s no platform-dependent <code>int</code> type like in C, where the actual
size of the integer, 32- or 64-bit, depends on the underlying machine
architecture.)</p>
<p>LLVM types have constructors, and follow the form &quot;LLVM<em>TYPE</em>Type()&quot;. In our
example, both the arguments passed to the sum function and the function’s type
itself are 32-bit integers, so we use <code>LLVMInt32Type()</code> for each.</p>
<p>The arguments to <code>LLVMFunctionType()</code> are, in order:</p>
<ol>
<li>the function’s type (return type),</li>
<li>the function’s parameter type vector (the arity of the function should match
the number of types in the array),</li>
<li>the function’s arity, or parameter count, and</li>
<li>a boolean indicating whether the function is variadic, i.e., accepts
a variable number of arguments.</li>
</ol>
<p>Notice that the function type constructor returns a type reference. This
reinforces the notion that what we did here is the LLVM equivalent of declaring
a function prototype in C.</p>
<p>The third line adds a function of this type to the module and gives it the
name <code>sum</code>. We get a value reference in return, which can be thought of as
a concrete location in the code (ultimately, memory) upon which to add the
function’s body, which we do below.</p>
<h2>Basic blocks</h2>
<p>The next step is to add a basic block to the function. Basic blocks are parts of
code that only have one entry and exit point - in other words, there is no other
way execution can go than by single stepping through a list of instructions. No
if/else, while, loops, or jumps of any kind. Basic blocks are the key to
modeling control flow and creating optimizations later on, so LLVM has
first-class support for adding these to our in-progress module.</p>
<pre><code class="language-c">LLVMBasicBlockRef entry = LLVMAppendBasicBlock(sum, &quot;entry&quot;);
</code></pre>
<p>Note the &quot;append&quot; in the name of the function: it’s helpful to think of what
we’re doing as growing a running tally of chunks of code, and so our basic block
is appended relative to the function we added to the module previously.</p>
<h2>Instruction builders</h2>
<p>This notion of a running tally fits with the instruction builder, which is how
we add instructions to our function’s one and only basic block.</p>
<pre><code class="language-c">LLVMBuilderRef builder = LLVMCreateBuilder();
LLVMPositionBuilderAtEnd(builder, entry);
</code></pre>
<p>Similar to appending the basic block to the function, we’re positioning the
builder to start writing instructions where we left off with the entry to the
basic block.</p>
<h3>LLVM IR</h3>
<p>Sidebar: LLVM’s main stock-in-trade is the LLVM intermediate representation, or
IR. I’ve seen it referred to as a midway point between assembly and C. The LLVM
IR is a very strictly defined language that is meant to facilitate the
optimizations and platform portability that LLVM is known for. If you look at
IR, you can see how individual instructions can be translated into the loads,
stores, and jumps of the ultimate assembly that will be generated. The IR has
three representations:</p>
<ul>
<li>as an in-memory set of objects, which is what we’re using in this example,</li>
<li>as a textual language like assembly,</li>
<li>as a string of bytes in a compact binary encoding, called bitcode.</li>
</ul>
<p>You may see clang or other tools emit LLVM IR as text or bitcode.</p>
<p>Back to our example. Now comes the crux of our function, the actual instructions
to add the two integers passed in as arguments and return them to the caller.</p>
<pre><code class="language-c">LLVMValueRef tmp = LLVMBuildAdd(builder, LLVMGetParam(sum, 0), LLVMGetParam(sum, 1), &quot;tmp&quot;);
LLVMBuildRet(builder, tmp);
</code></pre>
<p><code>LLVMBuildAdd()</code> takes a reference to the builder, the two integers to add, and
a name to give the result. (The name is required due to LLVM IR’s restriction
that all instructions produce intermediate results. This can further be
simplified or optimized away by LLVM later, but while generating IR, we follow
its strictures.) Since the numbers we wish to add are the arguments that were
supplied to the function by the caller, we can retrieve them in the form of the
function’s parameters using <code>LLVMGetParam()</code>: the second argument is the
index of the parameter we seek from the function.</p>
<p>We call <code>LLVMBuildRet()</code> to generate the return statement and arrange for the
temporary result of the add instruction to be the value returned.</p>
<h2>Analysis &amp; execution</h2>
<p>That concludes the instruction-building phase of creating our function; the
module is now complete. The next phase of the example is setting it up for
execution.</p>
<p>First, let’s verify the module. This will ensure that our module was correctly
built and will abort if we missed or mixed up any steps.</p>
<pre><code class="language-c">char *error = NULL;
LLVMVerifyModule(mod, LLVMAbortProcessAction, &amp;error);
LLVMDisposeMessage(error);
</code></pre>
<p>LLVM provides either a JIT or an interpreter to execute the IR we’ve built. It
will create a JIT if it can for the target platform, and fall back to an
interpreter otherwise. In any case, the thing that will run our code is called
the <em>execution engine</em>.</p>
<pre><code class="language-c">LLVMExecutionEngineRef engine;
error = NULL;
LLVMLinkInJIT();
LLVMInitializeNativeTarget();
if (LLVMCreateExecutionEngineForModule(&amp;engine, mod, &amp;error) != 0) {
    fprintf(stderr, &quot;failed to create execution engine\n&quot;);
    abort();
}
if (error) {
    fprintf(stderr, &quot;error: %s\n&quot;, error);
    LLVMDisposeMessage(error);
    exit(EXIT_FAILURE);
}
</code></pre>
<p>We could hard-code some integers to be summed, but it’s easy enough to have our
program receive them from the command line.</p>
<pre><code class="language-c">if (argc &lt; 3) {
    fprintf(stderr, &quot;usage: %s x y\n&quot;, argv[0]);
    exit(EXIT_FAILURE);
}
long long x = strtoll(argv[1], NULL, 10);
long long y = strtoll(argv[2], NULL, 10);
</code></pre>
<p>Now that we have two integers in the representation of our host language, we
need to transform them into the analogous representation in LLVM. LLVM provides
factory functions that convert values into the types we need to pass to our
function:</p>
<pre><code class="language-c">LLVMGenericValueRef args[] = {
    LLVMCreateGenericValueOfInt(LLVMInt32Type(), x, 0),
    LLVMCreateGenericValueOfInt(LLVMInt32Type(), y, 0)
};
</code></pre>
<p>Now for the moment of truth: we can call our (JIT’d) function!</p>
<pre><code class="language-c">LLVMGenericValueRef res = LLVMRunFunction(engine, sum, 2, args);
</code></pre>
<p>We have a result, but it’s still in LLVM-land. We recover it to a C type, the
reverse operation from above, and print the sum:</p>
<pre><code class="language-c">printf(&quot;%d\n&quot;, (int)LLVMGenericValueToInt(res, 0));
</code></pre>
<p>And there we have it. We’ve programmatically constructed a function from the
ground up, and had it run directly in machine code native to our platform. There
is much more to LLVM, including control flow (e.g., implementing if/else) and
optimization passes, but we’ve covered the basics that would be in any
LLVM-IR-to-code program.</p>
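<p>As a small taste of where that leads, here is an illustrative sketch (we
don’t build this in our example) of what an if/else looks like in the textual
IR: a hypothetical <code>max</code> function, whose comparison feeds a conditional branch
selecting between two basic blocks:</p>
<pre><code class="language-llvm">define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b            ; signed compare: a &gt; b
  br i1 %cmp, label %then, label %else  ; conditional branch ends the block
then:
  ret i32 %a
else:
  ret i32 %b
}
</code></pre>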
<h2>Compiling</h2>
<p>In order to compile our program, we need to reference the LLVM includes and link
its libraries. Even though we’ve written a C program, the linking step requires
the C++ linker. (LLVM is a C++ project, and the C API is a wrapper thereof.)</p>
<pre><code class="language-console">$ cc `llvm-config --cflags` -c sum.c
$ c++ `llvm-config --cxxflags --ldflags --libs core executionengine jit interpreter analysis native bitwriter --system-libs` sum.o -o sum
$ ./sum 42 99
141
</code></pre>
<h2>Bitcode</h2>
<p>One final thing. I mentioned previously that LLVM IR has three representations,
including bitcode. Once you have a completed module, you can emit bitcode and
write it out to a file.</p>
<pre><code class="language-c">if (LLVMWriteBitcodeToFile(mod, &quot;sum.bc&quot;) != 0) {
    fprintf(stderr, &quot;error writing bitcode to file, skipping\n&quot;);
}
</code></pre>
<p>From there, you can use tools to manipulate it, like <code>llvm-dis</code> to disassemble the
bitcode into the textual LLVM IR assembly language.</p>
<pre><code class="language-console">$ llvm-dis sum.bc
$ cat sum.ll
; ModuleID = 'sum.bc'
target datalayout = &quot;e-m:o-i64:64-f80:128-n8:16:32:64-S128&quot;

define i32 @sum(i32, i32) {
entry:
  %tmp = add i32 %0, %1
  ret i32 %tmp
}
</code></pre>
<h2>Source code of example</h2>
<p>Here is the complete source of the program from above:</p>
<pre><code class="language-c">/**
 * LLVM equivalent of:
 *
 * int sum(int a, int b) {
 *     return a + b;
 * }
 */

#include &lt;llvm-c/Core.h&gt;
#include &lt;llvm-c/ExecutionEngine.h&gt;
#include &lt;llvm-c/Target.h&gt;
#include &lt;llvm-c/Analysis.h&gt;
#include &lt;llvm-c/BitWriter.h&gt;

#include &lt;inttypes.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

int main(int argc, char const *argv[]) {
    LLVMModuleRef mod = LLVMModuleCreateWithName(&quot;my_module&quot;);

    LLVMTypeRef param_types[] = { LLVMInt32Type(), LLVMInt32Type() };
    LLVMTypeRef ret_type = LLVMFunctionType(LLVMInt32Type(), param_types, 2, 0);
    LLVMValueRef sum = LLVMAddFunction(mod, &quot;sum&quot;, ret_type);

    LLVMBasicBlockRef entry = LLVMAppendBasicBlock(sum, &quot;entry&quot;);

    LLVMBuilderRef builder = LLVMCreateBuilder();
    LLVMPositionBuilderAtEnd(builder, entry);
    LLVMValueRef tmp = LLVMBuildAdd(builder, LLVMGetParam(sum, 0), LLVMGetParam(sum, 1), &quot;tmp&quot;);
    LLVMBuildRet(builder, tmp);

    char *error = NULL;
    LLVMVerifyModule(mod, LLVMAbortProcessAction, &amp;error);
    LLVMDisposeMessage(error);

    LLVMExecutionEngineRef engine;
    error = NULL;
    LLVMLinkInJIT();
    LLVMInitializeNativeTarget();
    if (LLVMCreateExecutionEngineForModule(&amp;engine, mod, &amp;error) != 0) {
        fprintf(stderr, &quot;failed to create execution engine\n&quot;);
        abort();
    }
    if (error) {
        fprintf(stderr, &quot;error: %s\n&quot;, error);
        LLVMDisposeMessage(error);
        exit(EXIT_FAILURE);
    }

    if (argc &lt; 3) {
        fprintf(stderr, &quot;usage: %s x y\n&quot;, argv[0]);
        exit(EXIT_FAILURE);
    }
    long long x = strtoll(argv[1], NULL, 10);
    long long y = strtoll(argv[2], NULL, 10);

    LLVMGenericValueRef args[] = {
        LLVMCreateGenericValueOfInt(LLVMInt32Type(), x, 0),
        LLVMCreateGenericValueOfInt(LLVMInt32Type(), y, 0)
    };
    LLVMGenericValueRef res = LLVMRunFunction(engine, sum, 2, args);
    printf(&quot;%d\n&quot;, (int)LLVMGenericValueToInt(res, 0));

    // Write out bitcode to file
    if (LLVMWriteBitcodeToFile(mod, &quot;sum.bc&quot;) != 0) {
        fprintf(stderr, &quot;error writing bitcode to file, skipping\n&quot;);
    }

    LLVMDisposeBuilder(builder);
    LLVMDisposeExecutionEngine(engine);
}
</code></pre>
<p>See the <a href="https://github.com/paulsmith/getting-started-llvm-c-api">GitHub repo</a> for the Makefile and details on how to build the example
on your machine.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>T-shirt retirement</title>
      <link>https://pauladamsmith.com/blog/2014/07/tshirt-retirement.html</link>
      <guid>https://pauladamsmith.com/blog/2014/07/tshirt-retirement.html</guid>
      <pubDate>Mon, 28 Jul 2014 05:37:53 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>I said “smell you later” to some printed t-shirts in my possession today.</p>
<h3>Texas &quot;Tremodillo&quot;</h3>
<p><img src="/images/tshirts/IMG_20140726_124258.jpg" alt="Texas Tremodillo t-shirt front" /></p>
<p><img src="/images/tshirts/IMG_20140726_124310.jpg" alt="Texas Tremodillo t-shirt back" /></p>
<p>I stopped attending <a href="http://www.smcm.edu/">college</a> after my freshman year, in part to go be the
<a href="http://en.wikipedia.org/wiki/Guitar_technician">guitar tech</a> for a <a href="http://articles.baltimoresun.com/1998-04-02/entertainment/1998092019_1_play-guitar-foam-big-windshield">band</a>. This was 1996. Some friends of mine from
high school had formed it, and a major label had signed them to produce
a record. The band had me spend a week with César Díaz at his home and workshop
in Pennsylvania. César had been the tech for guitarists like Stevie Ray Vaughan
and Eric Clapton. Vaughan was, I think, César’s idol. He was a guitar player
himself, and even had his guitar set up like Stevie’s, with heavy strings and
high action. César had stopped touring, and made guitar amps and effects pedals
instead. He also would <a href="https://www.youtube.com/watch?v=VluwmN-GRAA">teach new guitar techs the secrets of the
profession</a>. The Texas Tremodillo was one of two pedals he made. It had
a <a href="https://www.youtube.com/watch?v=9yPmotQ2kKw">tremolo</a> effect, which is like a fast, regular wobble. We would spend
the day in his workshop, tinkering on an amp that had blown a capacitor, or on
a pedal that was buzzing. He was soft-spoken, reserved, and didn’t have a lot of
patience for others. At night we’d get dinner at a local Indian place, or eat
with his wife, whom I remember being kind, and their young son. One day we drove
in to New York, somewhere in the Lower East Side or Village, to peck around at
a used guitar show in an auditorium. We ran into <a href="http://en.wikipedia.org/wiki/Jimmy_Vivino">Jimmy Vivino</a> outside. He
was wearing a pork pie hat and we walked around with him, looking at
guitars—they were close friends. Later that night, Vivino and a bunch of other
guys showed up at César’s workshop to jam. They played “The Weight” and many
other classic rock songs. César soloed on a Stevie Ray Vaughan song. César was
kind to me in the end. He wasn’t too annoyed when I would later call him to ask
what to do about a failing pickup, or if I could use these tubes instead of
those in this amp. I stopped working for the band after a year or so, and forgot
about César, except when I would see the Tremodillo shirt in my closet. I rarely
wore it—it was too big, and I felt like I should preserve it somehow, which
I never made an attempt to do. Sometime later I read that he died in the early
2000s. He had been sick with liver failure not long after I visited him. At one
point he had a transplant, but he only lived another couple of years. RIP,
César.</p>
<h3>Vote, F*cker</h3>
<p><img src="/images/tshirts/IMG_20140726_124417.jpg" alt="Vote, F*cker t-shirt" /></p>
<p>This is a shirt Ben Helphand gave me after a trip he took to Oregon. This would
have been around 2004. The <a href="http://en.wikipedia.org/wiki/Bus_Project#.22Vote.2C_F.2Acker.22">Bus Project</a> had made the shirt. Ben and
I became friends after we went to Minnesota in 2002, volunteering on the senate
campaign of the late <a href="http://en.wikipedia.org/wiki/Paul_Wellstone#Death">Paul Wellstone</a>. After that, we would scheme up ways
to try to improve small-d democratic participation. The Bus Project was a big
part of the inspiration that led to the creation of the <a href="http://electioncalendar.net/">Election Day Advent
Calendar</a> in 2006. I’ll miss the Vote, F*cker shirt but many washes
rendered it too shrunken and I looked like a sausage in it.</p>
<h3>Pope Benedict’s Army</h3>
<p><img src="/images/tshirts/IMG_20140726_124533.jpg" alt="Pope Benedict’s Army t-shirt" /></p>
<p>In 2005, I played on a co-ed softball team with my then-girlfriend, now-wife
Michelle. We had been dating for a few months, and exploring a new group of
friends in common together, some of whom were on the team. A conclave had
elected Joseph Ratzinger pope on April 19. Three weeks later, the team
organizer, John Pick, wrote the potential players an email:</p>
<blockquote>
I pulled the trigger on the league and put it on my
credit card.<br>
<br>
Belmont and California<br>
6:15- 7:15<br>
Thurs nights<br>
starting June 2<br>
80 per person.<br>
<br>
we're called Pope Benedict's Army
</blockquote>
<p>IIRC, we did not do the pontiff proud. I think we might have won one game?</p>
<p>Molly Sircher, who was our ringer, conceived and produced the shirt. She had
played softball at DePaul, and was by far the best player on our team. I wanted
to hang on to the shirt, but the shape was a little boxy, and it had a musty
smell it picked up probably during that summer and never got rid of.</p>
<h3>PyCon US 2012</h3>
<p><img src="/images/tshirts/IMG_20140726_124615.jpg" alt="PyCon US 2012 t-shirt" /></p>
<p>In 2012, I gave a <a href="http://pyvideo.org/video/680/spatial-data-and-web-mapping-with-python">talk</a> at <a href="https://us.pycon.org/2012/">PyCon</a> for the first and so far only
time. The U.S. conference was in Santa Clara, CA, near San Jose. At the time,
I was <a href="/blog/2011/09/dnc.html">the deputy director of technology at the DNC</a>. I was eager to attend
tech conferences during my tenure there, to try to recruit software engineers to
work with me on the campaign. This trip was a bust on that score, and I felt
afterwards that my presentation had been lackluster. In retrospect, it was
a stressful time. I felt like we didn’t have enough engineering help at the DNC
to get through the campaign, and that it was difficult to attract any. I always
liked <a href="http://gazit.me/">Idan Gazit’s</a> official PyCon snake logo that year and thought it was
a handsome shirt. But it’s too small now, into the donation pile you go.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>quickserver</title>
      <link>https://pauladamsmith.com/blog/2014/06/quickserver.html</link>
      <guid>https://pauladamsmith.com/blog/2014/06/quickserver.html</guid>
      <pubDate>Fri, 20 Jun 2014 02:44:10 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
<p>Everyone knows <code>python -m SimpleHTTPServer</code> for starting a quick webserver in
a directory; it’s pretty awesome. It listens on port 8000 by default, or you can
give it an alternate port as a command-line argument. But if you’re like me and
have lots of server processes running at once, you often get conflicts where the
port is already in use, or you have to hunt and peck for a free one.</p>
<p>It’s much better to just let the OS assign an unused port to this quick
webserver process, since you don’t really care where it goes. You can do this by
passing 0 as the port argument, and that totally works: Python prints out the
port it started the HTTP server on. There’s just one problem that trips me up:
it prints out the new port number in such a way that you have to either mouse
over, select, and copy it, then open a new tab, type in “localhost” or
“0.0.0.0”, and paste it, or you have to eyeball it and type it into the new tab:
<pre><code>$ python -m SimpleHTTPServer 0
Serving HTTP on 0.0.0.0 port 61200 ...
</code></pre>
<p>See what I mean, you have to snag that 61200 somehow. I just want to start
a webserver and have it immediately open to that address in my browser! That
output should be clickable or hook into OS X’s <code>open</code>.</p>
<p>So <a href="https://github.com/paulsmith/quickserver/blob/master/quickserver">this shell script</a> does that.</p>
<pre><code>$ ./quickserver
Serving HTTP on 0.0.0.0 port 61209 ...
http://0.0.0.0:61209/
127.0.0.1 - - [19/Jun/2014 17:21:56] &quot;GET / HTTP/1.1&quot; 200 -
127.0.0.1 - - [19/Jun/2014 17:21:57] code 404, message File not found
127.0.0.1 - - [19/Jun/2014 17:21:57] &quot;GET /favicon.ico HTTP/1.1&quot; 404 -
</code></pre>
<p><img src="https://i.imgur.com/0eb9q9Q.png" alt="" /></p>
<p>Probably too small to deserve it’s own repo but I figured someone might want to
make it work on Ubuntu or whatever. <a href="https://github.com/paulsmith/quickserver">Here it is on GitHub</a>.</p>
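<p>Incidentally, the let-the-OS-pick-a-port trick isn’t Python-specific. Here’s
a rough sketch of the same idea in Go, just to show the mechanics: bind to port
0, read back the address the OS actually assigned, and print a pasteable URL
before serving the current directory:</p>
<pre><code class="language-go">package main

import (
    &quot;fmt&quot;
    &quot;log&quot;
    &quot;net&quot;
    &quot;net/http&quot;
)

func main() {
    // Port 0 tells the OS to pick any unused port.
    l, err := net.Listen(&quot;tcp&quot;, &quot;127.0.0.1:0&quot;)
    if err != nil {
        log.Fatal(err)
    }
    // l.Addr() contains the port that was actually assigned.
    fmt.Printf(&quot;http://%s/\n&quot;, l.Addr())
    // Serve the current directory, like SimpleHTTPServer does.
    log.Fatal(http.Serve(l, http.FileServer(http.Dir(&quot;.&quot;))))
}
</code></pre>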

      ]]></content:encoded>
    </item>
    <item>
      <title>Things of recent interest</title>
      <link>https://pauladamsmith.com/blog/2014/05/recent-interests.html</link>
      <guid>https://pauladamsmith.com/blog/2014/05/recent-interests.html</guid>
      <pubDate>Thu, 22 May 2014 01:23:10 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>Here are a few things that have kept my interest lately:</p>
<ul>
<li>
<p><strong><a href="https://mollyrocket.com/861">Immediate-Mode Graphical User Interfaces</a></strong> Immediate-mode GUI is
a straightforward way of rendering a UI. It’s so simple, in fact, that I had to
watch the video twice and do a <a href="http://sol.gfxile.net/imgui/index.html">tutorial</a> to understand it. (I did the
tutorial using SDL on OS X, and then ported it to <a href="/p/imgui/">JavaScript and
canvas</a>. Incidentally, I also used the <a href="https://github.com/paulsmith/pauladamsmith.com/blob/master/p/imgui/Makefile">C preprocessor on my
JavaScript file</a>, following <a href="http://www.nongnu.org/espresso/js-cpp.html">this</a>, to get statically-generated
IDs for the widgets; it worked well.) The short explanation of immediate-mode
GUI is, in your render() function that’s called for each frame of your
application (à la requestAnimationFrame), you call functions that handle
everything needed to draw, handle events, change state, and trigger other
events, for your UI’s widgets. Your code looks something like <code>if (button(id, x, y)) buttonWasPressed();</code>, and that’s the entirety of rendering a button widget
to the screen and handling click events on it. (In most cases, the widget
functions return a boolean of whether the button was pressed, text field was
changed, etc.) There are no callbacks or separate bindings. You maintain a tiny
bit of global state that helps coordinate all the action. The upside is you have
total control over your UI’s appearance and behavior. The downside is, you have
to implement all of your UI’s appearance and behavior yourself. My feeling so
far is that it is not something you would do if you were just implementing
a typical UI in a web browser, because you have all the browser’s widgets
already at your disposal (not to mention HTML and CSS layout). You’d be
reinventing the wheel.  But it seems an ideal approach for a game UI (which is
where I believe the idea originated, in the game development world), on
platforms where you don’t already have a core UI or widget library available, in
a native mobile application where performance is paramount, or any kind of
custom application, even on the web, where you want or need complete control over the
UI, because, for instance, the supplied browser form elements don’t suffice. For
example, immediate-mode GUI would fit something like
<a href="http://soundslice.com">Soundslice</a>’s custom interface perfectly.
(<a href="http://jlongster.com/Removing-User-Interface-Complexity,-or-Why-React-is-Awesome">via</a>)</p>
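<p>(A minimal sketch of this immediate-mode pattern follows this list.)</p>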
</li>
<li>
<p><strong><a href="https://www.youtube.com/watch?v=XRYN2xt11Ek">Functional reactive programming</a></strong> This was an eye-opening talk
for me. FRP could show us the way out of the fly bottle of complicated,
callback-knotted async JavaScript UIs in the browser. The core idea is to treat
events not as isolated occurrences to be handled on a per-callback basis, but
instead as collections, and once you do that, you have the power of higher-order
functions like map, reduce, filter, and merge to describe complex behaviors as
sort of a pipeline of collection processing. If you imagine applying Python’s
list and generator comprehensions to browser events, you start to get the idea.
<a href="https://github.com/Reactive-Extensions/RxJS">RxJS</a> is the tool highlighted in the talk, but <a href="https://github.com/baconjs/bacon.js">bacon.js</a> also
seems to be a popular FRP library for JavaScript (haven’t tried it myself).
There’s also a <a href="https://jhusain.github.io/learnrx/">browser-based FRP tutorial</a> to work through.</p>
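<p>(A tiny channels-as-streams sketch of this idea also follows this list.)</p>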
</li>
<li>
<p><strong><a href="https://github.com/google/traceur-compiler">Traceur</a></strong> Programming FRP in JavaScript becomes a lot more
pleasant with the new anonymous function syntax (<code>(x) =&gt; x + 1</code> instead of
<code>function(x) { return x + 1; }</code>) coming in ECMAScript 6, or ES6. Traceur
compiles ES6 to JavaScript that will run in current browsers, so you can code
and get the benefit of the new syntax and other upcoming language features now.
I have it as a build step in a Makefile, alongside minification. Then
presumably, barring language-breaking changes, you’d be able to remove the build
step at some future date when ES6 has become widely adopted.</p>
</li>
<li>
<p><strong><a href="http://elm-lang.org/">Elm</a></strong> Elm is an entire language built around FRP which targets the
browser. It is a Haskell or OCaml-like language that compiles down to HTML, CSS,
and JavaScript. It seems to rely on the <a href="http://blog.jle.im/entry/inside-my-world-ode-to-functor-and-monad">functor</a> concept, which it
calls ‘lift’, to convert browser events into something its built-in higher-order
functions can process. It’s arguable that because it is a functional language
like Haskell, it’s more naturally suited for dealing with the sorts of
concurrency issues in UIs that libraries like RxJS were created to address in
JavaScript. I’m still just in playground mode with it.</p>
</li>
<li>
<p><strong><a href="http://confreaks.com/events/gophercon2014">GopherCon talks</a></strong> It says something about the Go community how
uniformly excellent and entertaining these talks are. Interesting and dense with
practical knowledge.</p>
</li>
</ul>
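<p>Here is the minimal immediate-mode sketch promised above, in Go for the sake
of illustration; the <code>uiState</code> struct and <code>button</code> function are invented names,
and a real version would also emit draw commands and use <code>id</code> to track the
hot/active widget:</p>
<pre><code class="language-go">package main

import &quot;fmt&quot;

// uiState is the tiny bit of global state an immediate-mode GUI keeps
// between widget calls: just this frame's input.
type uiState struct {
    mouseX, mouseY int
    mouseDown      bool
}

// button draws a widget and reports a click in a single call per frame;
// there are no callbacks and no retained widget objects.
func button(ui *uiState, id, x, y, w, h int) bool {
    hot := ui.mouseX &gt;= x &amp;&amp; ui.mouseX &lt; x+w &amp;&amp;
        ui.mouseY &gt;= y &amp;&amp; ui.mouseY &lt; y+h
    // (a real implementation would draw the button here)
    return hot &amp;&amp; ui.mouseDown
}

// render runs once per frame and rebuilds the whole UI each time.
func render(ui *uiState) {
    if button(ui, 1, 10, 10, 80, 24) {
        fmt.Println(&quot;button 1 pressed&quot;)
    }
}

func main() {
    ui := &amp;uiState{mouseX: 20, mouseY: 20, mouseDown: true}
    render(ui) // in a real program, this is called from the frame loop
}
</code></pre>
<p>And here is the channels-as-streams sketch for the FRP idea: events become
a collection you transform with higher-order functions like filter and map
(again, an illustrative toy, not how RxJS or bacon.js are implemented):</p>
<pre><code class="language-go">package main

import &quot;fmt&quot;

// filterInt and mapInt treat a stream of events (ints on a channel)
// as a collection to be transformed by higher-order functions.
func filterInt(in &lt;-chan int, pred func(int) bool) &lt;-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for v := range in {
            if pred(v) {
                out &lt;- v
            }
        }
    }()
    return out
}

func mapInt(in &lt;-chan int, f func(int) int) &lt;-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for v := range in {
            out &lt;- f(v)
        }
    }()
    return out
}

func main() {
    events := make(chan int)
    go func() {
        for i := 0; i &lt; 10; i++ {
            events &lt;- i
        }
        close(events)
    }()
    // a pipeline: keep the even &quot;events&quot;, then double them
    for v := range mapInt(filterInt(events, func(v int) bool { return v%2 == 0 }),
        func(v int) int { return v * 2 }) {
        fmt.Println(v)
    }
}
</code></pre>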
<p>I also recently tried to teach myself <a href="http://research.swtch.com/acme">Acme</a>. You can certainly glimpse
the power of a system like that. But ultimately I decided editing speed is more
important to me, and I’m pretty fast in Vim, so I abandoned the effort.</p>
<p><a href="http://coreos.com/blog/zero-downtime-frontend-deploys-vulcand/">CoreOS</a> seems like it could become pretty important.</p>
<p>Programming a computer, still a fun thing to do.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Fixing healthcare.gov</title>
      <link>https://pauladamsmith.com/blog/2014/03/fixing-healthcare.gov.html</link>
      <guid>https://pauladamsmith.com/blog/2014/03/fixing-healthcare.gov.html</guid>
      <pubDate>Mon, 03 Mar 2014 01:04:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
<p>The <a href="http://time.com/10228/obamas-trauma-team/" title="Obama’s Trauma Team">story of how HealthCare.gov was fixed</a> is told by Steven
Brill in the cover story of the March 10, 2014 issue of TIME magazine.</p>
<p><a href="http://time.com/10228/obamas-trauma-team/"><img src="/images/time-cover.jpg" width="770"></a></p>
<p>I also appeared on All In with Chris Hayes on MSNBC on February 28,
2014 to <a href="http://www.msnbc.com/all-in/watch/the-nerds-who-saved-obamacare-175808579562" title="The nerds who saved Obamacare">talk about it and my time with the ad hoc team</a>.</p>
<p><a href="http://www.msnbc.com/all-in/watch/the-nerds-who-saved-obamacare-175808579562"><img src="/images/all-in.jpg" width="770"></a></p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Helping healthcare.gov</title>
      <link>https://pauladamsmith.com/blog/2013/10/healthcare.gov.html</link>
      <guid>https://pauladamsmith.com/blog/2013/10/healthcare.gov.html</guid>
      <pubDate>Thu, 31 Oct 2013 22:00:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>For the past 18 days, I have been part of the so-called “<a href="http://www.hhs.gov/digitalstrategy/blog/2013/10/more-on-the-tech-surge.html">tech surge</a>” that
is helping to fix <a href="https://healthcare.gov/">healthcare.gov</a>.</p>
<p>We have already <a href="http://www.hhs.gov/digitalstrategy/blog/2013/11/healthcare-gov-progress-update.html">improved performance and stability</a> of the site, and have
helped to establish better processes for getting things done. There is still a
lot of work to do to make the site the stable platform it needs to be.</p>
<p>There is much to talk about, but for now I am staying focused on the work ahead.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>healthcare.gov and ACA marketplace sites from the perspective of a software engineer</title>
      <link>https://pauladamsmith.com/blog/2013/10/healthcare.gov-from-programmers-perspective.html</link>
      <guid>https://pauladamsmith.com/blog/2013/10/healthcare.gov-from-programmers-perspective.html</guid>
      <pubDate>Fri, 04 Oct 2013 21:00:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p><em>Cross-posted at <a href="http://talkingpointsmemo.com/cafe/a-programmer-s-perspective-on-healthcare-gov-and-aca-marketplaces">Talking Points Memo</a></em></p>
<p>Full disclosure: my wife works at the Centers for Medicare and
Medicaid Services (and this post is entirely my views, not hers), I
worked on the president’s re-election campaign, and politically, I
wish to see the PPACA law in general and the new marketplaces
specifically succeed.</p>
<p>This has been an important week in the history of health care in the
United States and for technology professionals working in government
and on related services. Here are some thoughts on
<a href="https://healthcare.gov/">healthcare.gov</a> and the state-based
marketplace websites from my perspective as someone who has been
developing and deploying web-based software applications for many
years and who has experience with large systems and high-traffic
sites.</p>
<p>As I write this, there is a weird mixture of angst, elation,
anticipation, control-freakery, sympathetic embarrassment, hope, and
generalized anxiety about healthcare.gov and the state-based
marketplace sites among supporters of Obamacare and also among
left-leaning technologists. On the one hand, affordable health
insurance is now available to any American; on the other, availability
doesn’t necessarily mean you can get it, due to errors during the
sign-up process on healthcare.gov and the state-based marketplace
sites which have been widely reported. There is a sense that, while
this is primarily a technology problem to be fixed, the political
problem is larger and may risk the implementation and success of the
overall law—if enough people perceive the marketplace sites to be
broken, support for the law—already tenuous according to some
polls—will erode, and the law’s opponents’ argument that
implementation needs to be delayed or even defunded will be
persuasive.</p>
<p>It is natural for technologists to go into crisis mode and immediately
start triaging problems and brainstorming solutions. They are smart
and want to help and believe they can fix things. This is a totally
appropriate attitude, and their nervous feelings are valid. The people
implementing the marketplace sites have all the problems of developing
large-scale, integrated, enterprise software, plus delivering a
high-quality consumer experience. I think we should also have some
perspective on what’s happening, and I would caution against
panic. There are a number of things to bear in mind:</p>
<p><strong>Architecture.</strong> Caveat: I don’t have direct experience with the
marketplace sites, only second-hand knowledge about how they’re
implemented. That said, I know some details. The main thing to
understand is there is no one, single Obamacare site—there is
healthcare.gov, which is home to the federal marketplace and a portal
to the state-based marketplaces, and there are the 14 state-based
sites. The federal marketplace is for Americans whose states either
chose not to implement their own marketplace or whose site isn’t
ready yet.</p>
<p>The user interface, or frontend, of healthcare.gov is quite
interesting. Its design has been compared favorably with top
commercial sites. It was implemented using modern web development
techniques, working well across browsers and on mobile devices. We
used similar techniques on the president’s campaign: generate static
files from templates with Jekyll, serve them from behind a CDN
(Akamai, in the case of healthcare.gov). This gives you a very fast,
low-latency user experience that’s very durable in the face of
high-traffic loads. <a href="http://developmentseed.org/blog/new-healthcare-gov-is-open-and-cms-free/">Dave Cole has
written</a>
about the process by which the frontend was developed; it’s
fascinating to read if you have any experience with how government
sites have typically been built. And you’ll notice, no one has
complained about being unable to access the site itself: healthcare.gov
has been up continuously since October 1st. It’s submitting
forms back to the server that’s been the issue.</p>
<p>About the backend server: having a great frontend experience means
little if you can’t complete a transaction with the
service. (Although, not nothing—many important informational consumer
resources reside on the frontend and have been wholly unaffected by
the reported outages.) People may not realize that a major part of
PPACA was streamlining the rules surrounding Medicaid
eligibility. healthcare.gov thus serves as a portal, routing people to
the appropriate resource to help them get covered. This
means not only sending you to your state-based marketplace site if
your state has one, but directing you to Medicaid instead of the
marketplaces, if you are eligible, or determining that you meet
requirements for a subsidy on the marketplace. In order to do these
things, the system verifies your identity, income, and other personal
data with new and existing government databases. In other words, so
that it may route you to the correct entity that will be offering or
providing you health insurance, healthcare.gov looks up your
information online (i.e., during the course of a request-response
cycle with the site). The architecture of healthcare.gov is an example
of both the challenges of integration—different software services
working together—and of distributed systems—independent systems that
may or may not be available or meet certain service-level agreements or
standards.</p>
<p>An alternative to an online lookup of personal data or account
creation would be to store the request for later processing. This is
commonly referred to as queuing. It turns an online process into an
offline one: the system goes from being synchronous—waiting for a
response from another system after making a request to it—to
asynchronous—not waiting for the response and arranging to check the
result somehow later. This is not a trivial change, as people who have
implemented these systems will know. It requires a fairly fundamental
redesign of the flow of the software, the application of business
rules, and how certain operational details are carried out. However,
it is now a widely established pattern for system development. For
example, when you buy a ticket from an airline reservation site, and
wait for your credit card to be processed and the whole transaction to
complete, that is an example of a synchronous, or online, system
(internally, the system may very well be composed of asynchronous
services, but the frontend interface that the user interacts with
presents a synchronous experience). When you place an order with
Amazon, on the other hand, you receive a response almost immediately
(“thank you for your order!”). If there is a problem with your
order—your card is expired, or was declined—you later receive a
notification, usually an email, asking you to update your payment
info. That is an example of an asynchronous system. Why does this
matter? Asynchronous, distributed systems have components that are
de-coupled—if one fails, it doesn’t necessarily bring the rest down
with it. You have to design your system to be resilient to such
failures, but it enables you to do things such as quickly store the
contents of a form submission and acknowledge the user with a
thank-you message when the system that looks up personal data or
creates new accounts is down. This introduces operational complexity:
you must have a functioning queue system, you must have programs that
process the queue, they need to be monitored and errors have to be
handled appropriately (since there is no online user that can respond
to them), and notification systems like email that are out-of-band of
the website may need to be employed (in case you need to ask the user
to come back and provide more information).</p>
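<p>To make the distinction concrete, here is a minimal sketch (purely
illustrative, and not how healthcare.gov is actually built; the
function and queue names are invented) of the same form submission
handled both ways. An in-process queue stands in for what would be a
durable message broker in a real system:</p>
<pre><code>import queue
import threading

applications = queue.Queue()

def verify_identity(form):
    # stand-in for a call to an external verification service,
    # which may be slow or unavailable
    return True

def handle_submission_sync(form):
    # synchronous: the user waits on the external service
    if verify_identity(form):
        return "Your application has been processed."
    return "We could not verify your identity. Please try again."

def handle_submission_async(form):
    # asynchronous: store the request and acknowledge immediately,
    # even if the verification service is down right now
    applications.put(form)
    return "Thank you! We will notify you when your application is processed."

def worker():
    # offline process that drains the queue; errors must be handled
    # out-of-band (e.g., an email asking the user for more information)
    while True:
        form = applications.get()
        verify_identity(form)
        applications.task_done()

threading.Thread(target=worker, daemon=True).start()
</code></pre>
<p>The synchronous handler fails whenever the verification service
does; the asynchronous one keeps accepting submissions regardless, at
the cost of the queue, the worker, and the out-of-band notifications
described above.</p>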
<p>I don’t know to what extent healthcare.gov was designed with the
challenges of distributed systems in mind, but moving toward more
asynchronous data flows where possible will alleviate some of the poor
user experiences we’ve seen reported. It will also let the site keep
taking in a high volume of requests while the team independently works
to fix bugs in the transactional or informational data services.</p>
<p><strong>Errors, user experience, and expectations.</strong> In the reports about
problems users have experienced with healthcare.gov and the
state-based marketplace sites, we’ve seen screenshots and descriptions
of ugly error messages. The quality of the healthcare.gov frontend,
with its attractive design that’s more like a retail site than a
government site, has, I think, primed users for an overall experience
reflective of that design. They expect what’s under the hood
to be as good as the hood appears. Ugly error messages, and
disappointment at not being able to complete the sign-up process,
frustrate expectations that were set by the site itself, and by its
champions, myself included, who encouraged people to go to the site on
day 1.</p>
<p>The ugly error messages have for the most part been replaced with
friendlier views, and we know that the backend engineers are working
to fix the sign-up process. A way to handle expectations at this point
for site users might be to remind them, at the point of a system error
or maintenance page, that they have until December 15th to enroll for
coverage beginning January 1st, 2014, and until March 31st to enroll
for coverage in 2014. Another mechanism to reassure a frustrated user
who couldn’t sign up might be a simple form that collects email
addresses of people to be notified when the system is back online.</p>
<p><strong>Unprecedented environmental hostility and limited time.</strong> Ever since
PPACA was passed, I’ve heard griping about why it would take so long for
Obamacare to come online. In reality, given the scope of the changes
to the regulatory framework for health insurance markets, changes to
Medicaid eligibility, and the implementation of the federal and
state-based marketplaces, there was a huge amount of work to deliver a
major new social insurance program in such a short amount of
time. It’s natural that there would be bugs, and the president, HHS,
and CMS teams have said as much. Going back further, the law’s
opponents have prevented Congress from taking up many regulatory and
technical fixes to the law. And now of course the federal
government is shut down due in part to opposition to the law. While
little of this hostility is new information to implementers, it is
nonetheless remarkable what they were able to achieve in this
environment. A suspected denial-of-service attack on New York’s site
only compounds the outside forces set against this fledgling program.</p>
<p><strong>State-based marketplaces.</strong> It is a joke among Medicaid staff that
if you’ve seen one state’s Medicaid system, you’ve seen one state’s
Medicaid system. 14 states chose to implement their own
marketplace. While their sites will share some common services with
the federal marketplace, and some large contractors worked on multiple
sites, these are independently developed and administered sites with
their own architectures, infrastructure, designs, and staff.</p>
<p><strong>Time.</strong> My strong belief is that these early problems will be
largely forgotten very soon. People will get covered. People are
getting enrolled, now, despite the problems. It’s worth remembering
what happened during the implementation of Medicare Part D. There were
many of the same types of reports, from pharmacies that couldn’t
connect to government data services, to seniors who were temporarily
unable to receive their benefit. Do we think about those stories now
when we think about Part D? Of course not. Part D is as strong
and beloved a piece of the social safety net firmament as any other. So
it will be with Obamacare.</p>
<p>None of this is to excuse the problems healthcare.gov has had this
week. October 1st was a known deadline, and major sites have been
launched under hostile or constrained circumstances before. But I think
if we understand a bit more about everything involved, we might not be
so quick to condemn or dismiss out of hand.</p>
<p><em>Update: my original post incorrectly stated there were 24 state-based
marketplaces; there are 14.</em></p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Public Good Software and me</title>
      <link>https://pauladamsmith.com/blog/2013/07/pgs.html</link>
      <guid>https://pauladamsmith.com/blog/2013/07/pgs.html</guid>
      <pubDate>Wed, 24 Jul 2013 00:00:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>Software that helps civil society organizations—non-profits, NGOs,
charities—do their work should be better. It can be better. I want to help
make it better. That’s why <a href="http://www.chicagogrid.com/reviews/tech/obamas-tech-team-citys-geeks-in-residence/">I’ve started, along with two colleagues from the
2012 election</a>, a new company, called <a href="https://publicgoodsoftware.com/">Public Good Software</a>.</p>
<p>If you survey the kind of technology that <abbr title="civil society
organizations">CSOs</abbr> use to support their missions, it’s a sorry sight.
It’s full of complex interfaces and complicated experiences, thin layers over
old systems, aging and poorly-supported applications, and disconnected data.
Worse, the companies that develop and sell this software seem to have
stagnated—their websites often feel frozen in time from 10 years ago. There
isn’t a lot of innovation happening here.</p>
<p>This is frustrating. These organizations are increasingly counted on to
confront our most serious challenges, like hunger, climate change,
conservation, joblessness, homelessness, affordable housing, poverty, public
health, literacy and education, and yet the technology tools they need are not
keeping up with them. Why shouldn’t people who work at CSOs expect software
every bit as good and as powerful as what they use on their smartphones
every day?</p>
<p>The situation is not much better if you are a supporter of these
organizations. Let’s say you give $100 a year to your local public radio
station, volunteer regularly at a community garden, and write your
congressperson on behalf of an animal rights advocacy campaign. You should be
able to keep track of all you do, and if you choose, share it with your
community. You should be able to find new opportunities that you might not
have been aware of, based on the kinds of organizations you support.  You have
a civic profile, based on how you help others, that you should be able to
claim and control.</p>
<p>The first problem to tackle, and the one that PGS’s first product will help
solve, is the problem of disconnected data. It’s a fundamental problem
that impacts CSOs and their supporters. Information about donors is in one
database, volunteers in another, email subscribers in a third, then there’s
Facebook likers and Twitter followers and you don’t know if they’re in the
other databases … Think of Mint.com, the way that service in its early days
brought sanity to your financial life. We want to connect these disparate
databases in much the same way and provide CSOs with a new, high-level view of
their data, with more complete pictures of their supporters. We’ll do this
through the use of statistical models, summaries, and visualizations that let
CSOs track how they are doing on the goals they set for themselves. This will
become a platform on which, over time, we’ll create and add new products.</p>
<p>We aren’t setting out to reinvent the wheel. We’re not building YACRM (yet
another CRM). We’re not even aiming to replace the technology CSOs currently
use. We want to provide new tools and experiences that reflect the new needs
of these organizations and their supporters. And it will be great, modern
software: fast, a pleasure to use, designed and built for mobile devices, with
maps and geo data throughout, and ready for international users. This is what
CSOs and their supporters deserve.</p>
<p>We decided early on that we wanted to be aligned with our customers in a way
that was sustainable, that built trust, and held us as a company accountable
to ensure that a double-bottom-line isn’t just a convenience to be discarded
when the “real” pressure (i.e., financial) builds up. At the same time, we
knew that the best way to grow the company the way we believed it should be grown was
through traditional capital investment. That led us to become a <a href="http://www.ilga.gov/legislation/BillStatus.asp?DocNum=2897&amp;GAID=11&amp;DocTypeID=SB&amp;LegId=63455&amp;SessionID=84">benefit
corporation</a>. <a href="#fn" id="fnr">*</a> This is new legislation,
found in a dozen or so states, and we think we’re one of the first software
startups to go that route. Essentially what this means is that we are in all
other respects like a normal for-profit company (we are a C corp under the
hood), but that we have a social mission, stated right in our corporate
by-laws (ours is roughly “to return more capital to organizations that provide
a benefit to the public”), and there are two mechanisms ensuring that the
social mission is not discarded if it becomes inconvenient. One is that there
is a board-level position called the social benefit director, whose job is to
ensure that the company is sticking to the social mission. The other is that
our fiduciary responsibility to our shareholders does not override that social
mission. This is where the rubber meets the road—you won’t see PGS suddenly
pivot to sell software to the NRA to return a few more percentage points to
our investors.</p>
<p>All this comes at an interesting time for the public sector.  Executive
directors and supporters alike are demanding more accountability and better
ways of measuring success or failure. At the same time, demand for CSO
services is up, while capital—in the form of dollars and volunteer time—is
flat, or even declining slightly. There is a small but increasingly vocal
minority of development directors saying CSOs need to be less obsessed with
converting every dollar to program, and need to find new ways to expand and be
more effective. All this leads to an increasing need for better data and analysis,
and better tools—for fundraising, communications, volunteer mobilization—that
build on it. We think there is an enormous opportunity here.</p>
<p>So it will be fun. I’m the CTO. My co-founders <a href="http://jdkunesh.com/">Jason</a> and <a href="http://www.danratner.com/">Dan</a>
were director of UX and director of development, respectively, in the OFA 2012
technology department. We’ve also got two more OFA tech alums,
<a href="http://www.chrisgansen.com/">Chris</a> and <a href="http://www.aaronsalmon.com/">Aaron</a>, as part of the founding team. Our current
status: talking with potential investors, meeting with a handful of CSOs
who’ve agreed to pilot the software as we build it, and making prototypes and
getting our basic infrastructure running. We’re using <a href="http://golang.org/">Go</a> for our server
software, which is a fun language. Incidentally, it should go without saying
that we’re big believers in open source; most of what we develop will be
available under an open source license, and I’ll write more about that in
a later post. But I’ve already released some open source software that
was developed on PGS time, <a href="http://paulsmith.github.io/gogeos/">gogeos</a>, a small Go library for working
with geospatial data.  We’ll be hiring software engineers soon, so if any of
this sounds interesting to you, <a href="mailto:paul@publicgoodsoftware.com">drop me a line</a>.</p>
<p class="fn"><a id="fn">*</a> Not to be confused with the <a
href="http://www.bcorporation.net/">B Corp certification</a>, which is related
but is not a corporate structure. <a href="#fnr">↩</a></p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Announcing gogeos, a spatial data library for Go</title>
      <link>https://pauladamsmith.com/blog/2013/06/gogeos.html</link>
      <guid>https://pauladamsmith.com/blog/2013/06/gogeos.html</guid>
      <pubDate>Wed, 12 Jun 2013 21:00:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>I am announcing the initial release of <a href="http://paulsmith.github.io/gogeos/">gogeos</a>, a library for the Go
programming language. gogeos provides spatial data operations and geometric
algorithms. While it is a Go library, the hard work is done by the
<a href="http://geos.osgeo.org/">GEOS</a> C library.</p>
<p>The kinds of things you can do with gogeos include:</p>
<ul>
<li><strong>set-theoretic operations</strong>, such as computing the intersection, union, or
difference of two geometries,</li>
<li><strong>topological operations</strong>, such as computing buffers and convex hulls,</li>
<li><strong>binary predicates</strong>, such as whether two geometries intersect or are disjoint,</li>
<li><strong>validity checking</strong>, and</li>
<li><strong><a href="http://paulsmith.github.io/gogeos/#overview">much more</a></strong>.</li>
</ul>
<p>It also provides interoperability with other spatial data processing systems
like <a href="http://postgis.org/">PostGIS</a> by decoding and encoding geometries as Well-Known Text
(WKT) and Well-Known Binary (WKB).</p>
<p>I started working on gogeos because I looked at the landscape of GIS and
spatial data libraries for Go, and found it lacking. Binding to the GEOS
library with <a href="http://golang.org/cmd/cgo/">cgo</a> was a way to get started quickly. Relying on GEOS has
its drawbacks; for instance, it creates a large binary dependency, and cgo
doesn’t allow for cross-platform compiles.</p>
<p>In the long term, I would like to create a pure Go library that implements
the kind of functionality that GEOS and the <a href="http://www.vividsolutions.com/jts/main.htm">JTS</a> provide. That would allow for use
on platforms that don’t or can’t support C shared libraries, such as Google
App Engine, and make it easier for developers to get started working with it.</p>
<p>In the meantime, I hope that gogeos enables more developers who are working
with spatial data or GIS to get involved in the Go ecosystem.</p>
<p>gogeos is a <a href="https://github.com/paulsmith/gogeos">fully open-source project</a>, and I welcome contributors
and feedback.</p>
<p>—<a href="https://twitter.com/paulsmith">@paulsmith</a></p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Democratic Party’s voter registration app is now free and open-source software</title>
      <link>https://pauladamsmith.com/blog/2013/01/dnc_voter_reg_foss.html</link>
      <guid>https://pauladamsmith.com/blog/2013/01/dnc_voter_reg_foss.html</guid>
      <pubDate>Tue, 29 Jan 2013 03:30:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>We (the <a href="http://democrats.org/">DNC</a>) have <a href="https://github.com/democrats/voter-registration/issues/12#issuecomment-12804999">relicensed the Democratic Party’s voter registration
application</a> under a standard MIT license, and accompanied the source
code with an advisory notice regarding the use of the software. I wanted to
explain why we did this.</p>
<p>The Democratic Party initially released <a href="https://github.com/democrats/voter-registration">the source code to its online voter
registration app</a> late last summer, with the intent of making it
available for all the standard reasons people and organizations choose when
they open-source code: so that it can be improved, so that bugs can be fixed,
so others can take it and build further new applications on top of it.</p>
<p>However, it quickly became apparent that we had a problem with the open source
community. <a href="https://github.com/democrats/voter-registration/issues/12">The issue was with the license</a>. It contained a clause that
placed restrictions on its use. The reason this clause was included was to
address our concerns regarding the highly regulated and closely monitored
nature of voting and voter registration. We wanted to avoid a scenario where,
either inadvertently or through malice, someone set up a site based on the
code, and without following state and federal guidelines and rules, defrauded
or disenfranchised a voter. Now, regardless of our good intentions on this
matter, the fact that we had taken a standard open source license and amended
it with this restrictive clause meant that we did not pass “free and open
source” muster, with emphasis on the “free” as in “speech”.</p>
<p>We needed a solution that addressed both the problematic license and our
concerns regarding the good-faith use of the software that protected voters. A
member of the open source community, <a href="http://www.red-bean.com/kfogel/">Karl Fogel</a>, stepped forward
with a proposal: change the license to an unmodified standard
<a href="http://opensource.org/licenses/index.html">OSI</a>-approved license, and include along with the source code an
advisory document that outlines these legal concerns. The notice would not be
binding or otherwise modify the license and therefore the terms of use; however,
like any piece of open source software, people are “free” to use it illegally,
and free to suffer the consequences if they do. The important thing is to
remind users of their responsibility to act in accordance with the law,
especially when it comes to something as precious and besieged as our
franchise. We feel the combination of a standard FOSS license and a
non-binding advisory document expressing the intent of the copyright holder is
a way forward for political organizations to release potentially sensitive
source code while at the same time communicating the vital issues animating and
conditioning that release.</p>
<p>Now, some observers may not see this as remarkable. There was a bad license,
it’s been changed, what’s the fuss? I want to acknowledge the hard work across
the organization, from software engineers to lawyers, to find a way to give
back to the open source community and satisfy the concerns of both sides. There
are many reasons why organizations don’t release their software as open source.
We want to set an example, however small, that there are non-license ways to
state any reservations or guiding principles of your organization that
ordinarily would have prevented a release. Key among these is engaging with the
community. As we have learned time and again, good solutions often originate
through trust and dialogue.</p>

      ]]></content:encoded>
    </item>
    <item>
      <title>Lexing Oscar</title>
      <link>https://pauladamsmith.com/blog/2013/01/lexing-oscar.html</link>
      <guid>https://pauladamsmith.com/blog/2013/01/lexing-oscar.html</guid>
      <pubDate>Fri, 11 Jan 2013 10:01:00 -0000</pubDate>
      <author>paulsmith@pobox.com (Paul Smith)</author>
      <content:encoded><![CDATA[
        <p>For the past <em>n</em> years, I’ve built and hosted a web app that lets my film
buff friends and me compete by guessing who will win the Academy Awards by
voting for nominees in each category. I do a new one from scratch each time.
It’s a fun diversion, but it’s also a playground for me to try out new
skills picked up in the past year or new tools or techniques I’ve been wanting
to fool around with.</p>
<p>The first thing I need to do each time is get a list of that year’s nominees
in some machine-readable format. Being a lazy programmer, I’m not going to
type the 100+ nominees into a spreadsheet or text file, so I wind up
writing a short throwaway script to coax some list I’ve found online into the
form I need for importing. This sort of script is the meat-and-potatoes of the
workaday programmer, the ones you whip up in a few minutes as an intermediate
step in a larger task. Ordinarily, they’re hardly worth commenting on. They
have a vanishingly short half-life, since there is rarely any generality to be
derived from them: they only work on the exact input given.</p>
<p>This year, I wanted to try out a new way of getting the nominee list together.
Sure, for a small task like this, there’s no compelling reason not to go with
the same kind of quick throwaway script as before. But again, the point of the
Oscars app is to exercise new or different muscles.</p>
<p>My goal was to generate a representation of the list of nominees in a format
such as CSV suitable for importing into a database. I found a source list of
nominees, formatted as follows: the name of the category is on the first line,
then a list of nominees comes next, each requiring two lines, one being the
name of the film and the other a name or list of names associated with the
nomination, all followed by a blank line, then the subsequent category starts
on the next line and we repeat. I wanted to read in and parse text formatted
like this:</p>
<pre><code>Directing
Amour
Michael Haneke
Beasts Of The Southern Wild
Benh Zeitlin
Life Of Pi
Ang Lee
Lincoln
Steven Spielberg
Silver Linings Playbook
David O. Russell

Actor in a Leading Role
Lincoln
Daniel Day-Lewis
…
</code></pre>
<p>And convert it to this:</p>
<pre><code>Directing,Amour,Michael Haneke
Directing,Beasts Of The Southern Wild,Benh Zeitlin
Directing,Life Of Pi,Ang Lee
Directing,Lincoln,Steven Spielberg
Directing,Silver Linings Playbook,David O. Russell
Actor in a Leading Role,Lincoln,Daniel Day-Lewis
</code></pre>
<p>Normally, to scan and parse this type of input, I would write a program to
loop over each line of the input, with a number of global state variables,
keeping track of what tokens I was currently processing. In this case, I might
have global state variables indicating whether I was currently processing a
category and what the current film is, and I would have a set of if/elif/else
statements for tests of various combinations of those variables, including for
the contents of the current line (a blank line or EOF indicating the end of a
category).</p>
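<p>A sketch of what that traditional approach looks like for this input
(illustrative only; it is not the script I actually wrote):</p>
<pre><code>text = open('nominees.txt').read()  # hypothetical input file

# one big loop with state variables; every iteration must first
# reconstruct where we are from the flags before it can act
state = 'category'
category = None
film = None
rows = []
for line in text.splitlines():
    if state == 'category':
        category = line
        state = 'film'
    elif line == '':
        state = 'category'
    elif state == 'film':
        film = line
        state = 'names'
    elif state == 'names':
        rows.append((category, film, line))
        state = 'film'
</code></pre>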
<p>Each time through the loop, then, we get a line from the text and check to see
what state we’re in. While this approach is easy to get started with, it leads
to fragile code and requires a lot of mental bookkeeping. Worse, each time
through the loop, the state of where we are and what we just did is forgotten.
That accounts for the proliferation of state variables to be checked in order
to restore the state of the processing. Think about it: we are marching
sequentially through this text. Wouldn't it be nice if we could just
pick up where we left off after the last action?</p>
<p>My approach this time is inspired by <a href="http://www.youtube.com/watch?v=HxaD_trXwRE" title="Lexical Scanning in Go - Rob Pike">Rob Pike’s talk on lexical scanning</a>.
Instead of a loop where we get the next bit of text to examine and restore the
state of the processing by examining a number of state variables, we instead
have a loop where a function is called that returns the next function to be
called. In other words, a function is called which does a bit of processing of
the text, advancing the pointer or consuming from a stream, maybe emitting
some tokens, and then returns to the caller the function that should proceed
from where the returning function just left off. For instance, we just scanned
a category, which means we know we are ready to scan a film, so call the film
scan function. That next function can just carry on its processing without any
state-checking preliminaries. The loop of our system therefore is very
concise, just calling functions and getting the next one to call the
subsequent time around. Roughly:</p>
<pre><code>def run():
    state = start_state
    while state:
        state = state()
</code></pre>
<p>When we are done processing input, say, EOF is reached, the state function
currently executing can return <code>None</code> to the caller, which will end the while
loop and shut down the machine.</p>
<p>The advantage to the programmer is that instead of building up a complicated
switch of control to determine what state our machine is in, we simply write
functions that proceed naturally from the last state, and then hand off
control to the subsequent function. It’s clean and helps keep the complexity
of the system manageable. Any time you can reduce the number of control flow
statements and replace them with simple functions, it’s a win in my book.</p>
<p>So back to the Oscars. This year, I opened the <a href="http://cdn.media.oscar.abc.com/media/2013/pdf/2013/nominees.pdf">official nominee list</a> from
the Academy’s site, a PDF. I selected the text, copied and pasted it into a
text document. The only manual editing I did was to add a blank line between
each group of nominees by category, and I also joined lines in categories like
Music (Original Song) where the title of the song and the name of the composer
are split across multiple lines—these were quick changes that simplified the
scanning logic.</p>
<p>There are three state functions in my program, one for each of category, film,
and name (or list of names):</p>
<pre><code>def lex_category(lexer):
    lexer.emit(CATEGORY, title(getline()))
    return lex_film

def lex_film(lexer):
    line = getline()
    if line == '':
        lexer.emit(BLANK, '')
        return lex_category
    elif line is None: # EOF, shut down lex machine
        return None
    lexer.emit(FILM, title(line))
    return lex_names

def lex_names(lexer):
    lexer.emit(NAMES, title(getline()))
    return lex_film
</code></pre>
<p>(<code>title()</code> handles some odd case formatting in the source text by converting
strings to title case.)</p>
<p><code>lex_film</code> is the most complex, having to handle three possibilities: a
blank line, meaning we’re moving on to the next category; EOF, which
shuts down scanning; and the film itself. But in all cases we merely
return the next state function to be called (or <code>None</code>).</p>
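<p>For completeness, these functions assume a small harness along the
following lines (a sketch only; the actual script, linked below, differs
in its details): a <code>Lexer</code> that collects emitted tokens, a set of
token-type constants, and a <code>getline()</code> that returns the next line of
input, or <code>None</code> at EOF. The run loop from earlier would simply pass
the lexer along, i.e. <code>state = state(lexer)</code>.</p>
<pre><code>import sys

# token types
CATEGORY, FILM, NAMES, BLANK = 'CATEGORY', 'FILM', 'NAMES', 'BLANK'

class Lexer:
    def __init__(self):
        self.tokens = []

    def emit(self, token_type, value):
        self.tokens.append((token_type, value))

_lines = iter(sys.stdin.read().splitlines())

def getline():
    # next line of input, or None at EOF
    return next(_lines, None)
</code></pre>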
<p>Admittedly, this is more sophistication than normally appears in my yearly
nominee list parsing. But I have to say that I was able to write the program
in about the same amount of time, found it ran correctly the first time, and
it was actually kind of fun to do. And while this was a silly example, you can
start to see the power you can get from this approach when lexing different
kinds of input with more and more complex tokens. When you lift the flow of
control up a level and let your functions focus on the task at hand, the
result I think is a more elegant and more obviously correct program.</p>
<p>The script and input text are <a href="https://gist.github.com/4507999">here</a>, and the output list of nominees is
<a href="https://docs.google.com/spreadsheet/ccc?key=0AviXLd8uXec3dHRtenJGcUs5aTBXUEY4cWs2WHNpS3c#gid=0">here</a>.</p>

      ]]></content:encoded>
    </item>
    
  </channel>
</rss>