tag:blogger.com,1999:blog-82130022895934004602017-02-11T03:15:07.194+08:00The Sound of SoftwarePersonal blog of Andrew Wilkins, primarily relating to programming.Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.comBlogger36125tag:blogger.com,1999:blog-8213002289593400460.post-18622019473043960272014-08-19T10:49:00.001+08:002014-08-19T10:49:53.836+08:00Availability Zones in Juju<div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"></div><div style="text-align: left;">You would be forgiven for thinking that I'd fallen off the face of the earth, considering how long it has been since I last wrote. I've been busy with my day job, moving into a new house; life in general. Work on <a href="https://github.com/go-llvm/llgo">llgo</a> has been progressing, mostly due to Peter Collingbourne. I'll have more to say about llgo's progress in future posts.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">This post is about some of the work I've done on <a href="http://juju.ubuntu.com/">Juju</a> recently. 
Well, semi-recently; this post has been sitting in my drafts for a little while, waiting for the new 1.20.5 release to be announced.</div><div style="text-align: left;"><br /></div><br /><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-Pw7IeMRzbtM/U_K3cyQ5SRI/AAAAAAAAhIY/-Pl2Dh3TgGI/s1600/andnowforsomethingcompletelydifferent.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-Pw7IeMRzbtM/U_K3cyQ5SRI/AAAAAAAAhIY/-Pl2Dh3TgGI/s1600/andnowforsomethingcompletelydifferent.png" height="320" width="225" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><h2>Availability Zones in Juju</h2>One of the major focuses of the <a href="https://juju.ubuntu.com/docs/reference-release-notes.html">Juju 1.20 release</a> has been around high availability (HA). There are two sides to this: high availability of Juju itself, and high availability of your deployed services. We’ll leave the “Juju itself” side for another day, and talk about HA charms/services.<br /><br />Until now, if you deployed a service via a charm with Juju, your cloud instance containing the service unit would be allocated wherever the cloud provider decided best. Most cloud providers split their compute services up into geographic regions (“us-east-1” in Amazon EC2, “US West” in Microsoft Azure, etc.). Some providers also break those regions down into “availability zones” (though the actual term may vary between providers, we use the term availability zone to describe the concept). An availability zone is essentially an isolated subset of a region.<br /><br />If you’re developing an application that demands high availability, then you probably want to make sure your application is spread across availability zones. 
Some providers, such as Microsoft Azure, will guarantee a service level agreement (SLA) if you do this. Provided that you allocate at least two VMs to a “Cloud Service” on Azure, you’re guaranteed 99.95% uptime under the SLA, and you get reimbursed if the guarantee isn’t met.<br /><br />In Juju 1.20, there are two options for distributing your service units across availability zones: explicit (akin to machine placement) and automatic. So far we have enabled explicit availability zone placement in the Amazon EC2 and OpenStack (Havana onwards) providers, with support for the <a href="https://maas.ubuntu.com/">MAAS</a> provider on the horizon. To add a new machine to a specific availability zone, use the “zone=” placement directive as below:<br /><blockquote class="tr_bq">juju add-machine zone=us-east-1b</blockquote><br />As well as support for explicit zone placement, we’ve implemented automatic spreading of service units across availability zones for Amazon EC2, OpenStack and Microsoft Azure. When cloud instances are provisioned, each is allocated to the availability zone with the fewest related instances, so that related instances spread uniformly across zones. Two cloud instances are considered related if they both contain units of a common service, or if they are both Juju state servers.
If all the nodes go away, then you’re in trouble. This is where you want to go a step further and ensure your nodes are distributed across availability zones for greater resilience to failure. As of Juju 1.20, that “juju deploy” you just did handles that all for you: your 3 nodes will be uniformly spread across availability zones in the environment. If you add units to the service, they will also be spread across the zones according to how many other units of the service are in the zones. Let’s see what Juju did…<br /><blockquote class="tr_bq">$ juju status mongodb | grep instance-id<br />instance-id: i-7a6d2b50<br />instance-id: i-ff1562d4<br />instance-id: i-627f0a30</blockquote><blockquote class="tr_bq">$ ec2-describe-availability-zones<br />AVAILABILITYZONE us-east-1a available us-east-1<br />AVAILABILITYZONE us-east-1b available us-east-1<br />AVAILABILITYZONE us-east-1d available us-east-1</blockquote><blockquote class="tr_bq">$ ec2-describe-instance-status i-7a6d2b50 i-ff1562d4 i-627f0a30 | grep i-<br />INSTANCE i-627f0a30 us-east-1d running 16 ok ok active<br />INSTANCE i-ff1562d4 us-east-1a running 16 ok ok active<br />INSTANCE i-7a6d2b50 us-east-1b running 16 ok ok active</blockquote><blockquote class="tr_bq">(Note: the ec2-* commands are available in the ec2-api-tools package.)</blockquote><br />Juju has distributed the mongodb units so that there is one in each zone, so if one zone is impaired the others will be unaffected. If we add a unit, it will go into one of the zones with the fewest mongodb units.<br /><br />Explicit placement is currently only supported by Juju’s Amazon EC2 and OpenStack providers, but automatic spread is also supported by the Microsoft Azure provider. Due to the way that Microsoft Azure ties together availability zones and load balancing, it is currently necessary to forego “density” (i.e. explicit machine placement) in order to support automatic spread. 
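Conceptually, the spreading policy just described picks, for each new instance, the zone with the fewest related instances. Here is a minimal Go sketch of that policy; it is an illustration only, with invented names, not Juju's actual provisioning code:

```go
package main

import "fmt"

// pickZone returns the zone containing the fewest related instances,
// so repeated provisioning spreads instances uniformly across zones.
// Invented illustration of the policy; not Juju's real code.
func pickZone(zones []string, related map[string]int) string {
	best := zones[0]
	for _, z := range zones[1:] {
		if related[z] < related[best] {
			best = z
		}
	}
	return best
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1d"}
	related := map[string]int{} // zone -> related instance count
	for i := 0; i < 3; i++ {    // provision three units of a service
		z := pickZone(zones, related)
		related[z]++
		fmt.Println(z)
	}
}
```

Provisioning three units this way lands one in each zone, which is the uniform spread shown in the juju status output above.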
If you are upgrading an existing environment to 1.20, then automatic spread will not be enabled. Newly created environments enable spread (and disable placement) by default, with an option to disable (availability-sets-enabled=false in environments.yaml).<br /><br />Enjoy.<br /><br />Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-25324514027692582082014-01-06T22:02:00.001+08:002014-01-06T22:02:31.626+08:00llgo on ssaHello there!<br /><br />I've been busy hacking on <a href="http://github.com/axw/llgo" target="_blank">llgo</a> again. In case you're new here: llgo is a <a href="http://golang.org/" target="_blank">Go</a> frontend for <a href="http://llvm.org/" target="_blank">LLVM</a> that I've been working on for the past ~2 years on and off. It's been quite a while since I last wrote; there has been a bunch of new work since, so I have some things to talk about at last.<br /><br />A few months ago, I started working on rewriting swathes of llgo's internals to base it on <a href="http://godoc.org/code.google.com/p/go.tools/ssa">go.tools/ssa</a>. LLVM uses an <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA</a> representation, which made the process fairly straightforward. Basing llgo on go.tools/ssa gives me much higher confidence in the quality of the output; it also presented a good opportunity to clean up llgo's source itself, which I have begun, but certainly not finished. 
llgo is now able to compile all packages in the standard library, except those that require cgo (net, os/user, runtime/cgo).<br /><br />llgo now works something like this:<br /><ol><li>Go source is scanned and parsed by go/scanner and go/parser, producing an AST (go/ast);</li><li>The AST is fed into go/types for type-checking;</li><li>The output of go/types is passed onto go.tools/ssa, which generates the SSA form;</li><li>llgo translates the go.tools/ssa SSA form into an LLVM module;</li><li>llgo-build links the LLVM modules for a program together and translates to an executable.</li></ol><br />go.tools/ssa supports translating a whole program to SSA form, but llgo works in the traditional way: packages are translated one at a time. Whole-program optimisation is enabled by linking the LLVM modules together, prior to any translation to machine code.<br /><br />There were a few bits that I stumbled on when rewriting. Alan Donovan, the author of go.tools/ssa, was kind enough to give me some assistance along the way. Anyway, the main issues I had were:<br /><br /><ul><li>Translating Phi nodes requires a bit of finessing, to ensure processing of the Phi or the edges is not order-sensitive. This was dealt with by generating placeholder values for instructions that haven't yet been visited, and then replacing them later.</li><li>ssa.Index is emitted for indexing into arrays. If an array is in a register, then indexing it means extracting a value; in LLVM, an array element extraction requires a constant index. This is currently kludged by storing to a temporary alloca, and using the getelementptr LLVM instruction. Hopefully I'm missing something and this is easily fixed.</li><li>The Recover block is not dominated by the entry block, so it may not be valid for it to refer to the Alloc instructions for parameters and results. 
To deal with this, I generate a prologue block that contains the param/result Allocs; the prologue block conditionally jumps to either the recover or entry block, depending on panic/recover control flow. Alan has agreed to do something along these lines in go.tools/ssa.</li><li>The ssa.Next instruction required some assumptions to be made about block ordering and instruction placement, in order to be able to translate string-range using Phi nodes. Recent changes to go.tools/ssa exposed the dominator tree, making it possible to do away with the assumption now.</li></ul><br /><br />Various significant changes have been made during the course of the migration to go.tools/ssa:<br /><ul><li>Interfaces are now represented like in gc: empty interfaces with the runtime type & data, non-empty interfaces with an "itab" and data. Russ Cox wrote <a href="http://research.swtch.com/interfaces">an article about the interface representation</a> back in 2009.</li><li>Panic/recover (and defer, by consequence) are now using setjmp/longjmp. I had been using exceptions, but it was rightly pointed out to me that this wouldn't work unless there were a way of doing non-call exceptions in LLVM (which has not been implemented). The setjmp/longjmp approach incurs a cost for every function that may defer or recover, but it works without modifications to LLVM. Perhaps this will be revisited in the future.</li><li>go/types/typemap is now used for mapping types.Types to runtime type descriptors and LLVM types. Runtime type descriptors are now generated more completely, and more correctly. Identical type descriptors will now be merged at link-time.</li><li>llgo no longer generates conditional branching for calls to non-global functions, when comparing structs, or in map iteration. 
Apart from producing better code, this makes it much simpler to work with go.tools/ssa, which has its own idea about how the SSA basic blocks relate to one another.</li></ul><div><br />There have also been miscellaneous bug fixes, and improvements, not directly related to the move to go.tools/ssa. Some highlights:</div><ul><li>A custom importer/exporter, thanks to <a href="https://github.com/quarnster">Fredrik Ehnbom</a>. The importer side is disabled at the moment, due to an apparent <a href="https://code.google.com/p/go/issues/detail?id=7028">bug in go.tools/ssa</a>.</li><li>Debugger support, thanks again to Fredrik Ehnbom. I haven't reenabled it since the move to go.tools/ssa. I'll get onto that real soon now, because debugging without it can be tiresome.</li><li>llgo-build can now take a "-test" flag that causes llgo-build to compile the test Go files, yet again thanks to Fredrik Ehnbom. This is currently reliant on the binary importer being enabled, so it won't work out of the box until that bug is fixed.</li><li>Shifts now generate correct values for shifts greater than the width of the lhs operand.</li><li>Signed integer conversions now sign-extend correctly.</li><li>bytes.Compare now works as it should (-1, 0, 1, not <0, 0, >0). "llgo-build -test bytes": PASS</li><li>llgo-build can now take a "-run" flag that causes llgo-build to execute and then dispose of the resulting binary.</li><li>Type strings are propagated to LLVM types, making the IR more legible, thanks to <a href="https://github.com/tmc">Travis Cline</a>.</li></ul><div><br />I <i>think</i> that's everything. I have various things I'd like to tackle now, but not enough time to do it all at once. If you're interested in helping out then there's plenty to do, including:</div><br /><ul><li>Move to using libgo. Ideally the gc runtime would be rewritten in Go already, but that's not going to happen just yet. 
The compiler and linker are due to be rewritten in Go soon, which is a lot of work as it is.</li><li>Finish off runtime type descriptor generation (notably, type algorithms).</li><li>Get PNaCl support working again. This should be pretty close, but requires the binary importer to be enabled.</li><li>Implement cgo support.</li><li>Implement bounds checking, nil pointer checks, etc.</li><li>Get garbage collection working. There's <a href="https://github.com/axw/llgo/pull/108">Pull Request #108</a>, but this is perpetuating the problem that is llgo's custom runtime. Since GC is fairly invasive, I don't want to go tying llgo to that runtime any more than it is currently. I expect this will have to wait until libgo is integrated.</li><li>Escape analysis. This is a must-have, but not immediately necessary. The implementation should be based on go.tools/ssa, interfacing with the exporter/importer to record/consume information about external functions.</li><li>Make use of go.tools/ssa/ssautil/Switches. This is an optimisation, so again, not immediately necessary.</li></ul><div><br />If you want to have a play around, then grab LLVM and Go, and then:<br /><br /><ul><li>go get github.com/axw/llgo/cmd/llgo-dist && llgo-dist</li><li>llgo-build <some/package> <i>or</i> llgo-build file1.go, file2.go, ...</li></ul><div>Let me know how you get on with that.</div><br /><br /></div><div>Here's hoping 2014 can be a productive year for llgo. Happy new year.</div><br /><br />Cheers,<br />Andrew<br /><br />Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-89709371951784561852013-08-16T22:55:00.000+08:002013-08-16T22:55:16.717+08:00llgo update #14Ahoy there, mateys!<br /><br />It's been three months since our last correspondence. Apologies for the negligence. I've been busy, as usual, but it's more self-inflicted than usual. 
I've taken up a new role at Canonical, working on <a href="http://juju.ubuntu.com/">Juju</a>. I'm really excited about Juju (both the concept and realisation), and the fact that it's written in Go is icing on the cake. Working remotely is taking some getting used to, but so far it's been pretty swell. Anyway, you didn't come here to read about that, did you?<br /><br />I'm still working on <a href="http://github.com/axw/llgo">llgo</a> in the background, quietly prodding it along towards the <a href="https://github.com/axw/llgo/issues?milestone=1&state=open">0.1 milestone</a>. There's just one big ticket item left, and that's partially done now: channels. I've just finished porting the basics of channels from gc's standard library to llgo's runtime. That doesn't include select, which is entirely missing. When that's done, I'll be content to release 0.1.<br /><br />So what's new since last time?<br /><br /><ul><li>There's a new llgo-build tool, which takes the pain out of building packages and programs with llgo and the LLVM toolchain. Just run "llgo-build <package>", and you'll either build and install a package, or build a program in the working directory. There's no freshness checking, so you're currently required to manually build all dependencies before building a program.</li><li>Simplified building against PNaCl: llgo-dist now accepts a "-pepper" option, which points to a NaCl SDK.</li><li>Implemented support for map literals.</li><li>Implemented complex number arithmetic.</li><li>Implemented channels (apart from anything select-related).</li><li>Numerous bug fixes.</li></ul><div>In my previous post I talked about having implemented panic/recover, and having implemented them in terms of DWARF exception handling. Well, it looks like PNaCl isn't going to support that, at least initially, so a setjmp/longjmp version is likely inevitable now.</div><div><br /></div><div>I also said I would be working on a temporary fork of cmd/go. 
I gave up on that, after hitting a few stumbling blocks. I figured it was more important to actually get the compiler and runtime working than get bogged down in the tooling, hence the simpler llgo-build tool.</div><div><br /></div><div>That's about it! "Feature complete" is getting closer, though lots of things still don't work very nicely. Still no garbage collection, no proper escape analysis, etc. Those will come in time.</div><div><br /></div><div>For now, though... I think I might go catch up on some sleep.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-53660677756880765012013-05-18T21:28:00.000+08:002013-05-19T23:07:17.690+08:00llgo on Go 1.1Hi folks,<br /><br />(For those of you coming from HN/Twitter/elsewhere, this is a post about <a href="https://github.com/axw/llgo">llgo</a>. llgo is an LLVM frontend for the Go programming language).<br /><br />In my <a href="http://blog.awilkins.id.au/2013/03/llgo-update-12.html">last post</a> I mentioned that work had began on moving to Go 1.1 compatibility; this has been my primary focus since then. Since Go 1.1 is now released (woohoo!), I've gone ahead and pulled all the changes back into the master branch on GitHub. If you want to play around, you can do the following:<br /><br /><ol><li><a href="http://golang.org/doc/install">Get Go 1.1</a>.</li><li><a href="http://llvm.org/releases/">Get Clang and LLVM</a> (I've tested with 3.2, Ubuntu x86-64). Make sure llvm-config is in your $PATH.</li><li>Run "go get github.com/axw/llgo/cmd/llgo-dist"</li><li>Run "llgo-dist". This will install llgo into $GOBIN, and build the runtime.</li></ol><div><br /></div><br />The biggest new feature would have to be: defer, panic and recover (I'm lumping them together as they're closely related). I've implemented them on top of <a href="http://llvm.org/docs/ExceptionHandling.html">LLVM's exception handling support</a>. 
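As a refresher on the semantics being implemented here, regardless of mechanism (DWARF exceptions or setjmp/longjmp): a recover inside a deferred function stops an in-flight panic and turns it into a normal return. This is plain Go, runnable with any compiler:

```go
package main

import "fmt"

// safeDiv shows the behaviour the compiler must reproduce: the
// deferred function runs while the panic unwinds the stack, and
// recover() converts the panic into an ordinary error return.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil // panics at run time when b == 0
}

func main() {
	q, err := safeDiv(10, 2)
	fmt.Println(q, err) // 5 <nil>
	_, err = safeDiv(1, 0)
	fmt.Println(err != nil) // true
}
```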
The panic and recover functions are currently tied to DWARF exception handling, though it's simple enough that it should be feasible to use setjmp/longjmp on platforms where DWARF exception handling isn't viable.<br /><br />Aside from that, there's some new bits and bobs:<br /><br /><ul><li>Method sets are handled properly now (or at least not completely wrong like before). This means you can use an embedded type's methods to satisfy an interface.</li><li>"return" requirements are now checked by go/types.</li><li>cap() is now implemented for slices.</li><li>llgo-dist now builds against the LLVM static libraries (if available) by default, with an option for building against the shared libraries.</li></ul><div><br /></div><div>I'll be working on a temporary fork of cmd/go to build programs with llgo, while a long-term solution is figured out. I'd also like to get PNaCl integration working again, given that its release is nigh.</div><div><br /></div><div>That's all for now.</div><div><br /></div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-26126243768654504022013-03-01T21:50:00.002+08:002013-03-01T21:50:31.900+08:00llgo update #12Oh my, it's been a while.<br /><br />In my <a href="http://blog.awilkins.id.au/2012/12/go-in-browser-llgo-does-pnacl.html">previous post</a> I wrote about llgo and PNaCl. I haven't had much time to play with PNaCl recently, but I have been prodding llgo along. In February, my wife gave birth to our son, Jeremy, so naturally I've been busy. But anyway, let's talk about what has been happening in llgo. Quick, while he's sleeping!<br /><br />Feature-wise, there's nothing terribly exciting going on. Without getting too boring, what's new is:<br /><br /><ul><li>A new "go1.1" branch in the Git repository. 
The go1.1 branch aims to make llgo compatible with the Go tip, and will replace the master branch when Go 1.1 is released.</li><li>Removed llgo/types (a fork of the old exp/types package), and moved to go/types.</li><li>Updated runtime type representations to match those from gc's tip (thanks to minux for initiating this effort).</li><li>Updated to use architecture-specific size for "int" (same as uintptr).</li><li><a href="https://github.com/axw/llgo/commit/e26ff1f4a7ea661ff77b12a4cb7273f770d5a928">Changed function representation to be a pair of pointers</a>, to avoid trampolines/runtime code generation for closures. The rationale is the same as for rsc's proposal for Go 1.1; using runtime code generation limits the environments that Go can run in (e.g. PNaCl).</li><li>A slew of bug fixes and minor enhancements.</li></ul><div>The go/types change in particular was not a small one, but llgo came out much better at the end. As of the most recent go/types commits, llgo now passes all of its tests in the go1.1 branch. Now I can get back to implementing features again.</div><div><br /></div><div>That's about all there is to report. It has been <a href="https://github.com/axw/llgo/issues/26#issuecomment-13347764">suggested</a> that I set up some milestones in the GitHub project; I will spend a bit of time coming up with what I think are the bare essentials for a 0.1 release, and what would constitute future releases and so on.</div><div><br /></div><div>One last thing: there's a new(ish) <a href="https://groups.google.com/forum/?fromgroups#!forum/llgo-dev">llgo-dev</a> mailing list. 
If you want to get involved, or just lurk, come and join the party.</div><div><br /></div><div>Until next time.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-71516643198266987912012-12-09T12:45:00.000+08:002013-03-18T08:52:01.909+08:00Go in the Browser: llgo does PNaCl<br /><div style="margin-bottom: 0cm;">Last week I briefly <a href="https://plus.google.com/102738380796586573408/posts/HdvvfFqMvig">reported on Google+</a> that I had written a Go-based <a href="https://developers.google.com/native-client/">Native Client module</a>, built it with llgo, and successfully loaded it into Google Chrome. I'd like to expand on this a little now, and describe how to build and run it.</div><div style="margin-bottom: 0cm;"><br /></div><h3><b>Before you start...</b></h3><div style="margin-bottom: 0cm;">If you want to try this out yourself, then you'll need to grab yourself a copy of the <a href="https://developers.google.com/native-client/sdk/download">Native Client SDK</a>. I've only tested this on Ubuntu Linux 12.10 (x86-64), so if you're trying this out on a different OS/arch you may need to alter the instructions.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">Anyway, grab the SDK according to the instructions on the page I linked to above. 
Be sure to get the development/unstable branch, by updating with the "pepper_canary" target:</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;">$ cd nacl_sdk; ./naclsdk update pepper_canary</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">This is not a small download, so go and brew some tea, or just read on to see where we're going with this.</div><div style="margin-bottom: 0cm;"><br /></div><h3><b>The anatomy of a PNaCl module</b></h3><div style="margin-bottom: 0cm;">By now I guess you probably know what Native Client is, but if you don't, I suggest you take a moment to read about it on the Google Developers (<a href="https://developers.google.com/native-client/">https://developers.google.com/native-client/</a>) site. What may not be so well known is PNaCl, the next evolution of Native Client. PNaCl (pronounced "pinnacle") is short for <i>Portable Native Client</i>, and is based on LLVM.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">Developers continue to write their code the same as in traditional NaCl, but now it is compiled to LLVM <i>bitcode</i>; PNaCl restricts usage to a portable subset of bitcode so that it can then be translated to native x86, x86-64, or ARM machine code. 
To compile C or C++ modules to PNaCl/LLVM bitcode, one uses the pnacl-clang compiler provided with the Native Client SDK.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">To make use of Native Client, one develops a <i>module</i>: an executable that can be loaded into Google Chrome (or Chromium). A module implements certain functions specified in the Pepper API (<i>PPAPI</i>), which is the API that interfaces your module with the browser. One of the functions is <i>PPP_InitializeModule</i>, and another is <i>PPP_GetInterface</i>. The former provides the module with a function pointer for calling back into the browser; the latter is invoked to interrogate the module for <i>interfaces</i> that it implements.</div><div style="margin-bottom: 0cm;"><br /></div><h3><b>A <i>nacl/ppapi</i> package for Go</b></h3><div style="margin-bottom: 0cm;">Since llgo speaks LLVM, it should be feasible to write PNaCl modules in Go, right? Right! So I set about doing this last week, and found that it was fairly easy to do. I have written a demo module which you can find here: <a href="https://github.com/axw/llgo/tree/master/pkg/nacl/ppapi">https://github.com/axw/llgo/tree/master/pkg/nacl/ppapi</a>, which I later intend to morph into a reusable Go package, with a proper API. I have made a lot of shortcuts, and the code is not particularly idiomatic Go; bear in mind that llgo is still quite immature, and that this is mostly a proof of concept.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">Most of the code in the package is scaffolding; the example module is mostly defined in <i><a href="https://github.com/axw/llgo/blob/master/llgo/testdata/programs/nacl/example.go">example.go</a></i>, some also in <i><a href="https://github.com/axw/llgo/blob/master/pkg/nacl/ppapi/ppapi.go">ppapi.go</a></i>. 
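PPP_GetInterface is, conceptually, a lookup from an interface-name string to an implementation. The Go sketch below shows only that idea; it is invented for this post (the real glue must return C structs of function pointers to the browser, and the version-suffixed name string is an assumption based on the PPAPI headers):

```go
package main

import "fmt"

type instanceImpl struct{} // stand-in for an Instance interface implementation

// interfaces maps PPAPI-style interface names to implementations.
// Invented sketch: real PPAPI glue returns C function-pointer tables.
var interfaces = map[string]interface{}{
	"PPP_Instance;1.1": &instanceImpl{},
}

// getInterface plays the role of PPP_GetInterface: the browser asks
// for an interface by name; nil means "not supported".
func getInterface(name string) interface{} {
	return interfaces[name]
}

func main() {
	fmt.Println(getInterface("PPP_Instance;1.1") != nil) // true
	fmt.Println(getInterface("PPP_Graphics3D;1.0"))      // <nil>
}
```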
At the top of example.go, we instantiate a pppInstance1_1, which is a structure which defines the “Instance” interface. This interface is used to communicate the lifecycle of an instance of the module; when a module is loaded in a web page, then this interface is invoked. We care about when a module instance is created, and when it is attached to a <i>view</i> (i.e. the area of the page which contains the module). Note that when I say interface, I mean a PPAPI interface, not a Go interface. Later, I hope to have modules implement Go interfaces, and hide the translation to PPAPI interfaces.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">The example is contrived, and quite simple; it demonstrates the use of the <a href="https://developers.google.com/native-client/pepperc/struct_p_p_b___graphics2_d__1__0">Graphics2D</a> interface, which, as the name suggests, enables a module to perform 2D graphics operations. The demo simply draws repeating rectangles of different colours, animated by regularly updating the graphics context and shifting the pixels on each iteration. I would have used the standard “image” Go package, but unfortunately llgo is currently having trouble compiling it. I'll look into that soon.</div><div style="margin-bottom: 0cm;"><br /></div><h3><b>Building llgo</b></h3><div style="margin-bottom: 0cm;"><span style="background-position: initial initial; background-repeat: initial initial;">Alright</span>, how do we build this thing? 
We're going to do the following things:</div><ol><li><div style="margin-bottom: 0cm;">Build llgo, and related tools.</div></li><li><div style="margin-bottom: 0cm;">Compile the PNaCl-module Go code into an LLVM module.</div></li><li><div style="margin-bottom: 0cm;">Link the llgo runtime into the module.</div></li><li><div style="margin-bottom: 0cm;">Link the ppapi library from the Native Client SDK into the module.</div></li><li><div style="margin-bottom: 0cm;">Translate the module into a native executable.*</div></li></ol><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">*The final step is currently necessary, but eventually Chrome/Chromium will perform the translation in the browser.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">Let's begin by building the <i>llgo-dist</i> tool. This will be used to build the llgo compiler, runtime, and linker. More on each of those in a moment. Go ahead and build llgo-dist:</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ go get github.com/axw/llgo/cmd/llgo-dist</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">The llgo-dist tool takes two options: <i>-llvm-config</i> and <i>-triple</i>. The former is the path to the <i>llvm-config</i> tool, and defaults to simply "llvm-config" (i.e. find it using PATH). The latter is the LLVM target triple used for compiling the runtime package (and other core packages, like syscall). 
The Native Client SDK contains an llvm-config and the shared library that we need to link with to use LLVM's C API.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">As I said above, I'm running on Linux x86-64, so for my case, the llvm-config tool can be found in:</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ nacl_sdk/pepper_canary/toolchain/linux_x86_pnacl/host_x86_64/bin/llvm-config</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">At this point, you should put the “host_<arch>/bin” directory in your PATH, and the “host_<arch>/lib” directory in your LD_LIBRARY_PATH, as llgo currently requires it, and I refer <span style="background-position: initial initial; background-repeat: initial initial;">to executables without their full paths </span><span style="background-position: initial initial; background-repeat: initial initial;">in some cases</span><span style="background-position: initial initial; background-repeat: initial initial;">.</span></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">The Native Client SDK creates shared libraries with the target <i>armv7-none-linux-gnueabi</i>, so we'll do the same. Let's go ahead and build llgo now.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ llgo-dist -triple=armv7-none-linux-gnueabi -llvm-config=nacl_sdk/pepper_canary/toolchain/linux_x86_pnacl/host_x86_64/bin/llvm-config</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">We now have a compiler, linker, and runtime. As an aside, on my laptop it took about 2.5s to build, which is great! The gc toolchain is a wonderful thing. 
You can safely ignore the warning about “different data layouts” when llgo-dist compiles the syscall package, as we will not be using the syscall package in our example.</div><div style="margin-bottom: 0cm;"><br /></div><h3><b>Building the example</b></h3><div style="margin-bottom: 0cm;">Now, let's compile the PNaCl module:</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ llgo -c -o main.o -triple=armv7-none-linux-gnueabi llgo/pkg/nacl/ppapi/*.go llgo/testdata/programs/nacl/example.go</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">This creates a file called “main.o”, which contains the LLVM bitcode for the module. Next, we'll link in the runtime. Eventually, I hope that the “go” tool will be able to support llgo (I have hacked mine up to do this), but for now you're going to have to do this manually.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ llgo-link -o main.o main.o $GOPATH/pkg/llgo/armv7-none-linux-gnueabi/runtime.a</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">Now we have a module with the runtime linked in. The llgo runtime defines things like functions for appending to slices, manipulating maps, etc. Later, it will contain a more sophisticated memory allocator, a garbage collector runtime, and a goroutine scheduler.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">We can't translate this to a native executable yet, because it lacks an entry point. In a PNaCl module, the entry point is defined in a static library called <i>libppapi_stub.a</i>, which is included by the libppapi.a linker script<i>. 
</i>We can link this in using pnacl-clang, like so:</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ pnacl-clang -o main.pexe main.o -lppapi</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">This creates a <i>portable executable</i> (.pexe), an executable still in LLVM bitcode form. As I mentioned earlier, this will eventually be the finished product, ready to load into Chrome/Chromium. For now, we need to run a final step to create the native machine code executable:</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>$ pnacl-translate -arch x86-64 -o main_x86_64.nexe main.pexe</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">That's it. If you want to load this in an x86 or ARM system, you'll also need to translate the pexe to an x86 and/or ARM nexe. Now we can run it.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;"></div><h3><b>Loading the PNaCl module into Chrome</b></h3><div style="margin-bottom: 0cm;">I'm not sure at what point all the necessary parts became available in Chrome/Chromium, so I'll just say what I'm running: I have added the Google Chrome PPA, and installed google-chrome-beta. This is currently at version 24.0.1312.35 beta.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">By default, Chrome only allows Native Client modules to load from the Chrome Web Store, but you can override this by mucking about in about:flags. Load up Chrome, go to about:flags, enable “Native Client”, and restart Chrome so the change takes effect. 
Curiously, there's a “Portable Native Client” flag; it may be that the translator is already inside Chrome, but I'm not aware of how to use it.</div><div style="margin-bottom: 0cm;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-N9-czi_nYd0/UMQWw9TEGLI/AAAAAAAACgQ/MMSbKuSrcsE/s1600/about_flags.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="183" src="http://4.bp.blogspot.com/-N9-czi_nYd0/UMQWw9TEGLI/AAAAAAAACgQ/MMSbKuSrcsE/s320/about_flags.png" width="320" /></a></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">To simplify matters, I'm going to hijack the hello_world example in the Native Client SDK. If you want to start from scratch, refer to the Native Client SDK documentation. So, anyway, we'll build the hello_world example, then replace the executable with our own:</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;">$ cd nacl_sdk/examples/hello_world</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;">$ make pnacl/Release/hello_world.nmf</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;">$ cp <path/to/main_x86_64.nexe> pnacl/Release/hello_world_x86_64.nexe</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><br /></div><div style="margin-bottom: 0cm;">Now start an HTTP server to serve this application (inside the hello_world directory):</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;">$ python -m SimpleHTTPServer</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><i>Serving HTTP on 0.0.0.0 port 8000 ...</i></div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">Finally, navigate to the following location:</div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><a 
href="http://localhost:8000/index_pnacl_Release.html">http://localhost:8000/index_pnacl_Release.html</a></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/-AUXuC0SqrEM/UMQOCol6U8I/AAAAAAAACgA/HB7Gh3UAEjI/s1600/go_pnacl.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="281" src="http://2.bp.blogspot.com/-AUXuC0SqrEM/UMQOCol6U8I/AAAAAAAACgA/HB7Gh3UAEjI/s320/go_pnacl.png" width="320" /></a></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"><br /></div><div style="margin-bottom: 0cm; margin-left: 1.25cm;"></div><div style="margin-bottom: 0cm;">Behold, animated bars! Obviously the example is awfully simplistic, but I wanted to get this out so others can start playing with it. I'm not really in the business of fancy graphics, so I'll leave more impressive demos to others.</div><div style="margin-bottom: 0cm;"><br /></div><h3><b>Next Steps</b></h3><div style="margin-bottom: 0cm;">I'll keep dabbling with this, but my more immediate goals are to complete llgo's general functionality. As wonderful as all of this is, it's no good if the compiler doesn't work correctly. Anyway, once I do get some more time for this, I intend to:</div><ul><li><div style="margin-bottom: 0cm;">Clean up nacl/ppapi, providing an external API.</div></li><li><div style="margin-bottom: 0cm;">Update llgo-link to transform a “main” function into a global constructor (i.e. an “init” function) when compiling for PNaCl.</div></li><li><div style="margin-bottom: 0cm;">Update llgo-link to link in libppapi_stub.a when compiling for PNaCl, so we don't need to use pnacl-clang. 
Ideally we should be able to “go build”, and have that immediately ready to be loaded into Chrome.</div></li><li><div style="margin-bottom: 0cm;">Get the image package to build, and update nacl/ppapi to use it.</div></li><li><div style="margin-bottom: 0cm;">Implement syscall for PNaCl. This will probably involve calling standard POSIX C functions, like <i>read</i>, <i>write, mmap, </i>etc. Native Client code is heavily sandboxed, but provides familiar POSIX APIs to do things like file I/O.</div></li></ul><div style="margin-bottom: 0cm;">If you play around with this and produce something interesting, please let me know.</div><div style="margin-bottom: 0cm;"><br /></div><div style="margin-bottom: 0cm;">That's all for now – have fun!</div><br /><br />Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com2tag:blogger.com,1999:blog-8213002289593400460.post-36164668358247219082012-11-25T14:28:00.003+08:002012-11-26T11:53:31.723+08:00llgo update #10: "hello, world!" reduxIt's about time for another progress update on llgo. I've made decent progress recently, so let's go through what's new.<br /><br /><h4>Highlights</h4>I've been refactoring bits of code and fixing bugs aplenty, so there is a mass of noise in the git commits. In terms of new function, the news is that we now have:<br /><br /><ul><li>Type switches.</li><li>Type assertions.</li><li>Labeled statements; goto, labeled break and continue.</li><li>The <a href="https://github.com/axw/llgo/tree/master/cmd/llgo-dist">llgo-dist</a> command; more on this below.</li><li>String conversions: to/from byte slices; from rune/int.</li><li>String range. I'm sure <a href="https://github.com/axw/llgo/blob/master/pkg/runtime/strings.go#L77">the implementation</a> could be improved.</li><li>Implemented <a href="https://github.com/axw/llgo/blob/master/pkg/sync/atomic/atomic.ll">sync/atomic</a> using LLVM atomic operations intrinsics.</li><li>Various changes to enable linking multiple packages (e.g. 
exported symbols are now prefixed with their package path).</li><li>Additional support for floats (<a href="https://github.com/axw/llgo/pull/11">thanks to spate</a>); partial support for complex numbers.</li><li>"...args" calls to variadic functions (including slice append).</li><li>A self-contained runtime package. I have cloned (and slightly modified in some cases) the Go portion of the runtime package from gc, and combined it with the runtime code I had already written for llgo.</li><li><a href="https://github.com/axw/llgo/blob/master/pkg/math/math.ll">Bridge code for the math package,</a> which mostly just redirects the exported functions to the internal, pure-Go implementations.</li><li>System calls (<a href="https://github.com/axw/llgo/blob/master/pkg/syscall/syscall_linux_amd64.ll">Linux/AMD64</a> only so far).</li><li>Closures; more below.</li></ul><div><br /></div><h4>llgo-dist</h4><div>I have begun implementing a command that takes care of building llgo, its runtime, and in the future any other tools that might be considered part of llgo (e.g. an in-development linker). This tool will set up the cgo flags given the path to an "llvm-config" program, and build gollvm.<br /><br /></div><h4><i>reflect</i>, <i>fmt</i>, oh my!</h4><br />Last week, <a href="https://plus.google.com/u/0/102738380796586573408/posts/LDYxBXHXmHN">I mentioned on Google+</a> that I managed to get the <a href="http://golang.org/pkg/reflect">reflect</a> package working. At least enough of it to get the <a href="http://golang.org/pkg/fmt/">fmt</a> package to work. At least enough of the fmt package to get fmt.Println("Hello, world!") to work... Yep, the holy grail of programming examples now compiles, links, and runs, using llgo. 
This demonstrates the following things work:<br /><br /><ol><li>Compilation of the following packages: errors, io, math, os, reflect, runtime, strconv, sync, sync/atomic, syscall, time, unicode/utf8, unsafe.</li><li>Package imports (still using the gcimporter from exp/types).</li><li>Linking multiple compiled packages using llvm-link.</li><li>Interfaces and reflection (fmt.Println uses reflection to determine the underlying type).</li><li>System calls (fmt.Println will eventually issue a system call to write to the stdout file).</li></ol><br /><h4>Closures</h4>Yes indeed, we now have closures. <a href="https://github.com/axw/llgo/blob/master/literals.go#L70">The code</a> is pretty hackish, so I expect it's not very solid. I have implemented them using LLVM's trampoline intrinsics. Essentially you provide LLVM with a function that takes N parameters, give it a block of (executable) memory and an argument to bind, and it fills in the block with function code for a function with N-1 parameters (the Nth one being bound).<br /><br />Unfortunately I have found that the closures are not playing nicely with lli/JIT, which means the closure unit test I have written fails. If I compile it with llc/gcc, though, it works just fine. So either I've done something subtly stupid, or the JIT is clobbering something it shouldn't. The furthest I got in debugging was finding that the bound argument value is wrong when the function is entered.<br /><br />I expect I'll probably replace this implementation for a couple of reasons:<br /><br /><ul><li>Portability: I'd rather avoid platform-specific code like this. For one thing, the <a href="http://www.chromium.org/nativeclient/pnacl/bitcode-abi">PNaCl ABI</a> calls out trampoline intrinsics as being unsupported.</li><li>Testability: I should investigate the problems I observed with lli/JIT further; while I'm loath to change the implementation just to support tests, it is a real problem. 
I rely heavily on tests to make sure I haven't broken anything.</li></ul><div>Until I find out that using trampolines has a marked benefit to performance in real programs, I intend to replace the current implementation with one that uses a pair of pointers for functions. The bound argument will be stored in one pointer, and the function pointer in another. This has implications for all function calls, though it should be simple to achieve good performance in most cases.<br /><br /></div><h4>What's next?</h4><div>I haven't figured this one out yet. I have been meaning to play more with PNaCl, so I might take some time now to do that. I expect I'll be slowing down development considerably in early 2013, as (a) we're knocking down our place and rebuilding, and (b) my second child is on the way. I hope to have llgo in a better state for contributions by then, so others can pick up the slack.</div><div><br /></div><div>I expect in the near future I'll start playing with clang/cgo integration, as I start playing with PNaCl. I'll write back when I have something to demonstrate.</div><div><br /></div><div>Until then.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-88756713252540663032012-09-09T18:26:00.002+08:002012-09-09T18:26:54.525+08:00llgo update, milestoneIn between gallivanting in <a href="http://en.wikipedia.org/wiki/Sydney">Sydney</a>, working, and organising to have a new house built, I've squeezed in a little bit of work on llgo. If you've been following along on Github, you'll have seen that things have progressed a bit since last time I wrote.<br /><br />Aside from a slew of bug fixes and trivialities, llgo now implements:<br /><br /><ul><li>Slice operations (make, append, slice expressions). I've only implemented single-element appends so far, i.e. 
No <i>append(s, a, b, c, ...)</i> or <i>append(s, abc...)</i> yet.</li><li>Named results in functions.</li><li>Maps - creation, indexing, assignment, and deletion. The underlying implementation is just a dumb linked-list at this point in time. I'll implement it as a hash map in the future, when there aren't more important things to implement.</li><li>Range statements for arrays, slices and maps. I haven't done strings yet, simply because it requires a bit more thought about iterating through strings a rune at a time. I don't expect it'll be too much work.</li><li>Branch statements, except for goto. You can now break, continue, and fallthrough.</li><li>String indexing and slicing.</li><li>Function literals. Once upon a time these were working, but they haven't been for a while. Now they are again. Note that this does not include support for closures at this stage, so usefulness is pretty limited.</li></ul><br />Early on in the development of llgo, I decided that rather than implementing the compiler by going through the specification one item at a time, I'd drive the development by attempting to compile a real program. For this, I chose <a href="https://code.google.com/p/go/source/browse/src/pkg/unicode/maketables.go">maketables</a>, a program from the <i>unicode</i> standard library package. As of today, llgo can successfully compile the program. That is, it compiles that specific <i>file</i>, maketables.go. It doesn't yet compile all of its dependencies, and it certainly doesn't link or produce a usable program.<br /><br />So now I'll be working towards getting all of the dependencies compiling, then linking. In the interest of seeing usable progress, I think I might now take a bottom-up approach and start focusing on the core libraries, like <i>runtime</i> and <i>syscall</i>. 
I'll report back when I have something interesting to say.<br /><br />Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com1tag:blogger.com,1999:blog-8213002289593400460.post-89557983335311041662012-07-21T22:09:00.001+08:002012-07-21T22:09:37.150+08:00gocov, llgo updateI guess it's time for a quick update. I'm not very diligent with this blogging thing; too busy having fun, programming. Sorry about that!<br /><br /><h3> Introducing gocov</h3>A couple of weeks ago I <a href="https://groups.google.com/d/topic/golang-nuts/abujHqfUbvo/discussion">announced </a><a href="http://github.com/axw/gocov">gocov</a>, a coverage testing tool for the Go programming language. I wrote gocov to quickly get an idea of how broadly tested packages are (namely exp/types, which I'm working on in the background). The tool itself is written in Go, and works by source instrumentation/transformation. Currently gocov only does statement coverage.<br /><br />Using gocov is relatively simple (if I do say so myself). First, you install gocov by running:<br /><br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 13px; line-height: 19px; margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: none; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 12px; margin: 0px; padding: 0px;">go get github.com/axw/gocov/gocov</code></pre><br />This will install the gocov tool into your $GOPATH/bin directory. Once you have it installed, you can test a package (i.e. 
run its tests, and generate coverage data), by running:<br /><br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 13px; line-height: 19px; margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: none; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 12px; margin: 0px; padding: 0px;">gocov test <path/to/package></code></pre><br />Under the covers, this will run "go test <path/to/package>", after having gone through the process of instrumenting the source. Once the tests are complete, gocov will output the coverage information as a JSON structure to stdout. So you might want to pipe that output somewhere...<br /><br />Once you've got the coverage information, you'll probably want to view it. So there are two other gocov commands: report, and annotate. The <i>report</i> command will generate a text report of the coverage of all the functions in the coverage information provided to it. 
For example:<br /><br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 13px; line-height: 19px; margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: none; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 12px; margin: 0px; padding: 0px;">gocov test github.com/axw/llgo/types | gocov report</code></pre><br />... will generate a report that looks something like:<br /><br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><span style="color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace;"><span style="font-size: 12px; line-height: 18px;">...<br />types/exportdata.go readGopackHeader 69.23% (9/13)<br />types/gcimporter.go gcParser.expect 66.67% (4/6)<br />types/gcimporter.go gcParser.expectKeyword 66.67% (2/3)<br />...</span></span></pre>The <i>annotate</i> command will print out the source for a specified function, along with an annotation for each line that was missed. 
For example:<br /><br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 13px; line-height: 19px; margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: none; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 12px; margin: 0px; padding: 0px;">gocov test github.com/axw/llgo/types | gocov annotate - types.gcParser.expectKeyword</code></pre><br />... will output the following:<br /><br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><span style="color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace;"><span style="font-size: 12px; line-height: 18px;">266 func (p *gcParser) expectKeyword(keyword string) {<br />267 lit := p.expect(scanner.Ident)<br />268 if lit != keyword {<br />269 MISS p.errorf("expected keyword %s, got %q", keyword, lit)<br />270 }<br />271 }</span></span></pre><br />As is often the case when I write software, I wrote gocov for my own needs; as such it's not terribly featureful, only doing what I've needed thus far. If you would like to add a feature (maybe HTML output, or branch coverage), feel free to send a pull request on the <a href="http://github.com/axw/gocov">Github repository</a>, and I'll take a gander.<br /><br />Anyway, I hope it's of use to people. But not too many people, I don't have time to fix all of my crappy code! 
(Just kidding, I have no life.)<br /><br /><h3> Update on llgo: interface comparisons, exp/types</h3>I don't have a lot to report on this front, as I've been doing various other things, like that stuff up there, but I can share a couple of bits of mildly interesting news.<br /><br />I've been working a little on the runtime for llgo, and I'm proud to say there's now an initial implementation of interface comparison in the runtime. This involved filling in the algorithm table for runtime types, implementing the runtime equality function (runtime.memequal), and implementing a runtime function (runtime.compareI2I) to extract and call it. It probably doesn't sound exciting when put like that, but this is something of a milestone.<br /><br />By the way, if you want to actually use the runtime, you can do it like this:<br /><br /><ol><li>Compile your program with llgo, storing the bitcode in file <i>x.ll</i>.</li><li>Compile llgo/runtime/*.go with llgo, storing the bitcode in file <i>y.ll</i>.</li><li>Link the two together, using llvm-link: <i>llvm-link -o z.ll x.ll y.ll</i></li></ol><div>And you're done. The resultant module, housed in <i>z.ll</i>, contains your program and the llgo runtime. Now you can concatenate strings and compare interfaces to your heart's content. Eventually llgo will contain an integrated linker, which will rewrite symbol names according to package paths.</div><div><i><br /></i></div><br />Finally, on exp/types: I submitted my first two CL's. Some of my ideas for exp/types were ill thought out, so the first was rejected (fairly), and the second needs some rework. I'll be writing up a design proposal document at some stage, to better document my rationale for changes. 
Anyway, I'll keep plugging away...<br /><br />Ade!<br /><br />Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-73267168445328215742012-06-03T15:53:00.001+08:002012-06-03T15:53:24.554+08:00Unit-testing llgo's runtimeIt's been a while since I last wrote, primarily because I've been moving house and was without Internet at home during the process. It's back now, but now I have Diablo III to contend with.<br /><br />In my <a href="http://blog.awilkins.id.au/2012/04/llgo-runtime-emerges.html">previous post</a> I mentioned that I would create a new branch for working on the <a href="http://github.com/axw/llgo">llgo </a>runtime. I haven't done that yet, though I haven't broken the build either. Rather, I've introduced conditional compilation to <a href="http://github.com/axw/gollvm">gollvm </a>for builds against LLVM's trunk where unreleased functionality is required, e.g. LinkModules. This isn't currently being used in llgo-proper, so I've gotten away without branching so far.<br /><br />The tag for building gollvm with unreleased functions is "llvmsvn", so to build gollvm with LLVM's trunk, including the LinkModules function, do the following:<br /><pre style="background-color: #f8f8f8; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: 1px solid rgb(204, 204, 204); color: #333333; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 13px; line-height: 19px; margin-bottom: 15px; margin-top: 15px; overflow: auto; padding: 6px 10px;"><code style="background-color: transparent; border-bottom-left-radius: 3px; border-bottom-right-radius: 3px; border-top-left-radius: 3px; border-top-right-radius: 3px; border: none; font-family: 'Bitstream Vera Sans Mono', Courier, monospace; font-size: 12px; margin: 0px; padding: 0px;">curl https://raw.github.com/axw/gollvm/master/install.sh | sh -s -- -tags llvmsvn</code></pre>So I didn't break "the build", meaning you can still build gollvm/llgo without also building LLVM from source. I did, however, break the llgo unit tests, as they are using the new LinkModules function. If you want to run the unit tests without building LLVM from source, then you can comment out the call to <i>llvm.LinkModules</i> in llgo/utils_test.go; of course, you should expect failures due to the runtime not being linked in, but that doesn't affect all tests.<br /><br />What else is new?<br /><ul><li><a href="https://groups.google.com/d/msg/golang-dev/aYZM61-ySUs/qVjKlt1WtWAJ">I announced on golang-dev a couple of weeks ago</a> that I intend to work on getting exp/types up to snuff. I've moved some of the type construction code out of llgo-proper into llgo/types (a fork of exp/types), and eliminated most of the llgo-specific stuff from llgo/types. I'll need to set aside some time soon to learn how to use Mercurial and create some changelists.</li></ul><ul><li>A few weeks ago I started <a href="https://plus.google.com/u/0/102738380796586573408/posts/GibftVAePRW">playing with llgo and PNaCl</a>, to see how hard it would be to get something running in Chrome. It works (with the IR Translator/external sandbox anyway), but then llgo doesn't really do much at the moment.</li></ul><div>That's all for now.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com2tag:blogger.com,1999:blog-8213002289593400460.post-62974724642863993252012-04-28T15:44:00.002+08:002012-04-28T15:44:59.069+08:00An llgo runtime emergesIt's been a long time coming, but I'm now starting to put together pieces of the llgo runtime. Don't expect much any time soon, but I am zeroing in on a design at least. The sleuths in the crowd will find that only string concatenation has been implemented thus far, which is pretty boring. 
Next up, I hope, will be interface-to-interface conversions, and interface-to-value conversions, both of which require (for a sane implementation) a runtime library.<br /><br />I had previously intended to write the runtime largely in C, as I expected that would be the easiest route. I started down this road writing a basic thread creation routine using pthread, written in C. The code was compiled using <a href="http://clang.llvm.org/">Clang</a>, emitting LLVM IR which could be easily linked with the code generated by llgo. It's more or less the same idea implemented by the gc Go compiler (linking C and Go code, not relying on pthread). Even so, I'd like to write the runtime in Go as much as possible.<br /><br />Why write the runtime in Go? Well for one, it will make llgo much more self contained, which will make distribution much easier since there won't be a reliance on Clang. Another reason is based on a lofty, but absolutely necessary goal: that llgo will one day be able to compile itself. If llgo compiles itself, and compiles its own runtime, then we have a great target for compiler optimisations: the compiler itself. In other words, <a href="http://prog21.dadgum.com/136.html">"compiler optimisations should pay for themselves."</a><br /><br />In my last post I mentioned that LLVM 3.1 is coming up fast, and this release has the changes required by llgo. Unfortunately, I've just found that the C API lacks an interface for linking modules, so I'm going to have to submit a patch to LLVM again, and the window for inclusion in 3.1 has certainly passed. Rather than break gollvm/llgo's trunk again, I'll create a branch for work on the runtime. 
I'll post again when I've submitted a patch to LLVM, assuming the minor addition is accepted.Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com1tag:blogger.com,1999:blog-8213002289593400460.post-86056053263535215892012-04-08T21:44:00.002+08:002012-04-08T21:44:43.200+08:00llgo update: Go1, automated testsThis week I finished up <a href="http://www.udacity.com/overview/Course/cs373">Udacity CS373: Programming a Robotic Car</a>, and also <i>finally </i>finished reading <a href="http://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach">GEB</a>. So I'll hopefully be able to commit some more time to llgo again.<br /><div><br /></div><div>I moved on to Go's weekly builds a while back, and updated both llgo and gollvm to conform. I'm now on Go 1, as I hope most people are by now, and llgo is in good shape for Go 1 too. That's not to say that it compiles all of the Go 1 language, just that it runs in Go 1. Apart from that, I've just been working through some sample programs to increase the compiler's capability.</div><div><br /></div><div>One of the things that I've been a bit lazy about with llgo is automated testing, something I'm usually pretty keen on. I've grown anxious over regressions as time has gone on in the development, so I've spent a little bit of time this week putting together an automated test suite, which I <a href="https://groups.google.com/d/topic/golang-nuts/s_EehtIclUY/discussion">mentioned in golang-nuts</a> a few days ago. The test suite doesn't cover a great deal yet, but it has picked up a couple of bugs already.</div><div><br /></div><div>One of the numerous things I like about Go is its well integrated tooling. For testing, Go provides the <a href="http://golang.org/pkg/testing">testing</a> package, and <a href="http://golang.org/cmd/go/#Test_packages">go test</a> tool. So you write your unit tests according to the specifications in the "testing" package, run "go test", and your tests are all run. 
This is comparable to, say, Python, which has a similar "unittest" package. It is vastly more friendly than the various C++ unit test frameworks; that's in large part due to the way the Go language is designed, particularly with regard to how it fits into build systems and is parsed.</div><div><br /></div><div>In Go, everything you need to build a package is in the source (assuming you use the "go" command).</div><div><ul><li>The only external influences on the build process (environment variables GOOS, GOARCH, GOROOT, etc.) apply to the entire build procedure, not to single compilation units. Each variant will end up in a separate location when built: ${GOPATH}/pkg/${GOOS}_${GOARCH}/<pkgname>.</li><li>Platform-specific code is separated into multiple files (xxx_linux.go, xxx_windows.go, ...), and they're automatically matched with the OS/architecture by the "go" command.</li><li>Package dependencies are automatically and unambiguously resolved. Compare this with C/C++ headers, which might come from anywhere in the preprocessor's include path.</li></ul><div>So anyway, back to llgo's testing. It works just like this: I've created a separate program for each test case in the <span style="font-family: 'Courier New', Courier, monospace;">llgo/llgo/testdata</span> directory. Each of these programs corresponds to a test case written against the "testing" package, which does the following:</div></div><div><ol><li>Run the program using "go run", and store the output.</li><li>Redirect stdout to a pipe, and run a goroutine to capture the output to a string.</li><li>Compile the program using llgo's Compile API, and then interpret the resultant bitcode using gollvm's ExecutionEngine API.</li><li>Restore the original stdout, and compare the output with that of the original "go run".</li></ol><div>Pretty obvious I guess, but I was happy with how easy it was to do. 
Defer made the job of redirecting, restoring and closing file descriptors pain-free; the go statement and channels made capturing and communicating the resulting data a cinch.</div></div><div><br /></div><div>This is getting a little ramble-ish, so I'll finish up. While testing, I discovered a problem with the way LLVM types are generated from <i>types.Type</i> values, which basically means that they need to be cached and reused, rather than generated afresh each time. At the same time I intend to remove all references to LLVM from my clone of the "types" package, and offer my updates back to the Go team. It's not fully functional yet, but there are at least a few gaps that I've filled in.<br /><br />One last thing: LLVM 3.1 is due out May 14, so gollvm and llgo will no longer require LLVM from SVN. I really want to eliminate the dependency on llvm-config from the build of gollvm. I'm considering a dlopen/dlsym shim and removing the cgo dependency on LLVM. I'd be keen to hear some opinions, suggestions or alternatives.<br /><br />Until next time.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-32953508517588298142012-02-19T13:50:00.001+08:002012-02-19T13:50:46.528+08:00Imports in llgo, jr.So I realised I'm a doofus the other day, when I started getting closer to completion on producing export metadata in llgo. Rolling my own import mechanism is unnecessary for now. Instead, I can just lean on the import mechanism that exists in the standard library (well, <a href="http://weekly.golang.org/doc/go1.html#exp">until Go 1</a> at least): go/types/GcImporter.<br /><br />I've modified llgo to use go/ast/NewPackage, rather than the old code I had that was using go/parser/ParseFiles. The NewPackage function takes an optional "importer" object which will be used for inter-package dependency resolution, whereas ParseFiles does no resolution. 
The standard GcImporter type may be used to identify exports by interrogating the object and archive files in $GOROOT. The AST that's generated is filled in with external declarations, so it's then up to llgo to convert those into LLVM external declarations. Easy peasy.<br /><br />Now it's time to come up with a symbol naming scheme. Without having thought about it too hard, I'm going to start off with the assumption that the absolute name of the symbol (package+name), with slashes converted to dots, will do the trick. Once I've implemented that, I'll need to start work on the runtime in earnest. It's also high time I put some automated tests in place, since things are starting to get a little more stable.<br /><br />In the long term I'll probably want to continue on with my original plan, which is to generate module-level metadata in the LLVM bitcode, and then extract this in a custom importer. It should be quite straightforward. Earlier this week I wrapped up some updates to gollvm to add an API to make generating <a href="http://llvm.org/docs/SourceLevelDebugging.html">source-level debugging metadata</a> simpler. This will be useful not only for describing exports, but also for its intended purpose: generating <a href="http://dwarfstd.org/">DWARF</a> debug information.<br /><br />In other news: my wife just ordered the 1-4a box set of <a href="http://www-cs-staff.stanford.edu/~uno/taocp.html">The Art of Computer Programming</a> for me. At the moment I am slowly making my way through <a href="http://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach">Gödel, Escher, Bach: an Eternal Golden Braid</a>, and so far, so good. 
Looking forward to more light reading for the bus/train!Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-52917515625819174882012-02-11T21:05:00.000+08:002012-02-11T21:05:01.402+08:00llgo: back in businessI've been hacking away on llgo, on evenings and weekends when I've had the chance. It's now roughly equivalent in functionality to where it was before I upgraded LLVM to trunk (3.1) and broke everything. There's a couple of added bonuses too, like proper arbitrary precision constants, and partial support for untyped constants.<br /><br />Now that the basics are working again, I'll get back to working on the import/export mechanism. I expect this will expose more design flaws, and will take a while. I still plan to make use of debug metadata, which I am not altogether familiar with. I'll also need to decide how the linker and the runtime library are going to work.<br /><br />In other news, I've moved <a href="http://github.com/axw/pushy">Pushy</a> to GitHub. I'm not actively developing it at the moment, but I wanted to consolidate the services I'm consuming. I do have an addition to the Java API in the works: a sort of remote classloader, that will communicate over a Pushy connection to fetch classes/resources. The idea is to make it really quick and easy to run a bit of Java code on a remote machine, without having to deploy the application remotely. I'll hopefully get around to pushing this change within the coming few weeks.Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-72122439683538783722012-01-07T21:20:00.000+08:002012-01-07T21:20:12.157+08:00cmonster 0.2 releasedLast week I announced a new version of cmonster (now version 0.2) <a href="http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-January/019362.html">on the Clang mailing list</a>. 
I've finally updated the <a href="https://github.com/axw/cmonster/">Github page for cmonster</a> with some basic examples, and installation instructions.<br /><br />I asked on the Clang mailing list for some feedback, but so far all I'm hearing is crickets. I'm surprised that nobody's interested enough to reply, but I'll freely admit that I'm not particularly good at marketing. If you do check it out, let me know what I can do to make it useful for you.<br /><br />In other news: I haven't spent much time on llgo recently, what with Real Life happening all the time. I sent a patch into LLVM to add improved support for named metadata, which was accepted. I also made a bunch of fixes to gollvm so that it builds and works with LLVM 3.0 (and some additional changes to work with trunk).<br /><br />There's been some changes to LLVM that mean I can no longer abuse the metadata system by attaching metadata to values. Metadata can now be specified only on instructions. This means that I can no longer attach type information to values using metadata, nor identify function receivers in a similar way. So llgo will need to maintain type and receiver (amongst other) information outside of LLVM's values. This was always going to be necessary, I'd just been putting it off to get something that worked.<br /><br />I hope to get back to making inroad with llgo soon. I feel like I made pretty good progress on implementing my crazy ideas in 2011. Here's hoping 2012 works out as well.Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-9120180200668823092011-12-03T12:26:00.001+08:002012-01-07T20:23:56.812+08:00Imports in llgoIt's been a while. I've implemented bits and pieces of the Go language: constants (though not properly handling arbitrary precision numbers at this stage), structs, functions (with and without receivers, as declarations and as literals), and goroutines. 
Much of the implementation is simplistic, not covering all bases. I want to get something vaguely useful working before I go down the path to completeness.<br /><br />I intended to wait until I had something interesting to show off, but unfortunately I've hit a snag with LLVM.<br /><br />Debug information might sound luxurious, but it's how I'm intending to encode information about exported symbols, so I can implement imports. The <a href="http://golang.org/cmd/gc/">gc compiler</a> creates a header in archives that lists all of this information. LLVM provides a standard set of <a href="http://llvm.org/docs/SourceLevelDebugging.html">metadata for describing debug information</a>, using DWARF descriptors.<br /><br />So I figured I could build on the LLVM metadata, extending it to describe Go-specific information where necessary. The benefit here, aside from reusing an existing well-defined format, is that I'd get DWARF debugging information in my output binaries, something I'd eventually want anyway. Unfortunately it appears that there's no C API for generating named metadata.<br /><br />I'll have a look at extending the LLVM C API now. Down the rabbit hole...<br /><br /><b>Update - 7 January 2012</b><br />I submitted a <span id="goog_2124873712"></span>patch to LLVM to add support for adding named metadata to a module, <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20111219/133770.html">which has been accepted</a><span id="goog_2124873713"></span>. 
This will, presumably, be part of LLVM 3.1.<br /><br />Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-16463622545212286402011-11-07T11:49:00.001+08:002011-11-07T11:49:06.121+08:00Writing a Compiler, or Learning a Language the Hard Way (Part Deux)<div>A couple of weeks ago <a href="http://blog.awilkins.id.au/2011/10/writing-compiler-or-learning-language.html">I wrote</a> that I would be attempting to write a Go compiler using the LLVM compiler infrastructure. The blog post got a bit of attention after being linked on <a href="http://lamernews.com/">Lamer News</a> (and a bit more attention than intended!), so I guess there's at least a couple of people out there interested in hearing whether anything comes of this effort.</div><div><br /></div><div>I've been a mixture of sick and busy (maybe a little bit slack?), but this weekend I finally found some time to put into my shiny new Go compiler. Yesterday I buffed the corners of what I had written so far, and pushed it to github. Behold, <a href="http://github.com/axw/llgo">llgo</a>! llgo does not currently compile much of the language, and it only compiles completely standalone programs (imports aren't handled). I'll be adding in features as I get time, but this is largely an educational endeavour for me, so I'm not really in a rush.</div><div><br /></div><div>Let's look briefly at how it's put together. llgo is written in Go, and uses the standard Go parser/ast packages. For code generation, the <a href="http://github.com/axw/gollvm">gollvm</a> (Go bindings for LLVM) package is used. So it's just a matter of calling the parser to generate an AST, then walking the AST and generating LLVM instructions. The latter part is what I'm most new to, and I'm just starting to get to grips with LLVM and <a href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA</a> (Single Static Assignment), and their nuances. 
It's mostly pretty straightforward though.</div><div><br /></div><div style="text-align: center;"><b>Forward, ho!</b></div><div><br /></div><div>There are plenty of things to implement, but there's a bunch of big-ticket items that I'm particularly interested in solving. They are, in no particular order:</div><div><ul><li>Imports.</li><li>Interfaces.</li><li>Goroutines and channels.</li><li>Deferred function calls.</li><li>Closures.</li><li><a href="http://golang.org/cmd/cgo">cgo</a></li><li>Garbage Collection</li></ul></div><div><br /></div><div><b>Imports</b></div><div><br /></div><div>If you were to attempt to compile a program with llgo - "Hello, World!", say - then I'm sure you'd find one giant gaping hole: the lack of support for imports. So you wouldn't be able to import <i>fmt </i>and do <i>fmt.Println</i>. Actually I have implemented the <i>println </i>builtin, but that's beside the point. The module system is pretty fundamental, so I'll have to address this soon.</div><div><br /></div><div>The picture I have in my head is that each package will compile to (ideally) machine-independent LLVM bitcode libraries, which will go somewhere in the $GOROOT/pkg hierarchy. Just as Go examines archives to determine what they export, so llgo will load and examine the modules defined by the bitcode.<br /><br />Somewhat related to imports is the runtime. I dare say that most of the imports people will ever do will be importing standard libraries, which will at some time or another use the Go runtime (e.g. reflection, string comparisons, system calls). So I'll have to start thinking seriously about which parts of the runtime I'll be able to reuse, and which parts I'll rewrite.<br /><br /><b>Interfaces</b><br /><b><br /></b><br />In my previous blog post I talked about the idea of pseudo duck-typing in a statically compiled language ("implicit interfaces"), so this feature has a special place in my heart. 
I have some ideas of how to implement them, but I'll have to implement user-defined types first.</div><div><br /></div><div><b>Goroutines and Channels</b></div><div><b><br /></b></div><div>I'm not going for efficiency at this stage; I'm just going for functionality. So I intend, initially, to implement goroutines 1-1 with respect to threads. The <i>gc </i>compiler/runtime does M-N with a preemptive application-level scheduler; I think gccgo still does 1-1. I also do not intend, initially at least, to implement split stacks. These things can reasonably be considered functionality, especially since the language intends to make creating many goroutines inexpensive and efficient. I have to prioritise, though, so I'll tackle efficiency and scalability later.<br /><br />I've implemented channel-like data structures in C++ before, so I don't expect that to be too difficult. I'll just start out with a simple mutex/condition-based data structure, with or without an underlying FIFO array/queue depending on whether or not the channel is buffered.</div><div><b><br /></b></div><div><b>Deferred Function Calls</b></div><div><b><br /></b></div><div>As a general rule, I'm trying to do this... not clean room, but not just by copying what's done in gc/gccgo either. After all, I'm trying to teach myself something here, and I find doing things is a great way of learning for me. Sometimes things go awry, but then I know what not to do next time. It also serves as a good background when reading about how others have solved the problem.<br /><br />Russ Cox wrote an interesting article on <a href="http://research.swtch.com/2010/03/broken-abstractions-in-go.html">how deferred function calls are implemented in Go</a>. Actually the article was nominally about how breaking abstractions can be a good thing, and I tend to agree. 
LLVM adds another level of abstraction, which means some of these functions don't end up being quite as efficient as when they're implemented directly in assembler or machine code.</div><div><b><br /></b></div><div><b>Closures</b></div><div><b><br /></b></div><div>If you're not familiar with this term, it's essentially a function that has captured some variables in its defining environment. In Go you can define <i>function literals</i>, which are anonymous functions. If a function literal refers to a variable defined in the outer scope, then the function will be defined as a closure.</div><div><br /></div><div>I had been wondering about how I would go about implementing closures in llgo. I was thinking, broadly, that I would need to store the variables in memory alongside the function code. How would I do this in LLVM? I could treat closures differently, representing them as a structure containing the captured variables, and a pointer to the function, which would need to have additional arguments defined to accept the captured variables. Then the caller of the closure would have to treat it differently to a simple function. This seems a bit ugly, so I wondered, how does gc do it?</div><div><br /></div><div>The gc implementation is very elegant: it allocates memory on the heap, with PROT_EXEC enabled, and stores dynamically generated machine code in it. At the end of the code, the captured variables are stored. The machine code loads the variables from memory onto the stack for use in the closure. Elegant, but how can we do that in LLVM?</div><div><br /></div><div>LLVM abstracts away the details of the underlying target system, which means you deal with an intermediate representation of instructions rather than machine-specific instructions. We could dynamically generate code using LLVM, but that would mean requiring LLVM in every program, which seems a bit heavyweight. 
Or we could just reuse the code from gc, since it's pretty well contained, but that means adding in platform-specifics where there were none before. I think that's a better solution, but I might have to see what other LLVM-based compilers have done. I guess the Glasgow Haskell Compiler might be a good first place to look.</div><div><b><br /></b></div><div><b>cgo</b></div><div><b><br /></b></div><div>This one has the potential to be quite interesting. There's already a mature LLVM-based C compiler: <a href="http://clang.llvm.org/">clang</a>. So llgo could potentially leverage it to implement a cgo equivalent. Both clang and llgo will emit bitcode; llgo (or the cgo equivalent) will analyse the bitcode emitted from clang, and determine the available functions and types.</div><div><br /><b>Garbage Collection</b><br /><b><br /></b><br />I know conceptually how mark & sweep algorithms work, but I have never implemented one, nor even analysed an implementation. <a href="http://llvm.org/docs/GarbageCollection.html">LLVM provides some support for garbage collection</a>, which will presumably make things easier.<br /><br /><div style="text-align: center;"><b>Over and Out</b></div><div style="text-align: left;"><b><br /></b></div><div style="text-align: left;">Rather than continuing to talk about doing stuff, I'm going to go and do stuff. If you're interested in following my progress, I suggest that you watch <a href="https://github.com/axw/llgo">llgo on github</a>.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Without further ado, adieu.</div></div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-77946875738082699542011-10-23T15:16:00.000+08:002011-10-23T15:16:07.063+08:00Writing a Compiler, or Learning a Language the Hard WayFor a while now I've been lamenting the lack of progress in systems programming languages. 
I work primarily in C++ (professionally, anyway), a language which leaves a lot to be desired. The end result, assuming you know what you're doing, is likely to be fast code. But getting there can be a tedious and precarious journey. Don't get me wrong, I'm not a C++ hater. I just think we can do better.<br /><br />So for the past few years I've been storing away ideas for writing my own language. (<i>Groan, another one</i>.) The vast majority of languages these days seem to be JVM-based, or based on some other VM such as the CLR. This is fine for many applications, but for a lot of systems programming you typically want to be much closer to the hardware. What I want is something with the ease and rapidity of development of, say, Python, with the power of a lower-level language such as C/C++.<br /><br /><b><span class="Apple-style-span" style="font-size: large;"><br /></span></b><br /><b><span class="Apple-style-span" style="font-size: large;">The Easy: Ideas</span></b><br /><br />So what are those ideas I'd stored away then? Well, to be honest, many of them boil down to syntactic sugar (syntactic caramel?), so here are the more functional ones:<br /><br /><ul><li><b>Implicit Interfaces</b>. The idea is that classes should be able to implement an interface without explicitly saying so, just by implementing the methods defined in the interface. Poor man's duck-typing, if you will.</li><li><b>Link Time Optimisation (LTO)</b> should be the norm. Say you write a function in a library which does everything and the kitchen sink. If the calling program only uses a function from your library in a certain way, then the function should be optimised for that usage.</li><li><b>Pure Functions.</b> This kind of fits under LTO. I want simpler meta-programming: it should be possible to mark functions as being "pure", which means (in my terminology) that the function either has no side-effects, or affects only temporary variables. 
Calls to these functions could then be evaluated at compile time, e.g. to calculate complex mathematical expressions. I guess this just comes under "fancy optimiser"?</li><li><b>Pattern Matching.</b> I haven't really fleshed this one out, but I think it would be handy to be able to match functions not just based on signature, but based on the values of the arguments.</li><li><b>No preprocessor, no headers. </b>When code is distributed, the modules should be self describing, such as with Java's compiled .class files. This would eliminate problems such as compiling against an old version of an interface, and linking against a new, incompatible implementation.</li><li><b>For the love of God, no garbage collection.</b> What can I say? I like my <a href="http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization">RAII</a>.</li><li>I'm not really sure what title to give this one, but I wanted to eliminate the problems you get with linking objects together that have been compiled with different options in C++ (e.g. with/out RTTI, exceptions, or perhaps against an entirely different STL/standard library.)</li></ul>There were some other requirements I had in mind, such as having a single reference implementation à la CPython, and including a parser with the runtime to simplify tool development.<br /><div><br /></div><div><b><span class="Apple-style-span" style="font-size: large;"><br /></span></b></div><div><b><span class="Apple-style-span" style="font-size: large;">The Hard: Writing a Compiler</span></b></div><div><b><span class="Apple-style-span" style="font-size: large;"><br /></span></b></div><div>Anyone can come up with ideas. Even coming up with practical ideas isn't too hard, but implementing them is something else altogether. Especially so for someone like me who has had no formal education in compilers, or language design/implementation. Hell, my university course didn't even cover formal grammars or parsing. 
I've picked some things up by myself, but it's a very academic field.</div><div><br /></div><div>Back when I was in uni, I toyed around with <a href="http://www.cminusminus.org/">C--</a>, which at the time looked like quite a promising "portable assembly language". This language is (or at least was) used as the intermediate language for the <a href="http://www.haskell.org/ghc/">Glasgow Haskell Compiler</a>. My self-taught knowledge of language design/implementation and code generation were not really up to scratch, so I didn't get much further than writing toy languages.</div><div><br /></div><div>Fast forward a few years, and I've got the itch to implement a compiler again. I've recently been playing around with LLVM. It boasts a relatively easy to use API, which takes out some of the drudgery of writing the "bitcode" (an intermediate code that can be translated to machine code). I've got a bit more knowledge and programming skill behind my belt now, so let's get to work on that language, right!?</div><div><br /></div><div>I read something recently which is summarised as: learn intimately that which you wish to replace. Now I don't have any illusions of replacing any existing language, but I think the message is still relevant. I've never implemented a language OR a compiler before, and now I'm going to both at once? How about I make this a bit easier on myself, and write a compiler for an existing language. If I still want to implement my own language later, I can use the experience I gained.</div><div><br /></div><div>I've been looking for an excuse for a while now to learn the <a href="http://www.golang.org/">Go Programming Language</a>. 
It has some of the same design goals as my hypothetical language, so it seemed like a good fit.</div><div><br /></div><div><br /></div><div><span class="Apple-style-span" style="font-size: large;"><b>The Fun: Learning Go and LLVM</b></span></div><div><span class="Apple-style-span" style="font-size: large;"><b><br /></b></span></div><div>I've been learning Go in my precious spare time for the last couple of weeks. There's a nifty web-app which provides an interactive tutorial in Go, called <a href="http://go-tour.appspot.com/">A Tour of Go</a>. It's a bit of fun, and I recommend it to anyone wanting to learn the language.</div><div><br /></div><div>So anyway, I'm now playing around with writing a Go compiler using LLVM. Why?</div><div><ul><li>To learn LLVM, and more generally how to write a compiler and generate code.</li><li>To learn Go.</li><li>Potentially to fill a gap in JIT compilation of Go programs.</li><li>Why not?</li></ul><div>Writing the compiler will be made much easier by the fact that the Go runtime includes a Go parser, and someone has already implemented <a href="https://github.com/nsf/gollvm">Go bindings for LLVM</a>. I haven't made a great deal of progress yet, but it seems achievable. When I've got something vaguely useful, I'll chuck it over on GitHub.</div></div><div><br /></div><div>If you know anything about Go, you'll probably have noticed that at least one of the ideas that I presented above is present in Go: implicit interfaces. I really can't say whether this concept is new or not - it's probably not - but I did at least come up with it independently. Just sayin'!</div><div><br /></div><div>I'll write a follow-up post when I get some time, describing a bit more about the Go compiler, the challenges I've come across, and my thoughts on how to solve them. 
</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com3tag:blogger.com,1999:blog-8213002289593400460.post-2147557042048491212011-10-07T12:01:00.001+08:002011-10-07T12:03:50.364+08:00C++ Source-to-Source TranslationI've been on annual leave this week, so I've taken the opportunity to do some work on <a href="http://github.com/axw/cmonster">cmonster</a>. I've added preliminary support for source-to-source translation by introducing a wrapper for Clang's "Rewriter" API. My fingers have been moving furiously so it's all a bit rough, but it does work.<br /><br />The API flow is:<br /><br /><ol><li>Parse translation unit, returning an Abstract Syntax Tree (AST).</li><li>Walk AST to find bits of interest.</li><li>Insert/replace/erase text in the original source, using the location stored in each declaration/statement/token.</li></ol><div><b><span class="Apple-style-span" style="font-size: large;"><br /></span></b></div><div><b><span class="Apple-style-span" style="font-size: large;">Motivating Example</span></b></div><div><b><br /></b></div><div>Logging is a necessary evil in complex software, especially when said software is running on a customer's system, inaccessible to you. To make problem determination easier, we want a decent amount of information: file names, line numbers, function names, time of day, thread ID, ... but all of this comes at a cost. I'm not talking just cost in terms of CPU usage, though that is a major concern. I'm talking cost in terms of source code quality and maintainability.</div><div><br /></div><div>We'll start off with a trivial C program:</div><div><br /></div><div><pre class="prettyprint lang-c">int main(int argc, const char *argv[])<br />{<br /> if (argc % 2 == 0)<br /> {<br /> return 1;<br /> }<br /> else<br /> {<br /> return 0;<br /> }<br />}<br /></pre></div><div><br /></div><div>Let's say our needs are fairly humble: we just want to log the entry and exit of this function. 
Logging entry is easy: add a blob of code at the top of the function. We can get the function name and line number using __func__ (C99, C++11) and __LINE__. What about __func__ in C89? C++98? There's various alternatives, but some compilers have nothing. And that makes writing a cross-platform logging library a big old <a href="http://en.wikipedia.org/wiki/Pain_in_the_ass">PITA</a>. The information is there in the source code - if only we could get at it! In the end, we're more likely to forego <a href="http://en.wikipedia.org/wiki/Don't_repeat_yourself">DRY</a>, and just reproduce the function name as a string literal.</div><div><br /></div><div>Getting the function name and line number isn't a huge problem, but how about adding function exit logging? Now we're going to have to insert a little bit of code before our return statements. So we'll have something like:</div><div><br /></div><div><pre class="prettyprint lang-c">int main(int argc, const char *argv[])<br />{<br /> const char *function = "main";<br /> printf("Entering %s:%s:%d\n", function,<br /> __FILE__, __LINE__);<br /> if (argc % 2 == 0)<br /> {<br /> printf("Leaving %s:%s:%d\n", function,<br /> __FILE__, __LINE__);<br /> return 1;<br /> }<br /> else<br /> {<br /> printf("Leaving %s:%s:%d\n", function,<br /> __FILE__, __LINE__);<br /> return 0;<br /> }<br /> return 0;<br />}<br /></pre></div><div><b><br /></b></div><div>Ugh. And that's just the start. It gets much nastier when we need to turn logging on/off at runtime, filter by function name, etc. We could make it much nicer with a variadic macro. Something like LOG(format...), which calls a varargs function with the 'function' variable, __FILE__, __LINE__ and the format and arguments you specify. Unfortunately variadic macros are not supported by some older compilers. The first version of Visual Studio to support them was Microsoft Visual Studio 2005. So there goes that idea...</div><div><br /></div><div>Hmmm, what to do, what to do? 
Wouldn't it be nice if we could just tag a function as requiring entry/exit logging, and have our compiler toolchain do the work? Entry/exit logging is the sort of thing you want to be consistent, so it should suffice to define one set of rules that covers all functions. Let's take a little peek at what we could do with cmonster.<br /><br />First, we'll parse the source to get an AST. We'll locate all functions defined in the main file, and insert an "Entry" logging statement at the beginning of the body, and an "Exit" logging statement before each return statement in the body. At the end we dump the rewritten source to stdout, and we have a program, with logging, ready to be compiled.</div><div><br /></div><div><script src="https://gist.github.com/1269367.js?file=add_logging.py"></script></div><div><br /></div><div>Tada! Running this, we're given: <br /><br /><pre class="prettyprint lang-c">#include <stdio.h><br />int main(int argc, const char *argv[])<br />{<br /> printf("Entering main at line 2\n");<br /> if (argc % 2 == 0)<br /> {<br /> printf("Returning from main at line 6\n");<br /> return 1;<br /> }<br /> else<br /> {<br /> printf("Returning from main at line 10\n");<br /> return 0;<br /> }<br />}</pre></div><div><br /></div><div><b><span class="Apple-style-span" style="font-size: large;">Future Work</span></b></div><div><br /></div><div>What we can't do yet is insert, replace, erase or modify declarations or statements directly in the AST, and have that reflected as a text insertion/replacement/erasure. For example, maybe I want to rename a function? Why can't I just write "function_declaration.name = 'new_name'"? At the moment we'd need to replace the text identified by a source range... a bit clunky and manual. So I may add a more direct API at a later stage. It should be doable, but may be a lot of work.</div><div><br />Also, the Visitor class defined in the above example could be called minimal at best. 
If there were any statements in the code that weren't handled by our visitor, the translation program would barf. I'll eventually build a complete Visitor class into cmonster to be reused. This should make writing translation tools a breeze; in our example, we would just override "visit_ReturnStatement" in the visitor.<br /><br />Now, I think it's about time I learnt <a href="http://golang.org/">Go</a>.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com2tag:blogger.com,1999:blog-8213002289593400460.post-31826864032765762122011-09-25T14:30:00.001+08:002011-09-25T14:39:16.413+08:00cmonster update<br />I've been quietly beavering away at <a href="http://github.com/axw/cmonster">cmonster</a>, and thought I should share an update.<br /><br /><b>Note: </b>the changes I describe below are not yet part of a cmonster release. I'll release something once I've stabilised the API and tested it more thoroughly. In the mean time you can pull the source from github.<br /><br />In the last month or so I have been adding C/C++ parsing capabilities to cmonster, by exposing the underlying Clang parser API. I've been wanting a simple, scriptable framework for analysing and manipulating C++ source for a few years now. The reason for wanting such a thing is so that I, and others, can more rapidly develop tools for writing better C++, and eliminate some of the drudgery. I've only just made a start, but cmonster now provides an API for parsing a C/C++ source file, returning a Python object to inspect the result.<br /><br />So what's cmonster looking like now? We now have a preprocessor and a parser interface, the former now being incorporated into the latter. The parser interface will parse a single source file, and return its Abstract Syntax Tree (AST). As is typical with parsers, there are many classes involved to describe each declaration, statement, type, etc. 
So I've added <a href="http://cython.org/">Cython </a>into the mix to speed up the process of defining Python equivalents for each of these classes.<br /><br />Unfortunately Cython does not yet support <a href="http://www.python.org/dev/peps/pep-0384/">PEP 384</a> (Py_LIMITED_API), so at the moment cmonster is back to requiring the full Python API, and thus must be rebuilt for each new release. I've had a tinker with Cython to get its output to compile with Py_LIMITED_API, and hope to provide a patch in the near future.<br /><br />What's next? Once I get the AST classes mapped out, I intend to introduce a source-to-source translation layer. I'm not entirely sure how this'll work yet, but I think ideally you'd just modify the AST and call some function to rewrite the main source file. Elements of the AST outside of the main source file would be immutable. That's the hope, but it may end up being something a little more crude, using Clang's "Rewriter" interface directly to replace source ranges with some arbitrary string. I expect this will be a ways off yet, though.Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-69350427374200906702011-09-05T22:04:00.005+08:002011-09-20T21:35:04.493+08:00Hello, Mr. Hooker.I've been procrastinating on <a href="http://github.com/axw/cmonster">cmonster</a>. I have some nasty architectural decisions to make, and I keep putting it off. In the mean time I've been working on a new little tool called "Mr. Hooker" (or just <a href="http://github.com/axw/mrhooker">mrhooker</a>).<br /><br /><b>Introducing Mr. Hooker</b><br /><br />The idea behind mrhooker is very simple: I wanted to be able to write LD_PRELOAD hooks in Python. If you're not familiar with LD_PRELOAD, it's a mechanism employed by various UNIX and UNIX-like operating systems for "preloading" some specified code in a shared library. 
You can use this to provide your own version of native functions, including those in standard libraries such as libc.<br /><br />Anyway, I occasionally find the need for an LD_PRELOAD library to change the behaviour of a program that I can't easily recompile. Often these libraries are throw-away, so writing the LD_PRELOAD library in C can end up taking longer than it's worth. So I wrote mrhooker to simplify this.<br /><br />It turns out there's very little to do, since <a href="http://cython.org/">Cython</a> (and friends) do most of the hard work. Cython is a programming language that extends Python to simplify building Python extensions. It also has an interface for building these extensions on-the-fly. So mrhooker doesn't need to do much - it takes a .pyx (Pyrex/Cython source) and compiles it to a shared library using Cython. Mrhooker takes this, and some common code, and loads it into a child process using LD_PRELOAD.<br /><br /><b>Example - Hooking BSD Sockets</b><br /><b><br /></b><br />Let's look at an example of how to use mrhooker. Hooks are defined as external functions in a Cython script. Say we want to hook the BSD sockets "send" function. First we'd find the signature of send (man 2 send), which is:<br /><br /><pre class="prettyprint">ssize_t send(int sockfd, const void *buf, size_t len, int flags);</pre><br />Given this, we can produce a wrapper in Cython, like so:<br /><br /><pre class="prettyprint">cdef extern ssize_t send(int sockfd, char *buf, size_t len, int flags) with gil:<br />    ...<br /></pre><br />There are a couple of important things to note here. First, the parameter type for "buf" drops const, since Cython doesn't know about const-ness. Second, and crucially, the function must be defined "with gil". This ensures that the function acquires the Python Global Interpreter Lock before calling any Python functions. Okay, with that out of the way, let's go on...<br /><br />We'll want to do something vaguely useful with this wrapper. 
Let's make it print out the argument values, and then continue on with calling the original "send" function. To do that we'll use dlsym/RTLD_NEXT to find the next function called "send".<br /><br /><pre class="prettyprint">cdef extern ssize_t send(int sockfd, char *buf, size_t len, int flags) with gil:<br /> print "====> send(%r, %r, %r, %r)" % (sockfd, buf[:len], len, flags)<br /> real_send = dlsym(RTLD_NEXT, "send")<br /> if real_send:<br /> with nogil:<br /> res = (<ssize_t(*)(int, void*, size_t, int) nogil>real_send)(<br /> sockfd, buf, len, flags)<br /> return res<br /> else:<br /> return -1<br /></pre><br />We'll also need to declare dlsym and RTLD_NEXT. Let's do that.<br /><br /><pre class="prettyprint"># Import stuff from <dlfcn.h><br />cdef extern from "dlfcn.h":<br /> void* dlsym(void*, char*)<br /> void* RTLD_NEXT<br /></pre><br />Now you just run:<br /><br /><pre class="prettyprint lang-bash">mrhooker <script.pyx> <command><br /></pre><br /><br />And there we go. This is trivial - it would also be fairly trivial to write a C program to do this. But if we wanted to do anything more complex, or if we were frequently changing the wrapper function, I'd much rather write it in Python - or Cython, as it were.<br /><br />Enjoy!<br /><br /><hr /><b>Edit:</b> I just noticed that it's broken if you don't have a certain config file. I always had one while testing... until I got to work.<br />You'll get an error "ConfigParser.NoSectionError: No section: 'default'". I'll fix the code at home, but in the mean time you can do this:<br /><br /><pre class="prettyprint lang-bash">$ mkdir ~/.mrhooker<br />$ echo [default] > ~/.mrhooker/mrhooker.config<br /></pre><br />P.S. if you add "build_dir = <path>" in that section, or a per-module section, mrhooker/Cython will store the shared library that it builds. 
Then if you don't change the source it'll be used without rebuilding.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-3536133968550324142011-09-01T23:01:00.000+08:002011-09-01T23:01:04.761+08:00Google App Engine AgentA couple of months ago I wrote about <a href="http://blog.awilkins.id.au/2011/07/controlling-remote-agents-from-google.html">my foray into the world of Google App Engine</a>. More recently, I'd gotten the itch again, and had some ideas of how to fix the problems I found when attempting to get <a href="http://awilkins.id.au/pushy/">Pushy</a> to work in Google App Engine.<br /><br />The root of most of the problems is that Google App Engine is stateless in nature. Server instances can be spun up or spun down without notice, and so we can't store complex state, which Pushy really requires. So a couple of weeks ago I set about investigating a server-initiated RPC mechanism that is asynchronous and (mostly) stateless.<br /><br />How would it work? Well, earlier this year I read that <a href="http://googleappengine.blogspot.com/2011/04/introducing-protorpc-for-writing-app.html">ProtoRPC was released</a>, which brought RPC services to Google App Engine. In our case, Google App Engine is the client, and is calling the agent - but we can at least reuse the API to minimise dependencies and hopefully simplify the mechanism. Okay, so we have a ProtoRPC service running on a remote machine, consumed by our Google App Engine application. How do they talk?<br /><br />One thing I wanted to avoid was the need for polling, as that's both slow and expensive. Slow in that there will necessarily be delays between polls, and expensive in that unnecessary polls will burn CPU cycles in Google App Engine, which aren't free. Long-polling isn't possible, either, since HTTP requests are limited to 30 seconds of processing time. 
If you read my last post, you probably already know what I'm going to say: we'll use XMPP.<br /><br />What's XMPP? That's the <a href="http://xmpp.org/">Extensible Messaging and Presence Protocol</a>, which is the protocol underlying Jabber. It is also the primary protocol that Google Talk is built on. It's an XML-based, client-server protocol, so peers do not talk directly to each other. It's also asynchronous. So let's look at the picture so far...<br /><br /><ul><li>The client (agent) and server (GAE application) talk to each other via XMPP.</li><li>The agent serves a ProtoRPC service, and the GAE application will consume it.</li></ul><div>Because our RPC mechanism will be server-initiated, we'll need something else: agent availability discovery. Google App Engine provides XMPP handlers for agent availability (and unavailability) notification. When an agent starts up it will register its presence with the application. When an agent is discovered, the application will request the agent's service descriptor. The agent will respond, and the application will store it away in Memcache.</div><div><br /></div><div>We (ab)use Memcache for sharing of data between instances of the application. When you make enough requests to the application, Google App Engine may dynamically spin up a new instance to handle requests. By storing the service descriptor in Memcache, it can be accessed by any instance. I said abuse because Memcache is not guaranteed to keep the data you put in it - it may be expelled when memory is constrained. Really we should use Datastore, but I was too lazy to deal with cleaning it up. "Left as an exercise for the reader." One thing I did make a point of using was the <a href="http://goo.gl/cWCFe">new Python Memcache CAS API</a>, which allows for safe concurrent updates to Memcache.</div><div><br /></div><div>Orrite. So now we have an agent and application which talk to each other via XMPP, using ProtoRPC. 
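As an aside, the compare-and-set pattern that API enables is worth making concrete. The sketch below uses a toy in-memory stand-in for the real memcache client - only the gets/cas method names mirror App Engine's API; everything else is invented for illustration, so the example stays self-contained:

```python
class ToyMemcache:
    """Minimal in-memory stand-in for a memcache client with CAS.
    gets() remembers the version it saw; cas() only writes if the
    stored version still matches that observation."""

    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._seen = {}  # key -> version observed by last gets()

    def gets(self, key):
        if key not in self._data:
            return None
        value, version = self._data[key]
        self._seen[key] = version
        return value

    def add(self, key, value):
        # Like memcache 'add': only succeeds if the key is absent.
        if key in self._data:
            return False
        self._data[key] = (value, 1)
        return True

    def cas(self, key, value):
        _, version = self._data[key]
        if self._seen.get(key) != version:
            return False  # somebody wrote since our gets(); caller retries
        self._data[key] = (value, version + 1)
        return True


def register_agent(client, key, jid, descriptor, retries=10):
    """Add one agent's descriptor to a shared registry dict, safely
    even if other instances update the same key concurrently."""
    for _ in range(retries):
        registry = client.gets(key)
        if registry is None:
            if client.add(key, {jid: descriptor}):
                return True
            continue  # lost the race to create the key; re-read
        updated = dict(registry)
        updated[jid] = descriptor
        if client.cas(key, updated):
            return True
    return False
```

The point of the loop: if another instance wrote between our gets() and cas(), the cas() fails, and we simply re-read and retry - so no registration is silently lost.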
The application discovers the agent, and, upon request, the agent describes its service to the application. How can we use it? Well the answer is really "however you like", but I have created a toy web UI for invoking the remote service methods.</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-zkjIreamSFI/Tl-XUqoJCcI/AAAAAAAAA5E/NABDROL0wmM/s1600/gaea-ui.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="184" src="http://4.bp.blogspot.com/-zkjIreamSFI/Tl-XUqoJCcI/AAAAAAAAA5E/NABDROL0wmM/s320/gaea-ui.png" width="320" /></a></div><div><br /></div><div>Wot 'ave we 'ere then? The drop-down selection has all of the available agent JIDs (XMPP IDs). The textbox has some Python code, which will be executed by the Google App Engine application. Yes, security alert! This is just a demonstration of how we can use the RPC mechanism - not a best practice. When you hit "Go!", the code will be run by the application. But before doing so, the application will set a local variable "agent", which is an instance of the ProtoRPC service stub bound to the agent selected in the drop-down.</div><div><br /></div><div>ProtoRPC is intended to be synchronous (from the looks of the comments in the code, anyway), but there is an asynchronous API for clients. But given that the application only has up to 30 seconds to service a request, our application can't actively wait for a response. What to do? Instead, we need to complete the request asynchronously when the client responds, and convey some context to the response handler so it knows what to do with it.</div><div><br /></div><div>In the demo, I've done something fairly straightforward with regard to response handling. When the UI is rendered, we create an asynchronous channel using the Channel API. We use this to send the response back to the user. 
So when the code is executed, the service stub is invoked, and the channel ID is passed as context to the client. When the client responds, it includes the context. Once again, security alert. We could fix security concerns by encrypting the context to ensure the client doesn't tamper with it. Let's just assume the client is friendly though, okay? Just this once!</div><div><br /></div><div>So we finally have an application flow that goes something like this:</div><div><ol><li>Agent registers service.</li><li>Server detects agent's availability, and requests client's service descriptor.</li><li>Client sends service descriptor, server receives and stores it in Memcache.</li></ol></div><div>and then...</div><div><ol><li>User hits web UI, which server renders with a new channel.</li><li>User selects an agent and clicks "Go!".</li><li>Server instantiates a service stub, and invokes it with the channel ID as context. The invocation sends an XMPP message to the agent.</li><li>Agent receives XMPP message, decodes and executes the request. The response is sent back to the server as an XMPP message, including the context set by the server.</li><li>The server receives the response, and extracts the response and channel ID (context). The response is formatted and sent to the channel.</li><li>The web UI's channel Javascript callback is invoked and the response is rendered.</li></ol></div><div><br /></div><div style="text-align: center;"><b><i>Fin</i></b></div><div><br /></div><div>I've put my code up on GitHub, here: <a href="http://github.com/axw/gaea">http://github.com/axw/gaea</a>. Feel free to fork and/or have a play. I hope this can be of use to someone. 
If nothing else, I've learnt a few new tricks!</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-47922879418741447792011-08-04T15:56:00.001+08:002011-08-04T16:00:19.218+08:00It's Alive!<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/-jl8Gc0mJxhs/Tjo8xWNN1GI/AAAAAAAAA3Q/b5uOaYDRV98/s1600/Its_Alive.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://3.bp.blogspot.com/-jl8Gc0mJxhs/Tjo8xWNN1GI/AAAAAAAAA3Q/b5uOaYDRV98/s1600/Its_Alive.jpg" /></a></div><br />I've been quietly hacking away on <a href="http://github.com/axw/cmonster"><b>cmonster</b> </a>(née csnake). I like this name even more: I think it describes my creation better. If you thought preprocessors were horrible before, well...<br /><br /><br /><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"><b>What is cmonster?</b></span><br /><br />cmonster is a C preprocessor with a few novel features on top of the standard fare:<br /><br /><ul><li>Allows users to define function macros in Python, inline.</li><li>Allows users to define a callback for locating #include files, when the file can not be found in the specified include directories.</li><li>Provides a Python API for iterating over tokens in the output stream.</li></ul><div>cmonster is built on top of <a href="http://clang.llvm.org/">Clang</a>, a modern C language family compiler, which contains a reusable, programmable preprocessor. At present, cmonster requires Clang 3.0 APIs, which has not yet been released. I have been working off Clang's subversion trunk.</div><div><br /></div><div>I have just uploaded a binary distribution of the <a href="http://pypi.python.org/pypi/cmonster/0.1">first alpha version (0.1) of cmonster</a> to pypi. 
I have only built/tested it on Linux 32-bit, Python 3.2, and I don't expect it will work on anything else yet. If you want to play around with it, you can install cmonster using "easy_install cmonster" or by grabbing it off pypi and installing it manually.<br /><br /><br /><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"><b>Demonstration</b></span><br /><br />Seeing is believing - how does this thing work? We'll ignore everything except for inline Python macros for now, because that's the most stable part of the API.<br /><br /><br /><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">It is possible to define macros inline in cmonster, using the builtin "py_def" macro. For example:</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br /></div><pre class="brush:py" style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">py_def(factorial(n))<br />    import math<br />    return str(math.factorial(int(str(n))))<br />py_end<br /></pre><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br /></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">When cmonster sees this, it will grab everything up to "py_end", and define a Python function. It will also create a preprocessor macro with the function's name (as given in the py_def heading), and this macro will be directed to call the Python function. The Python function will be passed the argument tokens that the macro was invoked with. 
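The kernel of the idea can be shown in plain Python: a macro is just a function from argument text to replacement text. This toy expander is an illustration only - it shares no code with cmonster, and its regex handles just simple, non-nested calls:

```python
import math
import re

# name -> function from argument text to replacement text,
# mimicking the py_def(factorial(n)) example above.
MACROS = {
    "factorial": lambda n: str(math.factorial(int(n))),
}

def expand(source):
    """Replace each name(arg) call whose name is a known macro."""
    def repl(match):
        name, arg = match.group(1), match.group(2)
        if name in MACROS:
            return MACROS[name](arg)
        return match.group(0)  # not a macro: leave untouched
    return re.sub(r"\b(\w+)\(([^()]*)\)", repl, source)

print(expand("int x = factorial(6);"))
```

So factorial(6) in the source comes out as 720 before a C compiler would ever see it. cmonster's real macro functions work on token streams rather than plain strings, as described next.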
It can return either a sequence of tokens, or a string that will subsequently be tokenised.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br /></div></div><div><br /></div><div><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"><b>Addressing Shortcomings</b></span></div><div><br /></div><div>In my <a href="http://blog.awilkins.id.au/2011/06/c-preprocessor-macros-in-python.html">previous post about csnake</a> I mentioned a few things that needed to be done. I have addressed some of these things in cmonster:<br /><br /></div><blockquote><span class="Apple-style-span" style="font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><i>A way of detecting (or at least configuring) pre-defined macros and include paths for a target compiler/preprocessor. A standalone C preprocessor isn't worth much. It needs to act like or delegate to a real preprocessor, such as GCC.</i></span></blockquote> I have added support for emulating GCC. This is done by consulting GCC for its predefined macros (using "gcc -dM"), and using the new "include locator" callback feature. By setting an include locator callback on a cmonster preprocessor, you provide cmonster with a second-chance attempt at locating an include file when the file can not be found in the specified include directories. This method can be used to determine GCC's system include directories: whenever we can't find an include file, we consult GCC and parse the output to determine the location of the file on disk. I intend to add support for more (common) compilers/preprocessors in the future. Namely, MSVC.<br /><br />I had another crazy idea for (ab)using include locators: handling specially formatted #include directives to feed off-disk files into the preprocessor. Buh? I mean, say, the ability to pull down headers from a hosted repository such as github (e.g. 
#include <github.yourproject/header.h>), and feeding them into the preprocessor as in-memory files. Or generating headers on the fly (e.g. #include <myapi.proto>, to automatically generate and include <a href="http://code.google.com/p/protobuf/">Protobuf</a> headers).<br /><br /><blockquote><span class="Apple-style-span" style="font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><i>A #pragma to define Python macros in source, or perhaps if I'm feeling adventurous, something like #pydefine.</i></span></blockquote>Support for inline Python macros has been implemented: see the "Demonstration" section above. It's unlikely I'll attempt to create a #pydefine, as it would be more work than it's worth.<br /><br /><br /><blockquote><span class="Apple-style-span" style="font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"><i>A simple command line interface with the look and feel of a standard C preprocessor</i></span></blockquote>The distribution now contains a "cmonster" script, which invokes the preprocessor on a file specified on the command-line. This will need a lot of work: presently you can't add user include directories or (un)define macros. Neither of these things is difficult to add, they just haven't been top priorities.<br /><br /><br /><b><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;">Future Work</span></b><br /><div><br /></div>Still remaining to do (and sticking out like sore thumbs) are unit-tests and documentation. Now that I've got something vaguely usable, I will be working on those next.<br /><div><br /></div><div>Once I've tested, documented and stabilised the API, I'll look at (in no definite order):</div><div><ul><li>Improving the command line interface. Add the standard "-I", "-D" and "-U" parameters.</li><li>Portability. 
Probably Windows first, since it's common and I have access to it.</li><li>Emulation of additional preprocessors.</li><li>Passing in-memory files ("file like" Python objects) to #include.</li></ul></div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-86696725020468105362011-07-17T12:57:00.000+08:002011-07-17T12:57:48.172+08:00Controlling Remote Agents from Google App EngineA month or so ago I was brainstorming ideas related to <a href="http://code.google.com/appengine/">Google App Engine</a> (GAE), as I had been wanting a reason to play with it for a while. One idea that stuck was connecting a remote Python process to GAE via <a href="http://awilkins.id.au/pushy">Pushy</a>, so we could either control GAE or GAE could control the remote process. I'm still working on the C/Python Preprocessor thingy, but I took a break from that this weekend to look into the possibility of a GAE Pushy transport.<br /><br />So yesterday morning I signed up for an account, and started tinkering. I had been reading the docs already, and I figured there were a few possible approaches:<br /><br /><ul><li>The obvious: use HTTP. This has one major drawback in that it is inherently synchronous and wholly driven by the client. Moreover, GAE only allows request handlers around 30s to complete, so no kind of fancy long-polling is possible here.</li><li>Channel API. This sounds like the right kind of thing to use, but it's aimed at interacting with Javascript in a webpage.</li><li>XMPP. Huh? Isn't that for instant messaging? Exactly. The client (remote Python process) and server (GAE) are peers in XMPP, and either one can initiate sending messages to the other. Let's look into this a bit more...</li></ul><div>So I did a quick search for Python XMPP libraries, and a few came up. I settled on <a href="http://xmpppy.sourceforge.net/">xmpppy</a>, but to be honest I didn't find any of them particularly compelling. 
The APIs are a bit clunky. Anyway, the approach I took was to have an XMPP handler in my GAE application create a persistent Pushy connection object associated with a pair of read/write files that wrapped the XMPP API. When an XMPP message came in, the application would extract the Pushy request from the base-64 encoded body of the message, and return the result in a similar manner.</div><div><br /></div><div>And it worked, but only just. I had to make a few hacks to Pushy to get all of this to work. There were some oddities I had to work around, such as the "eval" built-in in GAE's Python not taking any keyword arguments. Unfortunately, I don't think this particular transport is very useful in the flaky state it's in at the moment. Also, it's not terribly valuable to build a transport to control a GAE application, since other APIs exist for that purpose (<a href="http://code.google.com/appengine/docs/python/tools/protorpc/">ProtoRPC</a>, <a href="http://code.google.com/appengine/articles/remote_api.html">remote_api</a>). More useful would be to have the GAE application control the remote process without the need for polling. I'll be looking into this further.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0tag:blogger.com,1999:blog-8213002289593400460.post-19755428478288226522011-06-21T22:32:00.000+08:002011-06-21T22:32:06.556+08:00Le sighI've been coming across more problems with Boost Wave. The current ones blocking me are:<br /><ul><li>A lack of an efficient way to conditionally disable a macro. The "context policy" provides hooks for handling macro expansions, and its return value is meant to control whether the expansion takes place. It doesn't work. I'll write up a bug report when I get some time.</li><li>Wave isn't very forgiving about integer under/overflow. 
For example, the GNU C library's header "/usr/include/bits/wchar.h" has the following tidbit to determine the sign of wide characters, which Boost Wave barfs on:</li></ul><pre class="brush:c">#elif L'\0' - 1 > 0</pre><br /><div>I think the latter "problem" might actually be reasonable - I believe the standards say that handling of overflow is undefined, and preprocessor/compiler-specific. That doesn't help me much though. I could fix this by writing code to parse the expressions, which seems silly, or by passing off to the target preprocessor (e.g. GCC), which seems like overkill.<br /><br />I'm going to have a look at how hard it would be to use LLVM/Clang's preprocessor instead. If that's a better bet, I may go that way. Otherwise, it might be time to get approval to send patches to the Wave project.</div>Andrew Wilkinshttps://plus.google.com/102738380796586573408noreply@blogger.com0