Transducers.js Round 2 with Benchmarks (2014-10-12)
A few weeks ago I released my transducers library and explained the algorithm behind it. It's a wonderfully simple technique for high-performance transformations like map and filter, and was created for Clojure (mostly by Rich Hickey, I think).

Over the past week I've been hard at work polishing and benchmarking it. Today I published version 0.2.0 with a new API and completely refactored internals that make it easy to use and get performance that beats other popular utility libraries. (This is a different library from the recently released one by Cognitect.)

A Few Benchmarks

Benchmarking is hard, but I think it's worthwhile to post a few benchmarks that back up these claims. All of these were run on the latest version of node (0.10.32). First I wanted to show how transducers devastate many other libraries for large arrays (update: lodash + laziness comes the closest, see more in the next section). The test performs two maps and two filters. Here is the transducer code:

into([],
     compose(
       map(function(x) { return x + 10; }),
       map(function(x) { return x * 2; }),
       filter(function(x) { return x % 5 === 0; }),
       filter(function(x) { return x % 2 === 0; })
     ),
     arr);

The same transformations were implemented in lodash and underscore, and benchmarked with an arr of various sizes. The graph below shows the time it took to run versus the size of arr, which starts at 500 and goes up to around 300,000. Here's the full benchmark (it outputs Hz so the y-axis is 1/Hz).
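For reference, the underscore version looks roughly like this (a sketch, not the exact benchmark code; the lodash version is essentially the same chain). Each map and filter call allocates a new intermediate array:

_.chain(arr)
  .map(function(x) { return x + 10; })
  .map(function(x) { return x * 2; })
  .filter(function(x) { return x % 5 === 0; })
  .filter(function(x) { return x % 2 === 0; })
  .value();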

Once the array reaches a size of around 90,000, transducers completely blow the competition away. This should be obvious: we never need to allocate anything between transformations, while underscore and lodash always have to allocate an intermediate array.

Laziness would not help here, since we are eagerly evaluating the whole array.

Update: More Detailed Benchmark

This section was added after requests for a more thorough benchmark, particularly one including lodash's new lazy behavior.

The master branch of lodash supports laziness, which should provide performance gains. Let's include that in the benchmark to see how well it helps. Laziness is a technique where a chain doesn't evaluate the transformations until a final value method is called, and it attempts to reduce intermediate allocations. Here's the full benchmark that generated the following graph.

We also added comparisons with native map and filter, and a baseline that manually performs the same operations in a for loop (thanks @stefanpenner for that).

First, as expected, the baseline performs the best. But the cost of transducers isn't too bad, and you get a far better and easier-to-use abstraction than hand-coding for loops. Unfortunately, native is slowest for various reasons.

The really interesting thing is that lodash's laziness does help it out a lot. For some reason there's still a jump, but it's at a much higher point, around 280,000 items. In general transducers take about 2/3rds of the time, though, and the performance is more consistent. Note that there's actually a perf hit for lodash's laziness with smaller arrays, under 90,000.

This benchmark was run with node 0.10.32, and it most likely looks different on other engines. Transducers don't beat a lazy lodash by as much in Firefox (for some array sizes not at all), but I think that's more due to poor optimization in Firefox. The algorithm is inherently open to great optimizations since the process is only a few function calls per item, so I think it will only get better on each engine. My guess is that Firefox needs to do a better job inlining functions, but I still need to look into it.

Small Arrays

While it's not as dramatic, even with arrays as small as 1000 you will see performance wins. Here are the same benchmarks, run with sizes of 1000 and 10,000:

_.map/filter (1000) x 22,302 ops/sec ±0.90% (100 runs sampled)
u.map/filter (1000) x 21,290 ops/sec ±0.65% (96 runs sampled)
t.map/filter+transduce (1000) x 26,638 ops/sec ±0.77% (98 runs sampled)

_.map/filter (10000) x 2,277 ops/sec ±0.49% (101 runs sampled)
u.map/filter (10000) x 2,155 ops/sec ±0.77% (99 runs sampled)
t.map/filter+transduce (10000) x 2,832 ops/sec ±0.44% (99 runs sampled)

Take

If you use the take operation to only take, say, 10 items, transducers will only send 10 items through the transformation pipeline. Obviously if I ran benchmarks we would also blow away lodash and underscore here, because they do not lazily optimize for take (they transform the whole array first and then run take). You can do this in some other libraries like lodash by explicitly marking a chain as lazy and then requesting the value at the end. We get this for free, though, and still beat it in this scenario because we don't have any laziness machinery.

I ran a benchmark for this at one point but no longer have it; the point is that we don't need to be explicitly lazy to optimize for take.
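To make that concrete, here's a sketch of the take scenario using the same transducers.js functions shown above; only enough items to produce 10 results ever flow through the pipeline:

into([],
     compose(
       map(function(x) { return x * 2; }),
       take(10)
     ),
     arr);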

immutable-js

The immutable-js library is a fantastic collection of immutable data structures. They implement lazy transformations, so you get a lot of perf wins from that. Even so, there is a cost to the laziness machinery. I implemented the same map->map->filter->filter transformation above in another benchmark which compares it with their transformations. Here is the output with arr sizes of 1000 and 100,000:

Immutable map/filter (1000) x 6,414 ops/sec ±0.95% (99 runs sampled)
transducer map/filter (1000) x 7,119 ops/sec ±1.58% (96 runs sampled)

Immutable map/filter (100000) x 67.77 ops/sec ±0.95% (72 runs sampled)
transducer map/filter (100000) x 79.23 ops/sec ±0.47% (69 runs sampled)

This kind of perf win isn't a huge deal, and their transformations perform well. But we can apply this to any data structure. Did you notice how easy it was to use our library with immutable-js? View the full benchmark here.

Transducers.js Refactored

I just pushed v0.2.0 to npm with all the new APIs and performance improvements. Read more in the new docs.

You may have noticed that Cognitect, where Rich Hickey and other core maintainers of Clojure(Script) work, released their own JavaScript transducers library on Friday. I was a little bummed because I had just spent a lot of time refactoring mine, but I think I offer a few improvements. Internally, we basically converged on the exact same technique for implementing transducers, so you should find the same performance characteristics above with their library.

All of the following features are things you can find in my library transducers.js.

My library now offers several integration points for using transducers:

  • seq takes a collection and a transformer and returns a collection of the same type. If you pass it an array, you will get back an array. An iterator will give you back an iterator. For example:
// Filter an array
seq([1, 2, 3], filter(x => x > 1));
// -> [ 2, 3 ]

// Map an object
seq({ foo: 1, bar: 2 }, map(kv => [kv[0], kv[1] + 1]));
// -> { foo: 2, bar: 3 }

// Lazily transform an iterable
function* nums() {
  var i = 1;
  while(true) {
    yield i++;
  }
}

var iter = seq(nums(), compose(map(x => x * 2),
                               filter(x => x > 4)));
iter.next().value; // -> 6
iter.next().value; // -> 8
iter.next().value; // -> 10
  • toArray, toObject, and toIter will take any iterable type and force it into the type that you requested. Each of these can optionally take a transform as the second argument.
// Make an array from an object
toArray({ foo: 1, bar: 2 });
// -> [ [ 'foo', 1 ], [ 'bar', 2 ] ]

// Make an array from an iterable
toArray(nums(), take(3));
// -> [ 1, 2, 3 ]

That's a very quick overview, and you can read more about these in the docs.

Collections as Arguments

All the transformations in transducers.js optionally take a collection as the first argument, so the familiar pattern of map(coll, function(x) { return x + 1; }) still works fine. This is an extremely common use case, so it will be very helpful if you are transitioning from another library. You can also pass a context as the third argument to specify what this should be bound to.
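A quick sketch of that collection-first form, including the context argument (obj here is just a made-up example object):

// Collection-first usage
map([1, 2, 3], function(x) { return x + 1; });
// -> [ 2, 3, 4 ]

// Binding `this` with a context argument
var obj = { offset: 10 };
map([1, 2, 3], function(x) { return x + this.offset; }, obj);
// -> [ 11, 12, 13 ]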

Read more about the various ways to use transformations.

Laziness

Transducers remove the requirement of being lazy to optimize for things like take(10). However, it can still be useful to "bind" a collection to a set of transformations and pass it around, without actually evaluating the transformations. It's also useful if you want to apply transformations to a custom data type, get an iterator back, and rebuild another custom data type from it (there is still no intermediate array).

Whenever you apply transformations to an iterator, it does so lazily. It's easy to convert array transformations into a lazy operation: just use the utility function iterator to grab an iterator of the array instead:

seq(iterator([1, 2, 3]),
    compose(
      map(x => x + 1),
      filter(x => x % 2 === 0)))
// -> <Iterator>

Our transformations are completely blind to whether or not they are being applied lazily.

The transformer Protocol

Lastly, transducers.js supports a new protocol that I call the transformer protocol. If a custom data structure implements this, not only can we iterate over it in functions like seq, but we can also build up a new instance. That means seq won't return an iterator, but it will return an actual instance.

For example, here's how you would implement it in Immutable.Vector:

var t = require('./transducers');
Immutable.Vector.prototype[t.protocols.transformer] = {
  init: function() {
    return Immutable.Vector().asMutable();
  },
  result: function(vec) {
    return vec.asImmutable();
  },
  step: function(vec, x) {
    return vec.push(x);
  }
};

If you implement the transformer protocol, now your data structure will work with all of the builtin functions. You can just use seq like normal and you get back an immutable vector!

t.seq(Immutable.Vector(1, 2, 3, 4, 5),
      t.compose(
        t.map(function(x) { return x + 10; }),
        t.map(function(x) { return x * 2; }),
        t.filter(function(x) { return x % 5 === 0; }),
        t.filter(function(x) { return x % 2 === 0; })));
// -> Vector [ 30 ]

I hope you give transducers a try, they are really fun! And unlike Cognitect's project, mine is happy to receive pull requests. :)

Transducers.js: A JavaScript Library for Transformation of Data (2014-09-18)
If you didn't grab a few cups of coffee for my last post, you're going to want to for this one. While I was writing my last post about js-csp, a port of Clojure's core.async, Clojure announced transducers, which solve a key problem when working with transformations of data. The technique works particularly well with channels (exactly what js-csp uses), so I dug into it.

What I discovered is mind-blowing. So I also ported it to JavaScript, and today I'm announcing transducers.js, a library to build transformations of data and apply them to any data type you could imagine.

Whoa, what did I just say? Let's take a step back for a second. If you haven't heard of transducers before, you can read about their history in Clojure's announcement. Additionally, there's an awesome post that explores these ideas in JavaScript and walks you through them from start to finish. I give a similar (but brief) walkthrough at the end of this post.

The word transduce is just a combination of transform and reduce. The reduce function is the base transformation; any other transformation can be expressed in terms of it (map, filter, etc).

var arr = [1, 2, 3, 4];

arr.reduce(function(result, x) {
    result.push(x + 1);
    return result;
}, []);
// -> [ 2, 3, 4, 5 ]

The function passed to reduce is a reducing function. It takes a result and an input and returns a new result. Transducers abstract this out so that you can compose transformations completely independent of the data structure. Here's the same call but with transduce:

function append(result, x) {
    result.push(x);
    return result;
}

transduce(map(x => x + 1), append, [], arr);
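// -> [ 2, 3, 4, 5 ]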

We created append to make it easier to work with arrays, and are using ES6 arrow functions (you really should too, they are easy to cross-compile). The main difference is that the push call on the array is now moved out of the transformation. In JavaScript we always couple transformation with specific data structures, and we've got to stop doing that. We can reuse transformations across all data structures, even streams.

There are three main concerns that reduce needs to handle. The first is iterating over the source data structure. The second is transforming each value. The third is building up a new result.

These are completely separate concerns, and yet most transformations in JavaScript are tightly coupled with specific data structures. Transducers decouple them, so you can apply all the available transformations to any data structure.

Transformations

We have a small set of transformations that will solve most of your needs, like map, filter, dedupe, and more. Here's an example of composing transformations:

sequence(
  compose(
    cat,
    map(x => x + 1),
    dedupe(),
    drop(3)
  ),
  [[1, 2], [3, 4], [4, 5]]
)
// -> [ 5, 6 ]

The compose function combines transformations, and sequence just creates a new collection of the same type and runs the transformations. Note that nothing within the transformations assumes anything about the data structure it comes from or is going to.

Most of the transformations that transducers.js provides can also simply take a collection, and it will immediately run the transformation over the collection and return a new collection of the same type. This lets you do simple transformations the familiar way:

map(x => x + 1, [1, 2, 3, 4]);
filter(x => x % 2 === 0, [1, 2, 3, 4])

These functions are highly optimized for the builtin types like arrays, so the above map literally just runs a while loop and applies your function over each value.

Iterating and Building

These transformations aren't useful unless you can actually apply them. We figured out the transform concern, but what about iterate and build?

First, let's take a look at the available functions for applying transducers:

  • sequence(xform, coll) - get a collection of the same type and fill it with the results of applying xform over each item in coll
  • transduce(xform, f, init, coll) - reduce a collection starting with the initial value init, applying xform to each value and running the reducing function f
  • into(to, xform, from) - apply xform to each value in the collection from and append it to the collection to

Each of these has different levels of assumptions. transduce is the lowest-level in that it iterates over coll but lets you build up the result. into assumes the result is a collection and automatically appends to it. Finally, sequence assumes you want a collection of the same type so it creates it and fills it with the results of the transformation.
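Here's a quick sketch of the same transformation run through each entry point, using the append helper defined earlier:

var xform = compose(map(x => x + 1),
                    filter(x => x % 2 === 0));

transduce(xform, append, [], [1, 2, 3, 4]);  // -> [ 2, 4 ] (you supply the reducing function)
into([], xform, [1, 2, 3, 4]);               // -> [ 2, 4 ] (appends onto the given collection)
sequence(xform, [1, 2, 3, 4]);               // -> [ 2, 4 ] (same type of collection back)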

Ideally our library wouldn't care about the details of iteration or building either, otherwise it kind of kills the point of generic transformations. Luckily ES6 has an iteration protocol, so we can use that for iteration.

But what about building? Unfortunately there is no protocol for that, so we need to create our own. transducers.js looks for @@append and @@empty methods on a collection for adding to it and creating new collections. (Of course, it works out of the box for native arrays and objects).

Let's drive this point home with an example. Say you wanted to use the immutable-js library. It already supports iteration, so you can automatically do this:

into([],
     compose(
       map(x => x * 2),
       filter(x => x > 5)
     ),
     Immutable.Vector(1, 2, 3, 4));
// -> [ 6, 8 ]

We really want to use immutable vectors all the way through, so let's augment the vector type to support "building":

Immutable.Vector.prototype['@@append'] = function(x) {
  return this.push(x);
};

Immutable.Vector.prototype['@@empty'] = function(x) {
  return Immutable.Vector();
};

Now we can just use sequence, and we get an immutable vector back:

sequence(compose(
           map(x => x * 2),
           filter(x => x > 5)
         ),
         Immutable.Vector(1, 2, 3, 4));
// -> Immutable.Vector(6, 8)

This is experimental, so I would wait a little while before using this in production, but so far this gives a surprising amount of power for a 500-line JavaScript library.

Implications

Works with Everything (including Streams and Channels)!

Let's play around with all the kinds of data structures we can use now. A type must at least be iterable to use with into or transduce, but if it is also buildable then it can also be used with sequence or the target collection of into.

var xform = compose(map(x => x * 2),
                    filter(x => x > 5));


// arrays (iterable & buildable)

sequence(xform, [1, 2, 3, 4]);
// -> [ 6, 8 ]

// objects (iterable & buildable)

into([],
     compose(map(kv => kv[1]), xform),
     { x: 1, y: 2, z: 3, w: 4 })
// -> [ 6, 8 ]

sequence(map(kv => [kv[0], kv[1] + 1]),
         { x: 1, y: 2, z: 3, w: 4 })
// -> { x: 2, y: 3, z: 4, w: 5 }

// generators (iterable)

function *data() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
}

into([], xform, data())
// -> [ 6, 8 ]

// Sets and Maps (iterable)

into([], xform, new Set([1, 2, 3, 3]))
// -> [ 6 ]

into({}, map(kv => [kv[0], kv[1] * 2]), new Map([['x', 1], ['y', 2]]))
// -> { x: 2, y: 4 }

// or make it buildable

// Maps don't have an `add` method, so wrap `set` to append [key, value] pairs
Map.prototype['@@append'] = function(kv) { return this.set(kv[0], kv[1]); };
Map.prototype['@@empty'] = function() { return new Map(); };
Set.prototype['@@append'] = Set.prototype.add;
Set.prototype['@@empty'] = function() { return new Set(); };

sequence(xform, new Set([1, 2, 3, 2]))
sequence(xform, new Map([['x', 1], ['y', 2]]));

// node lists (iterable)

into([], map(x => x.className), document.querySelectorAll('div'));

// custom types (iterable & buildable)

into([], xform, Immutable.Vector(1, 2, 3, 4));
into(MyCustomType(), xform, Immutable.Vector(1, 2, 3, 4));

// if implemented append and empty:
sequence(xform, Immutable.Vector(1, 2, 3, 4));

// channels

var ch = chan(1, xform);

go(function*() {
  yield put(ch, 1);
  yield put(ch, 2);
  yield put(ch, 3);
  yield put(ch, 4);
});

go(function*() {
  while(!ch.closed) {
    console.log(yield take(ch));
  }
});
// output: 6 8

Now that we've decoupled the data that comes in, how it's transformed, and what comes out, we have an insane amount of power. And with a pretty simple API as well.

Did you notice that last example with channels? That's right, a js-csp channel which I introduced in my last post can now take a transducer to apply over each item that passes through the channel. This easily lets us do Rx-style (reactive) code by simply reusing all the same transformations.

A channel is basically just a stream. You can reuse all of your familiar transformations on streams. That's huge!

This is possible because transducers work differently: instead of applying each transformation to a collection one at a time (and creating multiple intermediate collections), they take each value separately and fire it through the whole transformation pipeline. That leads us to the next point, in which there are...

No Intermediate Allocations!

Not only do we have a super generic way of transforming data, we get good performance on large arrays. This is because transducers create no intermediate collections. If you want to apply several transformations, usually each one is performed in order, creating a new collection each time.

Transducers, however, take one item off the collection at a time and fire it through the whole transformation pipeline. So it doesn't need any intermediate collections; each value runs through the pipeline separately.

Think of it as favoring a computational burden over a memory burden. Since each value runs through the pipeline, there are several function calls per item but no allocations, instead of 1 function call per item but an allocation per transformation. For small arrays there is a small difference, but for large arrays the computation burden easily wins out over the memory burden.

To be frank, early benchmarks show that this doesn't win anything in V8 until you reach a size of around 100,000 items (after that this really wins out). So it only matters for very large arrays. It's too early to post benchmarks. (update: there are actually good perf gains even with small arrays, see here. Previously the library was doing it wrong.)

How a Transducer is Born

If you are interested in walking through how transducers generalize reduce into what you see above, read the following. Feel free to skip this part though, or read this post which also does a great job of that.

The reduce function is the base transformation; any other transformation can be expressed in terms of it (map, filter, etc), so let's start with that. Here's an example call to reduce, which is available on native JS arrays:

var arr = [1, 2, 3, 4];

arr.reduce(function(result, x) { return result + x; }, 0);
// -> 10

This sums up all numbers in arr. Pretty simple, right? Hm, let's try and implement map in terms of reduce:

function map(f, coll) {
  return coll.reduce(function(result, x) {
    result.push(f(x));
    return result;
  }, []);
}

map(function(x) { return x + 1; }, arr);
// -> [2, 3, 4, 5]

That works. But our map only works with native JS arrays. It assumes a lot of knowledge about how to reduce, how to append an item, and what kind of collection to create. Shouldn't our map only be concerned with mapping? We've got to stop coupling transformations with data; every single collection is forced to completely re-implement map, filter, take, and all the collection operations, with varying incompatible properties!

But how is that possible? Well, let's start with something simple: the mapping function that we meant to create. It's only concerned with mapping. The key is that reduce will always be at the bottom of our transformation, but there's nothing stopping us from abstracting the function we pass to reduce:

function mapper(f) {
  return function(result, x) {
    result.push(f(x));
    return result;
  }
}

That looks better. We would use this by doing arr.reduce(mapper(function(x) { return x + 1; }), []). Note that now mapper has no idea how the reduction is actually done, or how the initial value is created. Unfortunately, it still has result.push embedded so it still only works with arrays. Let's abstract that out:

function mapper(f) {
  return function(combine) {
    return function(result, x) {
      return combine(result, f(x));
    }
  }
}

That looks crazy, but now we have a mapper function that is literally only concerned about mapping. It calls f with x before passing it to combine. The above function may look daunting, but it's simple to use:

function append(arr, x) {
    arr.push(x);
    return arr;
}

arr.reduce(mapper(function(x) { return x + 1; })(append),
           []);
// -> [ 2, 3, 4, 5 ]

We create append to make it easy to functionally append to arrays. So that's about it, now we can just make this a little easi-- hold on, doesn't combine look a little like a reducer function?

Since the result of applying append to the result of mapper is itself a reducer function, can't we pass that to another mapper?

arr.reduce(
  mapper(function(x) { return x * 2; })(
    mapper(function(x) { return x + 1; })(append)
  ),
  []
);
// -> [ 3, 5, 7, 9 ]

Wow! So now we can compose these super generic transformation functions. For example, let's create a filterer. You wouldn't normally apply two maps right next to each other, but you would certainly map and filter!

function filterer(f) {
  return function(combine) {
    return function(result, x) {
      return f(x) ? combine(result, x) : result;
    }
  }
}

arr.reduce(
  filterer(function(x) { return x > 2; })(
    mapper(function(x) { return x * 2; })(append)
  ),
  []
);
// -> [ 6, 8 ]

Nobody wants to write code like that though. Let's make one more function compose which makes it easy to compose these, that's right, transducers. You just wrote transducers without even knowing it.

// All this does is transform
// `compose(x, y, z)(val)` into `x(y(z(val)))`
function compose() {
  var funcs = Array.prototype.slice.call(arguments);
  return function(r) {
    var value = r;
    for(var i=funcs.length-1; i>=0; i--) {
      value = funcs[i](value);
    }
    return value;
  }
}

arr.reduce(
  compose(
    filterer(function(x) { return x > 2; }),
    mapper(function(x) { return x * 2; })
  )(append),
  []
);
// -> [ 6, 8 ]

Now we can write really clean sequential-looking transformations! Hm, there's still that awkward syntax to pass in append. How about we make our own reduce function?

function transduce(xform, f, init, coll) {
  return coll.reduce(xform(f), init);
}

transduce(
  compose(
    filterer(function(x) { return x > 2; }),
    mapper(function(x) { return x * 2; })
  ),
  append,
  [],
  arr
);
// -> [ 6, 8 ]

Voila, you have transduce. Given a transformation, a function for appending data, an initial value, and a collection, it runs the whole process and returns the final result from whatever append produces. Each of those arguments is a distinct piece of information that shouldn't care at all about the others. You could easily apply the same transformation to any data structure you can imagine, as you will see below.

This transduce is not completely correct in that it should not care how the collection reduces itself.
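As a sketch of removing that assumption, here's a version built on a hand-rolled reduce over arrays (the real library generalizes this further via the ES6 iteration protocol):

function reduce(f, init, coll) {
  var result = init;
  for(var i = 0; i < coll.length; i++) {
    result = f(result, coll[i]);
  }
  return result;
}

function transduce(xform, f, init, coll) {
  return reduce(xform(f), init, coll);
}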

Final Notes

You might think that this is sort of lazy evaluation, but that's not true. If you want lazy sequences, you will still have to explicitly build a lazy sequence type that handles those semantics. This just makes transformations first-class values, but you still always have to eagerly apply them. Lazy sequences are something I think should be added to transducers.js in the future. (edit: well, this paragraph isn't exactly true, but we'll have to explain laziness more in the future)

Some of the examples may also feel similar to ES6 comprehensions, and while they are similar, comprehensions don't give you the ability to control what type is built up. You can only get a generator or an array back. They also aren't composable; you will still need to solve the problem of building up transformations that can be reused.

When you correctly separate concerns in a program, it breeds super simple APIs that allow you to build up all sorts of complex programs. This is a simple 500-line JavaScript library that, in my opinion, radically changes how I interact with data, and all with just a few methods.

transducers.js is still early work and it will be improved a lot. Let me know if you find any bugs (or if it blows your mind).

Taming the Asynchronous Beast with CSP Channels in JavaScript (2014-09-08)
This is an entry in a series about rebuilding my custom blog with react, CSP, and other modern tech. Read more in the blog rebuild series.

Every piece of software deals with complex control flow mechanisms like callbacks, promises, events, and streams. Some require simple asynchronous coordination, others processing of event or stream-based data, and many deal with both. Your solution to this has a deep impact on your code.

It's not surprising that a multitude of solutions exist. Callbacks are a dumb simple way for passing single values around asynchronously, and promises are a more refined solution to the same problem. Event emitters and streams allow asynchronous handling of multiple values. FRP is a different approach which tackles streams and events more elegantly, but isn't as good at asynchronous coordination. It can be overwhelming just to know where to start in all of this.

I think things can be simplified to a single abstraction since the underlying problem to all of this is the same. I present to you CSP and the concept of channels. CSP has been highly influential in Go and recently Clojure embraced it as well with core.async. There's even a C version. It's safe to say that it's becoming quite popular (and validated) and I think we need to try it out in JavaScript. I'm not going to spend time comparing it with every other solution (promises, FRP) because it would take too long and only incite remarks about how I wasn't using it right. I hope my examples do a good enough job convincing you themselves.

Typically channels are useful for coordinating truly concurrent tasks that might run at the same time on separate threads. They are actually just as useful in a single-threaded environment because they solve a more general problem of coordinating anything asynchronous, which is everything in JavaScript.

Two posts you should read in addition to this are David Nolen's exploration of core.async and the core.async announcement. You will find the rationale behind CSP and clear examples of how powerful channels are.

In this post, I will dive deeply into how we can use this in JavaScript, and illustrate many key points about it. CSP is enabled by js-csp which I will explain more soon. Here is a quick peek:

var ch = chan();

go(function*() {
  var val;
  while((val = yield take(ch)) !== csp.CLOSED) {
    console.log(val);
  }
});

go(function*() {
  yield put(ch, 1);
  yield take(timeout(1000));
  yield put(ch, 2);
  ch.close();
});

Note: these interactive examples assume a very modern browser and have only been heavily tested in Firefox and Chrome.

We get synchronous-style code with generators by default, and a sophisticated mechanism for coordinating tasks that is simple for basic async workflows but also scales to complex scenarios.

Let's Talk About Promises

Before we dig in, we should talk about promises. Promises are cool. I am forever grateful that they have mostly moved the JavaScript community off of the terrible callback epidemic. I really do like them a lot. Unlike some other advocates of CSP, I think they actually have a good error handling story, because JavaScript does a good job of tracking the location where an Error object was created (even so, find the "icing on the cake" later in this article about debugging errors from channels). The way promises simulate try/catch for asynchronous code is neat.

I do have one issue with how errors are handled in promises: because it captures any error from a handler, you need to mark the end of a promise chain (with something like done()) or else it will suppress errors. It's all too easy during development to make a simple typo and have the error gobbled up by promises because you forgot to attach an error handler.

I know that is a critical design decision for promises so that you get try/catch for async code, but I've been bitten too often by it. I have to wonder if it's really worth the ability to apply try/catch to async to ignore everything like TypeError and ReferenceError, or if there's a more controlled way to handle errors.

Error handling in CSP is definitely more manual, as you will see. But I also think it makes it clearer where errors are handled and makes it easier to rationalize about them. Additionally, by default syntax/null/etc errors are simply thrown and not gobbled up. This has drawbacks too, but I'm liking it so far.

I lied. I have a second complaint about promises: generators are an after-thought. In my opinion, anything that deals with asynchronous behavior and doesn't natively embrace generators is broken (though understandable considering you need to cross-compile them until they are fully implemented).

Lastly, when it comes down to it, using a channel is not that different from using a promise. Compare the following code that takes a value and returns a different one:

Promise

promiseReturningFunction().then(function(value) {
  return value * 2;
});

// Or with generators:

spawn(function*() {
  return (yield promiseReturningFunction()) * 2;
});

Channels

go(function*() {
  return (yield take(channelReturningFunction())) * 2;
});

The similarity is striking, especially when using generators with promises. This is a trivial example too, and when you start doing more async work the latter 2 approaches look far better than raw promises.

Channels are marginally better than promises with generators for single-value asynchronous coordination, but the best part is that you can do all sorts of more complex workflows that also remove the need for streams and event-based systems.

Using CSP in JavaScript

The fundamental idea of CSP is an old one: handle coordination between processes via message passing. The unique ideas of modern CSP are that processes can be simple light-weight cooperative threads, use channels to pass messages, and block execution when taking or putting from channels. This tends to make it very easy to express complex asynchronous flows.

Generators are coming to JavaScript and allow us to suspend and resume functions. This lets us program in a synchronous style, using everything from while loops to try/catch statements, but "halt" execution at any point. In my opinion, anything dealing with asynchronous behavior that doesn't completely embrace generators natively is busted.

CSP channels do exactly that. Using generators, the js-csp project has been able to faithfully port Clojure's core.async to JavaScript. We will use all the same terms and function names as core.async. I eventually forked the project to add a few things:

  • The go block which spawns a lightweight process always returns a channel that holds the final value from the process
  • sleep was a special operation that you could yield, but if you wanted an actual channel that timed out you had to use timeout instead. I removed sleep so you always use timeout which makes it more consistent.
  • I added a takem instruction which stands for "take maybe". If an Error object is passed through the channel, it will be thrown automatically at the place where takem was yielded.

This project is early in development so things may change, but it should be relatively stable. You will need to cross-compile generators to run it in all browsers; I recommend the ridiculously awesome regenerator project.

If you don't know much about generators, Kyle Simpson posted a great 4-part series about them. He even explores CSP in the last post but misses some critical points which have serious consequences like breaking composition and the ease of transforming values.
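As a minimal taste (independent of CSP), here's how a generator suspends at each yield and resumes when next() is called:

function* greet() {
  var name = yield 'what is your name?';
  return 'hello, ' + name;
}

var it = greet();
it.next();        // -> { value: 'what is your name?', done: false }
it.next('james'); // -> { value: 'hello, james', done: true }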

Basic Principles

Let's study the basic principles of CSP:

  • Processes are spawned with go, and channels are created with chan. Processes are completely unaware of each other but talk through channels.
  • Use take and put to operate on channels within a process. take gets a value and blocks if one isn't available. put puts a value on a channel and blocks if a process isn't available to take it.

Wow, that's it! Pretty simple, right? There are more advanced usages of CSP, but even just with those 4 methods we have a powerful way to express asynchronous coordination.

Here's an example. We create 3 processes that put values on a channel and sleep for various times, and a 4th process that takes values off the channel and logs them. If you run the code below, you will see that these processes run as if they are separate threads! Each process has its own while loop that loops forever, which is an amazingly powerful way to express asynchronous interaction. The 4th process closes the channel after 10 values come through, which stops the other processes because a put on a closed channel returns false.

var ch = chan();

go(function*() {
  while(yield put(ch, 1)) {
    yield take(timeout(250));
  }
});

go(function*() {
  while(yield put(ch, 2)) {
    yield take(timeout(300));
  }
});

go(function*() {
  while(yield put(ch, 3)) {
    yield take(timeout(1000));
  }
});

go(function*() {
  for(var i=0; i<10; i++) {
    console.log(yield take(ch));
  }
  ch.close();
});

Run the code to see a visualization that shows you what actually happened. If you hover over the arrows you will see details of how values moved across the program. The 3 processes all put a value on the channel at the start of the program, but then slept for different times. Note that the first 3 processes were almost always sleeping, and the 4th was almost always blocking. Since the 4th process was always available to take a value, the other processes never had to block.

timeout returns a channel that closes after a specific amount of time. When a channel closes, all blocked takes on it are resumed with the value of csp.CLOSED, and all blocked puts are resumed with false.
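In isolation that behavior looks like this (a quick sketch): a take on a timeout channel blocks until the channel closes, then resumes with csp.CLOSED.

go(function*() {
  var v = yield take(timeout(500));
  console.log(v === csp.CLOSED); // -> true, roughly 500ms later
});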

Each process also ended at different times because they woke up at different times. You don't always have to explicitly close channels; do it only when you want to send that specific signal to other parts of the program. Otherwise, a channel that you don't use anymore (and any processes blocked on it) will simply be garbage collected.

Here's another example. This program creates 2 processes that both take and put from/onto the same channel. Again, they contain their own event loops that run until the channel is closed. The second process kicks off the interaction by putting a value onto the channel, and you can see how they interact in the visualization below. The 3rd process just closes the channel after 5 seconds.

var ch = chan();

go(function*() {
  var v;
  while((v = yield take(ch)) !== csp.CLOSED) {
    console.log(v);
    yield take(timeout(300));
    yield put(ch, 2);
  }
});

go(function*() {
  var v;
  yield put(ch, 1);
  while((v = yield take(ch)) !== csp.CLOSED) {
    console.log(v);
    yield take(timeout(200));
    yield put(ch, 3);
  }
});

go(function*() {
  yield take(timeout(5000));
  ch.close();
});

You can see how values bounce back and forth between the processes. This kind of interaction would be extremely difficult with many other asynchronous solutions out there.

These while loops have to check if the channel is closed when taking a value off the channel. You can do this by checking to see if the value is the special csp.CLOSED value. In Clojure, they pass nil to indicate closed and can use it simply in a conditional (like if((v = take(ch))) {}). We don't have that luxury in JavaScript because several things evaluate to false, even 0.

One more example. It's really important to understand that both take and put will block until both sides are there to actually pass the value. In the above examples it's clear that a take would block a process, but here's one where put obviously blocks until a take is performed.

var ch = chan();

go(function*() {
  yield put(ch, 5);
  ch.close();
});

go(function*() {
  yield take(timeout(1000));
  console.log(yield take(ch));
});

The first process tried to put 5 on the channel, but nobody was there to take it, so it waited. This simple behavior turns out to be extremely powerful and adaptable to all sorts of complex asynchronous flows, from simple rendezvous to complex flows with timeouts.

Channels as Promises

We've got a lot more cool stuff to look at, but let's get this out of the way. How do processes map to promises, exactly? Honestly, this isn't really that interesting of a use case for channels, but it's necessary because we do this kind of thing all the time in JavaScript.

Treating a channel as a promise is as simple as spawning a process and putting a single value onto it. That means that every single async operation is its own process that will "fulfill" a value by putting it onto its channel. The key is that these are lightweight processes, and you are able to create hundreds upon thousands of them. I am still tuning the performance of js-csp, but creating many channels should be perfectly fine.

Here's an example that shows how many of the promise behaviors map to channels. httpRequest gives us a channel interface for doing AJAX, wrapping a callback just like a promise would. jsonRequest transforms the value from httpRequest into a JSON object, and errors are handled throughout all of this.

function httpRequest(url) {
  var ch = chan();
  var req = new XMLHttpRequest();

  req.onload = function() {
    if(req.status === 200) {
      csp.putAsync(ch, this.responseText);
    }
    else {
      csp.putAsync(ch, new Error(this.responseText));
    }
  };

  req.open('get', url, true);
  req.send();
  return ch;
}

function jsonRequest(url) {
  return go(function*() {
    var value = yield take(httpRequest(url));
    if(!(value instanceof Error)) {
      value = JSON.parse(value);
    }
    return value;
  });
}

go(function*() {
  var data = yield takem(jsonRequest('sample.json'));
  console.log(JSON.stringify(data));
});

You can see how this is very similar to code that uses promises with generators. The go function by default returns a channel that will have the value returned from the generator, so it's easy to create one-shot promise-like processes like jsonRequest. This also introduces putAsync (there's also takeAsync). These functions allow you to put values on channels outside of a go block, and can take callbacks which run when completed.

One of the most interesting aspects here is error handling. It's very different from promises, and more explicit. But in a good way, not like the awkward juggling of callbacks. Errors are simply sent through channels like everything else. Transformative functions like jsonRequest need to only operate on the value if it's not an error. In my code, I've noticed that really only a few channels send errors, and most of them (usually higher-level ones) don't need to worry because errors are handled at the lower-level. The benefit over promises is that when I know I don't need to worry about errors, I don't have to worry about ending the promise chain or anything. That overhead simply doesn't exist.

You probably noticed I said yield takem(jsonRequest('sample.json')) instead of using take. takem is another operation like take, except that when an Error comes off the channel, it is thrown. Try changing the url and checking your devtools console. Generators allow you to throw errors from wherever they are paused, so the process will be aborted if it doesn't handle the error. How does it handle the error? With native try/catch, of course! This is so cool because it's a very terse way to handle errors and lets us use the synchronous form we are used to. There's icing on the cake, too: in your debugger, you can set "pause on exceptions" and it should pause where the error was thrown, giving you additional context and letting you inspect the local variables in your process (while the stack of the Error will tell you where the error actually happened). This doesn't work from the above editors because of eval and web worker complications.

Another option for error handling is to create separate channels where errors are sent. This is appropriate in certain (more complicated) scenarios. It's up to you.

Taming User Interfaces

We've seen a few abstract programs using channels and also how we can do typical asynchronous coordination with them. Now let's look at something much more interesting: completely reinventing how we interact with user interfaces.

The Clojure community has blown this door wide open, and I'm going to steal one of David Nolen's examples from his post to start with. (you'll also want to check out his other post). Here we make a simple listen function which gives us a channel interface for listening to DOM events, and we start a process which handles a mouseover event and prints the coordinates.

function listen(el, type) {
  var ch = chan();
  el.addEventListener(type, function(e) {
    csp.putAsync(ch, e);
  });
  return ch;
}

go(function*() {
  var el = document.querySelector('#ui1');
  var ch = listen(el, 'mousemove');

  while(true) {
    var e = yield take(ch);
    el.innerHTML = ((e.layerX || e.clientX) + ', ' +
                    (e.layerY || e.clientY));
  }
});

Go ahead, move the mouse over the area above and you'll see it respond. We have essentially created a local event loop for our own purposes. You'll see with more complex examples that this is an extraordinary way to deal with user interfaces, bringing simplicity to complex workflows.

Let's also track where the user clicks the element. Here's where channels begin to shine, if they didn't already. Our local event loop handles both the mousemove and click events, and everything is nicely scoped into a single function. There are no callbacks or event handlers anywhere. If you've ever tried to keep track of state across event handlers, this should look like heaven.

function listen(el, type) {
  var ch = chan();
  el.addEventListener(type, function(e) {
    csp.putAsync(ch, e);
  });
  return ch;
}

go(function*() {
  var el = document.querySelector('#ui2');
  var mousech = listen(el, 'mousemove');
  var clickch = listen(el, 'click');
  var mousePos = [0, 0];
  var clickPos = [0, 0];

  while(true) {
    var v = yield alts([mousech, clickch]);
    var e = v.value;

    if(v.channel === mousech) {
      mousePos = [e.layerX || e.clientX, e.layerY || e.clientY];
    }
    else {
      clickPos = [e.layerX || e.clientX, e.layerY || e.clientY];
    }

    el.innerHTML = (mousePos[0] + ', ' + mousePos[1] + ' — ' +
                    clickPos[0] + ', ' + clickPos[1]);
  }
});

Mouse over the above area, and click on it. This is possible because of a new operation alts, which takes multiple channels and blocks until one of them sends a value. The return value is an object of the form { value, channel }, where value is the value returned and channel is the channel that completed the operation. We can compare which channel sent the value and conditionally respond to the specific event.

alts actually isn't constrained to performing a take on each channel. It actually blocks until any operation is completed on each channel, and by default it performs take. But you can tell it to perform put by specifying an array with a channel and a value instead of just a channel; for example, alts([ch1, ch2, [ch3, 5]]) performs a put on ch3 with the value 5 and a take on ch1 and ch2.
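Here's a sketch of that mixed form (ch1, ch2, and ch3 are the hypothetical channels from the example above):

go(function*() {
  var r = yield alts([ch1, ch2, [ch3, 5]]);

  if(r.channel === ch3) {
    console.log('put 5 onto ch3');
  }
  else {
    console.log('took ' + r.value);
  }
});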

Expressing UI interactions with alts maps extremely well to how we intuitively think about them. It allows us to wrap events together into a single event, and respond accordingly. No callbacks, no event handlers, no tracking state across functions. We think about UI interactions like this all the time, why not express your code the same way?

If you've ever developed UI controls, you know how complex they quickly get. You need to delay actions by a certain amount, but cancel that action altogether if something else happens, and coordinate all sorts of behaviors. Let's look at a slightly more complex example: a tooltip.

Our tooltip appears if you hover over an item for 500ms. The complete interaction of waiting that amount, but cancelling if you mouse out, and adding/removing the DOM nodes is implemented below. This is the complete code; it relies on nothing other than the CSP library.

function listen(el, type, ch) {
  ch = ch || chan();
  el.addEventListener(type, function(e) {
    csp.putAsync(ch, e);
  });
  return ch;
}

function listenQuery(parent, query, type) {
  var ch = chan();
  var els = Array.prototype.slice.call(parent.querySelectorAll(query));
  els.forEach(function(el) {
    listen(el, type, ch);
  });
  return ch;
}

function tooltip(el, content, cancel) {
  return go(function*() {
    var r = yield alts([cancel, timeout(500)]);

    if(r.channel !== cancel) {
      var tip = document.createElement('div');
      tip.innerHTML = content;
      tip.className = 'tip-up';
      tip.style.left = el.offsetLeft - 110 + 'px';
      tip.style.top = el.offsetTop + 75 + 'px';
      el.parentNode.appendChild(tip);

      yield take(cancel);
      el.parentNode.removeChild(tip);
    }
  });
}

function menu(hoverch, outch) {
  go(function*() {
    while(true) {
      var e = yield take(hoverch);
      tooltip(e.target, 'a tip for ' + e.target.innerHTML, outch);
    }
  });
}

var el = document.querySelector('#ui3');
el.innerHTML = '<span>one</span> <span>two</span> <span>three</span>';
menu(listenQuery(el, 'span', 'mouseover'),
     listenQuery(el, 'span', 'mouseout'));

Hover over the words above for a little bit and a tooltip should appear. Most of our code is either DOM management or a few utility functions for translating DOM events into channels. We made a new utility function listenQuery that attaches event listeners to a set of DOM elements and streams all those events through a single channel.

We already get a hint of how well you can abstract UI code with channels. There are essentially two components: the menu and the tooltip. The menu is a process with its local event loop that waits for something to come from hoverch and creates a tooltip for the target.

The tooltip is its own process that waits 500ms to appear, and if nothing came from the cancel channel it adds the DOM node, waits for a signal from cancel and removes itself. It's extraordinarily straightforward to code all kinds of interactions.

Note that I never said "wait for a hover event", but rather "wait for a signal from hoverch". We actually have no idea what is on the other end of hoverch actually sending the signals. In our code, it is a real mouseover event, but it could be anything else. We've achieved a fantastic separation of concerns. David Nolen talks more about this in his post.

These have been somewhat simple examples to keep the code short, but if you are intrigued by this you should also check out David's walkthrough where he creates a real autocompleter. All of these ideas come even more to life when things get more complex.

Buffering

There's another feature of channels which is necessary when doing certain kinds of work: buffering. Channels can be buffered, which frees up both sides to process things at their own pace and not worry about someone blocking the whole thing.

When a channel is buffered, a put will happen immediately if room is available in the buffer, and a take will return if there's something in the buffer and otherwise block until there's something available.

Take a look below. You can buffer a channel by passing an integer to the constructor, which is the buffer size. We create a channel with a buffer size of 13, a process that puts 15 values on the channel, and another process that takes 5 values off every 200ms. Run the code and you'll see how buffering makes a difference.

var start = Date.now();
var ch = chan(13);

go(function*() {
  for(var x=0; x<15; x++) {
    yield put(ch, x);
    console.log('put ' + x);
  }
});

go(function*() {
  while(!ch.closed) {
    yield take(timeout(200));
    for(var i=0; i<5; i++) {
      console.log(yield take(ch));
    }
  }
});

go(function*() {
  yield take(timeout(1000));
  ch.close();
});

The first 13 puts happen immediately, but then it's blocked because the buffer is full. When a take happens, it's able to put another value in the buffer, and so on. Try removing 13 from the chan constructor and seeing the difference.

There are 3 types of buffers: fixed, dropping, and sliding. When an operation is performed on a fixed buffer, if it is full it will always block like normal. However, dropping and sliding buffers will never block. If the buffer is full when a put is performed, a dropping buffer will simply drop the value and it's lost forever, and a sliding buffer will remove the oldest value to make room for the new value.

Try it out above. Change chan(13) to chan(csp.buffers.dropping(5)) and you'll see that all the puts happen immediately, but only the first 5 values are taken and logged. The last 10 puts just dropped their values. You may also see 5 nulls printed because the second process ran one last time when nothing was in the buffer.

Try it with chan(csp.buffers.sliding(5)) and you'll see that you get the last 5 values instead.

You can implement all sorts of performance strategies with this, like backpressure. If you were handling server requests, you could use a dropping buffer of a fixed size that starts dropping requests at a certain point. Or if you were doing some heavy processing on a frequent DOM event, you could use a sliding buffer to only process the latest values as fast as possible.
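Here's a sketch of that last idea, reusing the listen helper from the tooltip example above (the '#canvas' selector is made up):

var moves = chan(csp.buffers.sliding(1));
listen(document.querySelector('#canvas'), 'mousemove', moves);

go(function*() {
  while(true) {
    var e = yield take(moves); // always the most recent event
    // ...do some heavy processing with `e` here...
  }
});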

Transducers — Transformation of Values

Channels are a form of streams, and as with anything stream-like, you will want to frequently transform the data as it comes through. Our examples were simple enough to avoid this, but you will want to use map on channels just as frequently as you use map on arrays.

js-csp comes with a bunch of builtin transformations which provide a powerful set of tools for managing channels. However, you'll notice that a lot of them are duplications of ordinary transformers for arrays (map, filter, etc).

Within the past month Clojure has actually solved this with something called transducers. Even better, while I was writing this post, another post about CSP and transducers in JS came out. His channel implementation is extremely primitive, but he mostly focuses on transducers and it's a great walkthrough of how we can apply them to channels.

I ran out of time to fully research transducers and show off good examples here. Most likely I will be posting more about js-csp, so expect to see more about that soon.

The Beginning of Something New

From now on I will always be using js-csp in my projects. I sincerely believe that this is a better way to express asynchronous communication, with a wide impact on everything from server management to user interfaces. I hope that the JS community learns from it, and I will be posting more articles as I write more code with it.

I also ran out of time to explore using sweet.js macros to implement native syntax for this. Imagine if you could just use var v = <-ch to take from a channel, or something like it? I'm definitely going to do this soon, so expect another post. Oh the power!

js-csp itself is somewhat new so I wouldn't go and write a production app quite yet, but it will get there soon. I give my gratitude to ubolonton for the fantastic initial implementation. It's up to you whether to use my fork or his project, but we will hopefully merge them soon.

]]>
Blog Rebuild: Build Systems & Cross-Compiling 2014-08-14T00:00:00Z 2014-08-14T00:00:00Z http://jlongster.com/Blog-Rebuild--Build-Systems---Cross-Compiling This is an entry in a series about rebuilding my custom blog with react, CSP, and other modern tech. Read more in the blog rebuild series.

A few years ago I remember being surprised at how popular grunt was getting. Not because it wasn't great software, but because I didn't understand what problem it solved. If I needed to process a few things like CSS before deploying to production, make seemed to work just fine.

Back then I thought things like build steps for JavaScript were an unnecessary complexity. I couldn't have been more wrong. A build system adds some complexity, yes, but a good one like gulp or broccoli is simple enough, and the returns are enormous. A complex Makefile for a JavaScript project would be a mistake, but these build tools are great.

tl;dr I chose gulp as my build system and webpack as my client-side module bundler. My final setup is on github, specifically gulpfile.js and webpack.config.js.

A Practical Approach

I'm going to be as practical as possible during this rebuild. I'm going to investigate newer things like ES6 modules, but if the tools are too immature I will fall back to something like CommonJS. I want something that works now with little effort.

What I need:

  1. A common module format for the client and server. Node uses CommonJS, and browsers currently have no native module system.
  2. For client-side code, a way to compile modules to run in the browser.
  3. An extensible pipeline for hooking in compilation stages for both server and client JS. This lets me hook in various JS transformations that I need.
  4. A watcher that automatically triggers the necessary compilations when files change (and only re-compiles the necessary files)
  5. Ability to define a few basic build tasks for moving files around and running the app

There are lots of things involved in the above requirements: compilation strategies, module bundling, and build task management. I don't know yet which combination of projects will work out, so let's investigate various solutions.

The main drive for a compilation pipeline is to compile out ES6 features into ES5. I don't want to hook something big like Traceur in because there are projects that compile out specific features better. For example, I want to use regenerator to compile out generators and then defs to compile out let. I've always enjoyed this post about ClojureScript's compilation pipeline, and I'm reminded of it when I think of this strategy of incrementally compiling an AST. Ideally, we will pass an AST around, but we'll see if the tools are good enough for that yet.

Of course, I'm a big fan of sweet.js so that will be the first compilation phase. I may compile out some ES6 features with the es6-macros project, but the reality is that the JS community has written mature ES6 transformations in the form of compilers, so it might make sense just to use them. I will still use macros for user-land syntax extensions, which I'll talk more about in future posts.

The Core Problem

I think the core problem is that the client and server are very different beasts. Node requires CommonJS and modules separated out into individual files. Browsers don't have modules and it's desirable to bundle everything together into a single JS file to deploy. To make things harder, everything should be sourcemapped.

The first question to ask is how a build system can help. Since we want to work with modules, we need support for N:M files at each build step. That means that given N files, a build step can produce M files. For example, given 1 file, a module plugin will return 10 files (all the dependencies), and then the next step could bundle them all together into 1 file.

This is important for watching and incremental builds. If a dependency changes, even if it's not listed directly in the files to watch, the build system should recompile. Additionally, it should only recompile the necessary changes, so it should cache each dependency, even if it's not explicitly listed in the original sources.

The second question to ask is what tools are out there for working with modules. The build system is the backbone, but we need plugins for actually doing things with modules. How well the build system supports N:M files affects how much the module loaders need to do.

Lastly, there's one more desirable feature. There are several transformations I want to do to my code (sweet.js → regenerator → defs). It would be far better to pass an AST through this process rather than passing strings. This means we probably don't want to hook up this whole pipeline through whatever build system we choose, but wrap it up into a single plugin.

Gulp + Webpack

Gulp is a build system built around streams. One thing I like is that it's very simple to use and define new tasks. (Note: I'm going to skip over grunt because its config syntax is really bad and I just don't like it.)

Gulp supports the N:M file builds in the form of stream events. A plugin can take a single file from a stream and output multiple files. If you add a caching layer with gulp-cache, and use the more advanced gulp-watch, you could effectively pass in one JS file and have it watch and rebuild all of its dependencies.

I'm not sure a lot of people understand that you can do this, which emits 2 files for every file that comes down the stream:

var path = require('path');
var gulp = require('gulp');
var gutil = require('gulp-util');
var es = require('event-stream');

function explode() {
  // For every file that comes through the stream, emit an extra
  // generated file alongside the original.
  return es.through(function(file) {
    this.emit('data', new gutil.File({
      base: file.base,
      cwd: file.cwd,
      path: path.join(file.base, 'foo.js'),
      contents: new Buffer('boo')
    }));

    this.emit('data', file);
  });
}

gulp.task("explode", function() {
  return gulp.src('input/main.js')
    .pipe(explode())
    .pipe(gulp.dest('output'));
});

Not very many projects use this to help with module bundling, though. There is one project, amd-optimize, that does basic dependency tracing for AMD modules. Still, the more sophisticated gulp-watch is needed if you want to watch new files from the stream (you could apply it after explode()); it is not builtin. Generally, there is very little mature code that integrates a module bundler into gulp. You have to work at it. So this doesn't really solve our problem of compiling modules for client-side. Everyone just uses browserify or webpack.

Additionally, you really only care about your local dependencies, not ones pulled from npm. You don't need to run your code transformations on npm dependencies. So it's easy to give the native watch all of your modules by doing gulp.src('src/**/*.js'). Because of this, and the fact that server-side code doesn't require module bundling, gulp works well for transforming server-side code. This code transforms each file from src and generates files in the build folder with sourcemaps.

function makeNodeStream(src, withoutSourcemaps) {
  var stream = src.pipe(cache('src'))
      .pipe(sourcemaps.init())
      .pipe(sweetjs({
        readableNames: true,
        modules: ['es6-macros']
      }))
      .pipe(regenerator())
      .pipe(jsheader('var wrapGenerator = require("regenerator/runtime/dev").wrapGenerator;'))
      .pipe(jsheader('require("source-map-support");'));

  if(!withoutSourcemaps) {
    stream = stream.pipe(sourcemaps.write('.'));
  }
  return stream;
}

gulp.task("src", function(cb) {
  es.merge(
    makeNodeStream(gulp.src('src/**/*.js'))
      .pipe(gulp.dest('build')),
    makeNodeStream(gulp.src('static/js/shared/**/*.js'))
      .pipe(gulp.dest('build/shared')),
    gulp.src(['src/**/*', '!src/**/*.js']).pipe(gulp.dest('build'))
  ).on('end', function() {
    nodemon.restart();
    cb();
  });
});

An additional complexity is that I have a shared folder that also needs to be transformed and output to a different directory. As far as I could tell, I couldn't combine that into a single gulp.src and gulp.dest, so I created makeNodeStream to run it on both. I also copy anything that's not a JS file from src to the build folder. Lastly, when it's finished it restarts the node process using nodemon.

My transformation pipeline here goes like this: sweet.js → regenerator → header append. I will likely add more steps in the future. This is passing around strings, which I talked about before, when we really should pass around ASTs. One thing I could do is use esnext instead and integrate sweet.js with it, and then do a single pipe to it. It would probably be much faster.

It takes about 2 seconds to compile my whole src directory, which is a bunch of code. But who cares? You don't need to recompile everything when just one file changes! Note that I use the cache('src') step first from gulp-cached; this will cache all files coming through the stream, and only re-emit files that have changed. That means we only transform new files, and it only takes a few hundred ms now.
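A watch task on top of this is tiny. Here's a hedged sketch using gulp's builtin watcher, with the globs and task name mirroring the code above:

gulp.task('watch', ['src'], function() {
  // Re-run the "src" task whenever server-side or shared code changes;
  // gulp-cached keeps each rebuild incremental.
  gulp.watch(['src/**/*', 'static/js/shared/**/*.js'], ['src']);
});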

Client-side

What about client-side code? As mentioned before, even though gulp could be used as a module bundler, nobody does that since mature projects like browserify and webpack exist. I chose to use webpack since I like the API and documentation better (and it has more features).

This basically requires me to use CommonJS modules for the browser. This route is well-established in the JS community so I benefit from mature tools. Eventually I'd like to use ES6 modules, but the ecosystem isn't quite there yet. I'm being conservative here so that I don't spend too much time on my tools.
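As a small illustration (the file name is hypothetical), the same CommonJS module can be required by node on the server and bundled by webpack for the browser:

// static/js/shared/format-date.js (illustrative, not from the real repo)
module.exports = function formatDate(d) {
  return d.toISOString().slice(0, 10);
};

// in ./static/js/main.js (and, after the build step, from server code too)
var formatDate = require('./shared/format-date');
console.log(formatDate(new Date()));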

Now that I'm using webpack, all of my problems for client-side development are solved. It has everything, from code splitting to hot module replacement. Here is my webpack config:

var config = {
  cache: true,
  entry: './static/js/main.js',
  output: {
    filename: './static/js/bundle.js'
  },
  resolve: {
    extensions: ['', '.js', '.sjs'],
    fallback: __dirname
  },
  module: {
    loaders: [
      {test: /\.js$/,
       exclude: [/static\/js\/lib\/.*\.js$/,
                 /node_modules\/.*/],
       loader: 'regenerator!sweetjs?modules[]=es6-macros'},
      {test: /\.less$/, loader: "style!css!less"},
      {test: /\.css$/, loader: "style!css"}
    ]
  }
};

Webpack is explicitly a module bundler, so all it needs is just one file and it will walk the dependencies. Everything will be bundled together into a single file bundle.js. This happens by default, so you can see why this doesn't work for server-side code where we just need a 1:1 file mapping.

This uses a loader on JS files to run them through sweet.js and regenerator. Again, I really should look into esnext so that I don't keep re-parsing the code.

It also uses some really cool loaders to deal with stylesheets. less-loader compiles Less into CSS. css-loader is an awesome loader that converts all @import and url statements into requires so that everything is resolved the same way, and it lets you apply loaders to the resources being loaded, allowing things like inlining the url content straight into the stylesheet. Having everything go through the same mechanism (and able to pull from npm dependencies) is extremely liberating.

To top it all off, style-loader is a loader that automatically adds a style tag to the page when the css file is required. It also inlines all the CSS into your JavaScript bundle, but you can also make it reference an external CSS file. Either way, all you have to do is require('css/main.css') in your JavaScript and it just works.
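So with the loader config above, pulling styles in is just another require (the paths here are illustrative):

// somewhere in ./static/js/main.js
require('../css/main.less'); // compiled by less-loader, then css- and style-loader
require('./widget.css');     // plain CSS gets the same treatment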

There are a few other things I do with gulp and webpack, mostly to get integration with a few modules pulled down from npm (like React) working. I also have a run task that starts my app and uses nodemon to track it so it can be restarted whenever a change happens.
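That run task looks roughly like this hedged sketch, using nodemon's programmatic API (the script path is illustrative; the real task is in the gulpfile linked below):

var nodemon = require('nodemon');

gulp.task('run', ['src', 'webpack'], function() {
  // Start the compiled server (illustrative path); the "src" task above
  // calls nodemon.restart() after each rebuild so the app picks up changes.
  nodemon({ script: 'build/server.js' });
});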

View my final setup on github.

Broccoli + ES6 modules

Broccoli is a rather new build tool that operates on tree structures, so it gets good incremental rebuilds and watches for free. See the announcement blog post for more details.

I'm not sure if broccoli competes more with gulp or webpack. It sits somewhere in the middle. It doesn't have any concept of tasks, so I can't make a run task that restarts my server on changes. But it's also not nearly as specific as webpack, and doesn't dictate anything specific about modules or how things are bundled.

I think broccoli makes it a lot easier to write something like webpack, and that's the idea. Basically, in broccoli plugins are always passing around whole trees of files, and a plugin can easily expand a tree into a much bigger tree if needed. This makes it easy to expand dependencies but still leverage the build system to handle them. So watching for changes in dependencies works great, and incremental builds are really fast because it can easily figure out what to do. Webpack has to figure all of this stuff out itself.

I like the idea of broccoli, and because working with modules is easy people are doing a lot of great work to get a workflow for compiling ES6 modules. This plugin integrates es6-module-transpiler with broccoli and does all the dependency stuff.

The thing broccoli could solve for me is not only using ES6 modules, but also to unify the JS transformation between server-side and client-side. Using gulp and webpack, I have two completely separate processes.

This was my first Brocfile.js to see how it would work out:

var pickFiles = require('broccoli-static-compiler');
var sweetjs = require('broccoli-sweetjs');
var transpileES6 = require('broccoli-es6-module-transpiler');

var src = pickFiles('src', {
  srcDir: '/',
  destDir: '/'
});

src = sweetjs(src, {
  modules: ['es6-macros']
});

src = transpileES6(src, { type: 'cjs' });
module.exports = src;

Unfortunately, I immediately ran into a bug and it wouldn't compile my code. Somehow I was using an older version that didn't work with nested yields (I guess a newer version needs to be pushed to npm). These kinds of bugs can easily be fixed.

I also ran into a bigger issue though: that project does not have a good story for integration with npm dependencies yet (more discussion here). With webpack, I could just require dependencies and it would look in node_modules, and it worked awesomely. I don't know why we can't do something similar with ES6 modules.

There was also another big issue in general with broccoli: sourcemaps. The sourcemap story for broccoli is very vague (es6-module-transpiler supports them just fine, but I don't know how to expand with sweet.js and pass it the result & sourcemaps and have it combine them). The standard broccoli-filter project, which is supposed to be used by plugins that simply map files 1:1, states right in the README that it does not support sourcemaps. That is insane to me and I can't think about using broccoli until sourcemaps are deeply integrated through and through. Also see this discussion.

In gulp, it's really easy with the awesome gulp-sourcemaps project. You just hook into the stream and write sourcemaps to a directory:

gulp.src('src/**/*.js')
  .pipe(sourcemaps.init())
  .pipe(sweetjs())
  .pipe(regenerator())
  .pipe(sourcemaps.write('.'))
  .pipe(gulp.dest('build'));

Plugins have a standard method of applying sourcemaps. The sourcemap is attached to the File instances that are passed through the stream, and combined using vinyl-sourcemaps-apply. It looks like this:

var applySourceMap = require('vinyl-sourcemaps-apply');
// ...
if(myGeneratedSourceMap) {
  applySourceMap(file, myGeneratedSourceMap);
}

That incrementally combines sourcemaps as they are applied through the streams. It has worked out really well for me.
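For context, here's a hedged sketch of where that snippet usually lives: a minimal gulp plugin built on through2, with compileSomehow standing in for whatever transform the plugin wraps.

var through = require('through2');
var applySourceMap = require('vinyl-sourcemaps-apply');

function myTransform() {
  return through.obj(function(file, enc, cb) {
    // compileSomehow is a hypothetical transform returning { code, map };
    // only ask it for a map if sourcemaps.init() ran earlier in the stream.
    var result = compileSomehow(file.contents.toString(), {
      sourceMap: !!file.sourceMap
    });
    file.contents = new Buffer(result.code);
    if(file.sourceMap && result.map) {
      applySourceMap(file, result.map); // merge with any upstream sourcemaps
    }
    cb(null, file);
  });
}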

Even without all these problems, the story for browser-side module bundling on top of these build systems isn't nearly as strong as browserify or webpack, which have tons of features specific to browser modules. Until we get a solid build system with plugins that implement most of a module bundler's features, using gulp/broccoli + browserify/webpack works pretty darn well.

Most likely, I will switch my project to ES6 modules when I can find a good cross-compiler that works well with CommonJS and my current build system.

I could use broccoli and webpack, but at this point I'm just going to stick with gulp. It's easy to use and works really well with server-side transformation and sourcemaps. As for broccoli, I understand the design and I like it, but it does make plugin development very complicated and I'm not entirely sold on it, especially when you can do N:M compilations with gulp. Lastly, it uses temporary files so gulp is potentially faster with streams.

Stream of Thought EOF

There are several other build systems out there and a million ways to combine them. I can't possibly cover all of them, but I hope this gave some insight into my process for researching. I have something that works well, and the only thing I'll improve in the future is using ES6 modules instead of CJS.

View the full repo to see all the glorious code. Specifically, check out the full gulpfile.js and webpack.config.js. What's neat about this set up is I can run webpack from the CLI like normal, but it's also defined as a task so gulp webpack will work and it can be used as a task dependency (for tasks like gulp all). I can switch between the systems easily.
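That wrapping is roughly this hedged sketch using webpack's node API (the real task is in the linked gulpfile.js):

var webpack = require('webpack');
var gutil = require('gulp-util');
var webpackConfig = require('./webpack.config');

gulp.task('webpack', function(cb) {
  // Run webpack with the same config the CLI uses and finish the task
  // once the bundle is written.
  webpack(webpackConfig, function(err, stats) {
    if(err) throw new gutil.PluginError('webpack', err);
    gutil.log('[webpack]', stats.toString({ colors: true }));
    cb();
  });
});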

I'm sure I have made some errors in this post, as it was mostly stream of thought as I was doing my research. If something is completely off, let me know.

]]>
Blog Rebuild: A Fresh Start 2014-07-30T00:00:00Z 2014-07-30T00:00:00Z http://jlongster.com/Blog-Rebuild--A-Fresh-Start About two years ago I wanted to start blogging more seriously, focusing on in-depth tech articles and tutorials. Since then I've successfully made several posts like the one about games and another about react.js.

I decided to write my own blog from scratch to provide a better blogging experience, and it has served me well. I didn't want something big and complicated to maintain like Wordpress, and I had used static generators before but in my opinion you sacrifice a lot, and there's too much friction for writing and updating posts.

Back then I wanted to learn more about node.js, redis, and a few other things. So I wrote a basic redis-backed node.js blogging engine. In a few months (working here and there), I had a site with all the basic blog pages, a markdown editor with live preview, autosaving, unpublished drafts, tags, and some basic layout options. Here is the current ugly editor:

Redis is an in-memory data store, and node handles multiple connections well by default, so my simple site scales really well. I've had posts reach #1 on Hacker News with ~750 concurrent visitors for hours (reaching about 60,000 views) with no problem at all. It may also help that my linode instance has 8 cores and I load up 4 instances of node to serve the site.

You may wonder why I don't just use something like ghost, a modern blogging platform already written in node. I tried ghost for a while, but it's early software, it includes complex features like multiple users which I don't need, and most importantly it was too difficult to implement my ideas in. This is the kind of thing where I really want my site to be my code; it's my area to play, my grand experiment. For me, it's been working out really well (check out all of my posts).

But the cracks are showing. The code is JavaScript as I wrote it 2 years ago: ugly callbacks, poor modularity, no tests, random jQuery blobs to make the frontend work, and more. The site is stable and writing blog posts works, but implementing new features is pretty much out of the question. Since this is my site and I can do whatever I want, I'm going to commit the cardinal sin and rewrite it from scratch.

I've learned a ton over the past two years, and I'm really excited to try out some new techniques. I have a lot of the infrastructure set up already, which uses the following software:

  • react — for building user interfaces seamlessly between client/server
  • react-router — advanced route handling for react components
  • js-csp — CSP-style channels for async communication
  • mori — persistent data structures
  • gulp — node build system
  • webpack — front-end module bundler
  • sweet.js — macros
  • es6-macros — several ES6 features as macros
  • regenerator — compile generators to ES5

Check out the new version of the site at new.jlongster.com. You can see my progress there (right now it's just a glimpse of the current site). I will put it up on github soon.

I thought it would also be interesting to blog throughout the development process. I'm using some really interesting libraries in ways that are very new, so I'm eager to dump my thoughts quite often. You can expect a post a week explaining what I worked on and how I'm using a library in a certain way. It will touch on everything from build systems and cross-compiling to testing and front-end structuring. Others might learn something new as well.

Next time, I'll talk about build systems and cross-compiling infrastructure. See you then!

]]>