<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Draconian Overlord</title>
 
 <link href="http://draconianoverlord.com/" />
 <updated>2013-04-13T23:08:03-07:00</updated>
 <id>http://draconianoverlord.com/</id>
 <author>
   <name>Stephen Haberman</name>
   <email>stephen@exigencecorp.com</email>
 </author>

 
 <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.feedburner.com/draconianoverlord" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="draconianoverlord" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">draconianoverlord</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><entry>
   <title type="html">Services Should Come with Stubs</title>
   <link href="http://draconianoverlord.com/2013/04/13/services-should-come-with-stubs.html" />
   <updated>2013-04-13T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2013/04/13/services-should-come-with-stubs</id>
   <content type="html">&lt;h1 id='services_should_come_with_stubs'&gt;Services Should Come with Stubs&lt;/h1&gt;

&lt;p&gt;At &lt;a href='http://www.bizo.com'&gt;Bizo&lt;/a&gt;, we do our fair share of service-oriented development, where instead of one big monolithic application, we have lots of small applications that talk to each other.&lt;/p&gt;

&lt;p&gt;The services might integrate with each other via JSON or Thrift or what not, almost always via HTTP, but whatever the underlying wire format/protocol is, it is always hidden from the client code behind a service interface.&lt;/p&gt;

&lt;p&gt;For example, let&amp;#8217;s imagine a very trivial data service that provides an interface:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;trait DataService {
  def saveData(id: String, data: Array[Byte]): Unit
  def getData(id: String): Array[Byte]
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Real services would of course have more interesting contracts, but this is good enough for illustration purposes.&lt;/p&gt;

&lt;p&gt;So, the &lt;code&gt;DataService&lt;/code&gt; codebase is going to ship a jar, say &lt;code&gt;data-service.jar&lt;/code&gt;, containing both its &lt;code&gt;DataService&lt;/code&gt; interface and the implementation, say, &lt;code&gt;DataServiceJsonImpl&lt;/code&gt;:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;class DataServiceJsonImpl(server: String) extends DataService {
  override def saveData(id: String, data: Array[Byte]) = {
    // do JSON serialization
  }
  override def getData(id: String): Array[Byte] = {
    // do JSON deserialization (elided here)
    ???
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So, this is all well and good; the downstream client codebase can program against the &lt;code&gt;DataService&lt;/code&gt; contract, and while testing use a fake, and in production use the real &lt;code&gt;DataServiceJsonImpl&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Okay, so let&amp;#8217;s look at the fake implementation&amp;#8230;what should it look like? Per some &lt;a href='http://www.draconianoverlord.com/2010/07/09/why-i-dont-like-mocks.html'&gt;other posts&lt;/a&gt;, I generally prefer stubs, so we might write:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;import scala.collection.mutable

class DataServiceStub extends DataService {
  // in-memory stand-in for the real backing store
  private val store = mutable.Map[String, Array[Byte]]()
  override def saveData(id: String, data: Array[Byte]): Unit = {
    store.put(id, data)
  }
  override def getData(id: String): Array[Byte] = {
    store.getOrElse(id, sys.error(&amp;quot;Not found&amp;quot;))
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Great! We&amp;#8217;re done.&lt;/p&gt;

&lt;p&gt;So, the interesting thing is that the stub we wrote, &lt;code&gt;DataServiceStub&lt;/code&gt;, is actually fairly generic; there is nothing specific about our client code in it (as written anyway). And the &amp;#8220;fake&amp;#8221; semantics actually very accurately mimic the real semantics (of course, as otherwise the implementation of the &lt;code&gt;DataService&lt;/code&gt; contract would be nonsensical and our tests would be much less coherent).&lt;/p&gt;
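
&lt;p&gt;To make the &amp;#8220;nothing client-specific&amp;#8221; point concrete, here is a minimal sketch of a downstream test that only sees the &lt;code&gt;DataService&lt;/code&gt; contract. The &lt;code&gt;ProfileStore&lt;/code&gt; client class and the &lt;code&gt;StubDemo&lt;/code&gt; wrapper are hypothetical names made up for illustration, and the trait and stub are repeated so the snippet stands alone:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;import scala.collection.mutable

object StubDemo {
  trait DataService {
    def saveData(id: String, data: Array[Byte]): Unit
    def getData(id: String): Array[Byte]
  }

  // a map standing in for the remote service
  class DataServiceStub extends DataService {
    private val store = mutable.Map[String, Array[Byte]]()
    override def saveData(id: String, data: Array[Byte]): Unit = {
      store(id) = data
    }
    override def getData(id: String): Array[Byte] =
      store.getOrElse(id, sys.error(&amp;quot;Not found&amp;quot;))
  }

  // hypothetical client code; it only ever sees the DataService contract
  class ProfileStore(service: DataService) {
    def saveName(userId: String, name: String): Unit =
      service.saveData(&amp;quot;profile-&amp;quot; + userId, name.getBytes(&amp;quot;UTF-8&amp;quot;))
    def loadName(userId: String): String =
      new String(service.getData(&amp;quot;profile-&amp;quot; + userId), &amp;quot;UTF-8&amp;quot;)
  }

  def main(args: Array[String]): Unit = {
    // the test wires in the stub; production would wire in DataServiceJsonImpl
    val profiles = new ProfileStore(new DataServiceStub)
    profiles.saveName(&amp;quot;42&amp;quot;, &amp;quot;Alice&amp;quot;)
    assert(profiles.loadName(&amp;quot;42&amp;quot;) == &amp;quot;Alice&amp;quot;)
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Nothing in the stub knows about profiles, which is why any other client of the same contract could reuse it unchanged.&lt;/p&gt;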

&lt;p&gt;What we&amp;#8217;ve been doing recently is realizing that, instead of each client rewriting its own stub implementations of the same contract, the upstream service should just ship its own stub &lt;code&gt;DataService&lt;/code&gt; implementation.&lt;/p&gt;

&lt;p&gt;There are a few interesting aspects of this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Obviously code reuse, as each downstream project can reuse the &lt;code&gt;DataServiceStub&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Since the stub implementation is shared, the pooled effort of maintaining just one fake implementation leads to a higher quality stub that covers more of the API (instead of only the parts that each particular project needed).&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;The upstream developers, who are implementing &lt;code&gt;DataServiceJsonImpl&lt;/code&gt; can also implement &lt;code&gt;DataServiceStub&lt;/code&gt;, which makes sense as they will be most familiar with the semantics of the &lt;code&gt;DataService&lt;/code&gt; contract.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Per a comment by &lt;a href='http://markdietz.wordpress.com/'&gt;Mark Dietz&lt;/a&gt;, we could envision an upstream service shipping the stub &lt;em&gt;before&lt;/em&gt; the actual implementation, allowing downstream projects to start integration sooner, and using the stub&amp;#8217;s fake-but-accurate semantics to flush out assumptions in both the upstream and downstream projects.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, that&amp;#8217;s the idea; I&amp;#8217;d be delighted if I could pull a service-oriented project&amp;#8217;s jar from Maven central and have stubs included in the jar and ready to go.&lt;/p&gt;

&lt;p&gt;As it is today, we end up building our own stubs for a number of services&amp;#8211;basically any internal or external service/API we touch. Some are simple, others are more work. Either way our implementations (which we open source like &lt;a href='https://github.com/stephenh/fakesdb'&gt;fakesdb&lt;/a&gt; when possible) are rarely application-specific, so it is a shame we can&amp;#8217;t reuse existing ones.&lt;/p&gt;

&lt;p&gt;Somewhat tangentially, I also think services coming with/sharing stubs would help tilt the scales towards stubs over mocks&amp;#8211;stubs obviously require some up-front investment, whereas mocks are cheap right out of the box. But I think once you have more than a few tests, stubs start paying off. And so if sharing stubs could spread out this up-front cost, perhaps more projects would choose to test with stubs.&lt;/p&gt;

&lt;p&gt;So, that&amp;#8217;s what we&amp;#8217;ve been trying, and I think we&amp;#8217;ve liked it so far. I don&amp;#8217;t know that I&amp;#8217;ve really seen other projects do this before, which makes me think it&amp;#8217;s somewhat novel.&lt;/p&gt;

&lt;p&gt;Thinking about it more; I suppose most services that are exposed via HTTP (or what not) are meant to be used in a language agnostic manner; which usually means there are either multiple language-specific client binding implementations (e.g. a Ruby client implementation, a Java client implementation, etc.), all of varying quality, or no client bindings at all, and writing them is left as an exercise to the user.&lt;/p&gt;

&lt;p&gt;Given this, where even language-specific client bindings are hit and miss, perhaps it is not surprising that few/if any projects also take the time to write language-specific stub implementations (as, the nature of stubs is that they are in-memory and so use each language&amp;#8217;s given idioms and data structures).&lt;/p&gt;

&lt;p&gt;Bizo is a JVM shop, so it makes sense that, for us, writing a language-specific &lt;code&gt;DataServiceStub&lt;/code&gt; implementation in the upstream project is worth it because we know it will be consumed downstream by a Java/Scala project.&lt;/p&gt;

&lt;p&gt;So perhaps our situation is unique. But I would think that surely highly-used client bindings (like the Amazon &lt;a href='http://aws.amazon.com/sdkforjava/'&gt;aws-java-sdk&lt;/a&gt;) should have enough users to support a shared stub implementation.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">East-Oriented Programming</title>
   <link href="http://draconianoverlord.com/2013/04/12/east-oriented-programming.html" />
   <updated>2013-04-12T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2013/04/12/east-oriented-programming</id>
   <content type="html">&lt;h1 id='eastoriented_programming'&gt;East-Oriented Programming&lt;/h1&gt;

&lt;p&gt;In chatting with a coworker today about mocks vs. stubs (prompted by Fowler&amp;#8217;s &lt;a href='http://martinfowler.com/articles/mocksArentStubs.html'&gt;Mocks Aren&amp;#8217;t Stubs&lt;/a&gt;), I was reminded about the concept of &lt;a href='http://jamesladdcode.com/2011/06/12/more-east-2/'&gt;&amp;#8220;east&amp;#8221;-oriented programming&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To understand &amp;#8220;east&amp;#8221;, think of a map, like north, south, east, west; the idea, as far as I understand, is that west-oriented is stateful (you call methods and work on the return values), while east-oriented is stateless (you pass lambdas or interface implementations).&lt;/p&gt;

&lt;p&gt;As an example, this is normal/stateful/return-oriented programming:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;class Customer(val name: String, val description: String)

// print a customer
println(customer.name)
println(customer.description)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Whereas this is stateless/east-oriented programming:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;class Customer(name: String, description: String) {
  def printOn(writer: CustomerWriter): Unit = {
    writer.printName(name)
    writer.printDescription(description)
  }
}

trait CustomerWriter {
  def printName(name: String): Unit
  def printDescription(desc: String): Unit
}

// print a customer
customer.printOn(new CustomerWriter {
  override def printName(name: String): Unit = {
    println(name)
  }

  override def printDescription(desc: String): Unit = {
    println(desc)
  }
})
// could have a suite of CustomerWriter implementations,
// e.g. System.out/JSON/etc., decorators, etc.&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The topic of east-oriented programming came up because east-oriented programming is the one style where I think you could use a mocking library extensively and not have crappy tests.&lt;/p&gt;

&lt;p&gt;To explain, and as the &lt;a href='http://jamesladdcode.com/2011/06/12/more-east-2/'&gt;More East&lt;/a&gt; post notes, as soon as you start wanting your program to use return values, mocking starts to suck. Each individual test needs to start anticipating ahead of time (when setting up the mock &lt;code&gt;expects&lt;/code&gt;/etc. clauses) the needed return values.&lt;/p&gt;

&lt;p&gt;Taken to an extreme, each test starts re-implementing the stateful behavior of the collaborators, but via the mocking library&amp;#8217;s DSL, and things just get verbose and ugly. In my opinion, this is when stubs start looking awfully nice (you can implement the fake stateful behavior simply/directly in the language itself, and reuse it across all of your tests).&lt;/p&gt;

&lt;p&gt;But, if like east-orientation, your style avoids/limits return values, and uses more event-/interaction-style contracts between collaborators, then mocks aren&amp;#8217;t so bad.&lt;/p&gt;
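
&lt;p&gt;As a rough sketch of why interaction-style contracts are easier to verify, here is a hand-rolled recording &lt;code&gt;CustomerWriter&lt;/code&gt; (no mocking library involved; &lt;code&gt;RecordingWriter&lt;/code&gt; and &lt;code&gt;EastDemo&lt;/code&gt; are hypothetical names I am making up for illustration). The test only asserts on which calls were made, and never has to script a return value ahead of time:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;import scala.collection.mutable.ListBuffer

object EastDemo {
  trait CustomerWriter {
    def printName(name: String): Unit
    def printDescription(desc: String): Unit
  }

  class Customer(name: String, description: String) {
    def printOn(writer: CustomerWriter): Unit = {
      writer.printName(name)
      writer.printDescription(description)
    }
  }

  // remembers each call it receives, in order
  class RecordingWriter extends CustomerWriter {
    val calls = ListBuffer[String]()
    override def printName(name: String): Unit = {
      calls.append(&amp;quot;name:&amp;quot; + name)
    }
    override def printDescription(desc: String): Unit = {
      calls.append(&amp;quot;description:&amp;quot; + desc)
    }
  }

  def main(args: Array[String]): Unit = {
    val writer = new RecordingWriter
    new Customer(&amp;quot;Acme&amp;quot;, &amp;quot;A customer&amp;quot;).printOn(writer)
    // verify the interactions; there were no return values to anticipate
    assert(writer.calls.toList == List(&amp;quot;name:Acme&amp;quot;, &amp;quot;description:A customer&amp;quot;))
  }
}&lt;/code&gt;&lt;/pre&gt;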

&lt;p&gt;I believe east-oriented programming is also the style preached (perhaps not with that name) by the &lt;a href='http://www.growing-object-oriented-software.com/'&gt;Growing Object-Oriented Software, Guided by Tests&lt;/a&gt; book, whose mock-heavy style I&amp;#8217;m not a huge fan of, but whose authors I really respect because they write good code and are the only other people in Java land these days that eschew DI containers.&lt;/p&gt;

&lt;p&gt;As far as my personal opinion of east-oriented programming, to me it looks like you&amp;#8217;re just moving the coupling around&amp;#8211;instead of the coupling being in return values, the coupling is in your interface contracts.&lt;/p&gt;

&lt;p&gt;Granted, east-orientation means you get to say nifty things like &amp;#8220;the internal implementation details of the object are protected&amp;#8221; (e.g. there are no getters), but is calling a getter really any different than being passed the same value as a method parameter? Especially on some object that suddenly starts looking like a visitor (translation, that is not a good thing)?&lt;/p&gt;

&lt;p&gt;Furthermore, in the above example, the &lt;code&gt;CustomerWriter&lt;/code&gt; API is now dictating how your client code must look (it must be spread across the &lt;code&gt;printXxx&lt;/code&gt; methods) instead of being arranged however naturally fits your problem (where you could call getters on a whim).&lt;/p&gt;

&lt;p&gt;Thinking more about it, I think this &amp;#8220;replace getters with an &lt;code&gt;XxxWriter&lt;/code&gt;&amp;#8221; example probably is not doing east-orientation justice. If I end up re-reading GOOS, I will keep an eye out for examples I can paste in here.&lt;/p&gt;

&lt;p&gt;Nonetheless, I&amp;#8217;m left thinking that east-orientation is great &lt;em&gt;if&lt;/em&gt; you really want to use a mocking library. Then it will make your tests suck less. But, in my opinion, stubs will also make your tests suck less, and not require absolving your code of return values.&lt;/p&gt;

&lt;p&gt;(One flaw of stubs is that you have to be careful to not start having unit tests rely on having &amp;#8220;the whole system but just with stubs&amp;#8221; to test their individual component, and instead keep things isolated. Mocks force you to do this because mocking out this much behavior would be impossible. With stubs, it is possible, but just takes discretion to not abuse it.)&lt;/p&gt;

&lt;p&gt;Anyway, I really enjoy reading topics like east-orientation, that try to find &amp;#8220;a better way&amp;#8221; of structuring systems. I&amp;#8217;m glad it&amp;#8217;s working out for some, and since it came up again, I&amp;#8217;m going to try and keep it in mind while I&amp;#8217;m writing code to see if I see any patterns emerge. But, for now, I don&amp;#8217;t think it&amp;#8217;s quite the game-changing idiom that I was thinking it might be when I originally read about it.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Google's Build System is a Giant Maven Repo</title>
   <link href="http://draconianoverlord.com/2013/03/23/google-giant-maven-repo.html" />
   <updated>2013-03-23T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2013/03/23/google-giant-maven-repo</id>
   <content type="html">&lt;h1 id='googles_build_system_is_a_giant_maven_repo'&gt;Google&amp;#8217;s Build System is a Giant Maven Repo&lt;/h1&gt;

&lt;p&gt;At &lt;a href='http://www.bizo.com'&gt;Bizo&lt;/a&gt;, we occasionally mull over our build system and how it could be improved.&lt;/p&gt;

&lt;p&gt;Tangentially, right now our build is based on Ant and Ivy. Yes, yes, I was admittedly skeptical at first too, but our per-project &lt;code&gt;build.xml&lt;/code&gt; files are less code than most Maven &lt;code&gt;pom.xml&lt;/code&gt; files I&amp;#8217;ve seen. We use an internal fork of Spring&amp;#8217;s common-build project, and it is a surprisingly nice setup.&lt;/p&gt;

&lt;p&gt;Anyway, I occasionally think about Google&amp;#8217;s internal build system, and what ideas/approaches we could potentially reuse if we were to revamp our own build system. Their build system isn&amp;#8217;t open source, but every once in a while a blog post goes by, e.g. this &lt;a href='http://google-engtools.blogspot.com/2011/08/build-in-cloud-how-build-system-works.html'&gt;Build in the Cloud&lt;/a&gt; post.&lt;/p&gt;

&lt;p&gt;These Google blog posts describe an interesting setup where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Conceptually, there is one large &lt;code&gt;trunk/&lt;/code&gt; directory that everyone checks out, and everyone builds &lt;em&gt;all&lt;/em&gt; of their project&amp;#8217;s upstream dependencies locally.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;In reality, they cache builds (based on hashes of the input files/compile settings) across machines so you only build locally what has changed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At first I thought this was a huge breakthrough (always build from trunk!), and while I still think it is, what&amp;#8217;s interesting is that you can conceptually model it like a huge Maven repository.&lt;/p&gt;

&lt;p&gt;Look what happens in Maven:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You check out &lt;code&gt;trunk/&lt;/code&gt; and download dependencies where &lt;code&gt;org=foo.com&lt;/code&gt;, &lt;code&gt;module=blah&lt;/code&gt;, and &lt;code&gt;version=X.Y&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is basically the same thing as in Google, except:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You check out &lt;code&gt;trunk/&lt;/code&gt; and download dependencies where &lt;code&gt;org=google&lt;/code&gt;, &lt;code&gt;module=path/to/BUILD&lt;/code&gt;, and &lt;code&gt;version=(hash of file inputs)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google has basically automated the process of keeping &lt;code&gt;version&lt;/code&gt; up to date, where each time you &lt;code&gt;svn up&lt;/code&gt;, you&amp;#8217;re updating the &lt;code&gt;version&lt;/code&gt;/input hash of your upstream dependencies, and so will now download (or build) the new version.&lt;/p&gt;
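
&lt;p&gt;To illustrate the &amp;#8220;version is a hash of the inputs&amp;#8221; idea, here is a minimal sketch. This is not Google&amp;#8217;s actual scheme (which I do not know); the &lt;code&gt;InputHash&lt;/code&gt; object, the choice of SHA-1, and passing file contents as strings are all assumptions for illustration:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest

object InputHash {
  // hashes the compile settings plus each (path, contents) pair, in a stable order
  def versionOf(settings: Seq[String], inputs: Map[String, String]): String = {
    val digest = MessageDigest.getInstance(&amp;quot;SHA-1&amp;quot;)
    def add(s: String): Unit = {
      digest.update(s.getBytes(UTF_8))
      digest.update(0.toByte) // separator between consecutive fields
    }
    settings.foreach(add)
    inputs.toSeq.sortBy(_._1).foreach { case (path, contents) =&amp;gt;
      add(path)
      add(contents)
    }
    // render the digest bytes as a hex string
    digest.digest().map(b =&amp;gt; &amp;quot;%02x&amp;quot;.format(b)).mkString
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Identical inputs and settings always produce the same version, so a cached artifact can be looked up by that key; change any input file and the version changes, which forces a new download or build.&lt;/p&gt;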

&lt;p&gt;(I&amp;#8217;m sure there are a lot of intricacies here that I&amp;#8217;m missing, e.g. how their FUSE file system only downloads the files you actually need, etc., but I believe this is conceptually correct.)&lt;/p&gt;

&lt;p&gt;What&amp;#8217;s interesting is that Maven&amp;#8217;s &lt;code&gt;SNAPSHOT&lt;/code&gt; and Ivy&amp;#8217;s &lt;code&gt;latest.integration&lt;/code&gt; (which we use) basically try to do the same thing (&amp;#8220;always give me the latest upstream dependency&amp;#8221;), but have to rely on expensive network pings to check &amp;#8220;what&amp;#8217;s the latest version?&amp;#8221; each time you build (or they only check every N minutes, which means you risk stale results).&lt;/p&gt;

&lt;p&gt;(Of course, the Maven/Ivy approach is very understandable, given they don&amp;#8217;t have the benefit of a canonical, company-wide source repo.)&lt;/p&gt;

&lt;p&gt;And, granted, as a whole, the Google build system is very different from Maven proper (cross-technology, can locally build anything missing, other things I don&amp;#8217;t have a clue about), but I was just surprised that the &amp;#8220;repository in the cloud&amp;#8221; aspect is not as different as I had originally thought.&lt;/p&gt;

&lt;p&gt;The main innovation of their repository approach is leveraging their unique position of having all source code locally to use hashes for identifying artifacts.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Stay in the Language</title>
   <link href="http://draconianoverlord.com/2013/03/08/stay-in-the-language.html" />
   <updated>2013-03-08T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2013/03/08/stay-in-the-language</id>
   <content type="html">&lt;h1 id='stay_in_the_language'&gt;Stay in the Language&lt;/h1&gt;

&lt;p&gt;I made a &amp;#8220;grumpy old man&amp;#8221; comment on a G+ post from &lt;a href='https://plus.google.com/u/0/109101057654691472275/posts/97b5HKdkMkS'&gt;Mike Brock&lt;/a&gt; this morning.&lt;/p&gt;

&lt;p&gt;The topic was some new IntelliJ support for &lt;a href='http://www.jboss.org/errai'&gt;Errai&lt;/a&gt;, JBoss&amp;#8217;s GWT framework. The new IDE support allows cool stuff like auto-completion in templates based on the user&amp;#8217;s annotated view classes.&lt;/p&gt;

&lt;p&gt;It does this because these annotations/glue code are filled in later at GWT compile time, so it&amp;#8217;s very useful to provide the user with instant/up-front feedback that something will break later on. Which is a great goal. Faster feedback cycles.&lt;/p&gt;

&lt;p&gt;Here&amp;#8217;s my comment:&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&amp;#8220;This reminds me of Groovy, where the language was dynamic, but, contrary to the delusions of the Groovy committers, in reality the majority of Groovy users really did want (in 90% of their code) static type checking/intellisense/etc.&lt;/p&gt;

&lt;p&gt;So, since it couldn&amp;#8217;t be in the language, they built static checking into the IDEs.&lt;/p&gt;

&lt;p&gt;IntelliJ is really awesome at this game, of finding languages and frameworks that are dynamic enough that users love the static tooling they can bring to the table.&lt;/p&gt;

&lt;p&gt;&amp;#8230;but at the end of the day, it seems like all of this inference/etc. should be in the language proper, instead of glued on the side and rewritten by each IDE (IntelliJ/Eclipse/Netbeans).&lt;/p&gt;

&lt;p&gt;Granted, templates are a different beast, and I&amp;#8217;m sure this is a boon to Errai users, so that&amp;#8217;s awesome. The IntelliJ guys are amazing at what they do.&lt;/p&gt;

&lt;p&gt;But, personally, I just like approaches to templates, and frameworks in general, that, as soon as possible, hand control back over to the language proper, so that the default tooling/language semantics can take over.&amp;#8221;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;This ties in a lot with my feelings about code generation&amp;#8211;to me, code generation &amp;#8220;done well&amp;#8221; scans external artifacts (templates, databases, schemas, etc.) and turns them into representations that the user-code can program directly against.&lt;/p&gt;

&lt;p&gt;This is a great approach, because as soon as the user-code has an in-language representation (some generated interface/class), it&amp;#8217;s &amp;#8220;business as normal&amp;#8221;, and all of the existing tools (auto-complete, type checking, etc.), &amp;#8220;just work&amp;#8221;.&lt;/p&gt;
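
&lt;p&gt;As a toy sketch of what &amp;#8220;up-front&amp;#8221; generation means (this is not how Joist, Tessell, or dtonator are implemented; &lt;code&gt;DtoGenerator&lt;/code&gt; is a hypothetical name), a generator can be as simple as a function from an external description to source text that is written to disk before the user&amp;#8217;s code compiles:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object DtoGenerator {
  // turns an external description, as (field name, Scala type) pairs, into Scala source
  def generate(className: String, fields: Seq[(String, String)]): String = {
    val params = fields.map { case (name, tpe) =&amp;gt; name + &amp;quot;: &amp;quot; + tpe }.mkString(&amp;quot;, &amp;quot;)
    &amp;quot;case class &amp;quot; + className + &amp;quot;(&amp;quot; + params + &amp;quot;)&amp;quot;
  }
}

// DtoGenerator.generate(&amp;quot;Customer&amp;quot;, Seq(&amp;quot;name&amp;quot; -&amp;gt; &amp;quot;String&amp;quot;, &amp;quot;age&amp;quot; -&amp;gt; &amp;quot;Int&amp;quot;))
// returns the source text: case class Customer(name: String, age: Int)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Once that output is saved as a regular source file, a typo like &lt;code&gt;customer.nmae&lt;/code&gt; is an ordinary compile error, with no IDE plugin required.&lt;/p&gt;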

&lt;p&gt;In contrast, code generation approaches that are &amp;#8220;post-compile&amp;#8221; (e.g. GWT&amp;#8217;s deferred binding, which runs after the user&amp;#8217;s code is compiled, or most annotation/CDI/runtime bytecode-generated based approaches) do not create in-language representations, or at least ones that users can program directly against.&lt;/p&gt;

&lt;p&gt;The result is that built-in tooling doesn&amp;#8217;t work, so you need to rely on IDE plugins that re-implement/re-use 90% of the code generation logic, but this time only to provide extra hints that tell the user &amp;#8220;oh right, when code generation eventually runs, this is going to fail&amp;#8221;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In my opinion, if we run the code generation up front, a framework should get all of that error checking for free.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This opinion is the basis for a lot of my open source projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='http://joist.ws'&gt;Joist&lt;/a&gt; (an ORM that does all code generation up-front off the schema),&lt;/li&gt;

&lt;li&gt;&lt;a href='http://www.tessell.org'&gt;Tessell.org&lt;/a&gt; (GWT framework that, among other things, parses templates up-front), and&lt;/li&gt;

&lt;li&gt;&lt;a href='http://www.dtonator.org'&gt;dtonator&lt;/a&gt; (DTO generator that parses a YAML file up-front, plus some reflection, to pre-generate DTOs + mapping code).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I won&amp;#8217;t say any of these projects are perfect, but in my experience the approach has worked well.&lt;/p&gt;

&lt;p&gt;I am biased, and of course intimately familiar with these projects, but the approach seems both simpler to implement and simpler to debug. Because if/when things go wrong, you have visible output on your hard drive to &amp;#8220;control-click&amp;#8221; through, instead of &amp;#8220;crazy stuff happened at runtime&amp;#8221;.&lt;/p&gt;

&lt;p&gt;Of course, YMMV, there are multiple ways to skin a cat, etc.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">What Makes Spark Exciting</title>
   <link href="http://draconianoverlord.com/2013/01/22/what-makes-spark-exciting.html" />
   <updated>2013-01-22T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2013/01/22/what-makes-spark-exciting</id>
   <content type="html">&lt;h1 id='what_makes_spark_exciting'&gt;What Makes Spark Exciting&lt;/h1&gt;

&lt;p&gt;(&lt;a href='http://dev.bizo.com/2013/01/what-makes-spark-exciting.html'&gt;Cross-posted&lt;/a&gt; on the Bizo &lt;a href='http://dev.bizo.com/'&gt;dev blog&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;At &lt;a href='http://www.bizo.com'&gt;Bizo&lt;/a&gt;, we&amp;#8217;re currently evaluating/prototyping &lt;a href='http://www.spark-project.org'&gt;Spark&lt;/a&gt; as a replacement for &lt;a href='http://hive.apache.org/'&gt;Hive&lt;/a&gt; for our batch reports.&lt;/p&gt;

&lt;p&gt;As a brief intro, Spark is an alternative to Hadoop. It provides a cluster computing framework for running distributed jobs. Similar to Hadoop, you provide Spark with jobs to run, and it handles splitting up the job into small tasks, assigning those tasks to machines (optionally with Hadoop-style data locality), issuing retries if tasks fail transiently, etc.&lt;/p&gt;

&lt;p&gt;In our case, these jobs are processing a non-trivial amount of data (log files) on a regular basis, for which we currently use Hive.&lt;/p&gt;

&lt;h2 id='why_replace_hive'&gt;Why Replace Hive?&lt;/h2&gt;

&lt;p&gt;Admittedly, Hive has served us well for quite a while now. (One of our engineers even built a custom &amp;#8220;Hadoop on demand&amp;#8221; framework for running periodic on-demand Hadoop/Hive jobs in EC2 several months before &lt;a href='http://aws.amazon.com/elasticmapreduce/'&gt;Amazon Elastic Map Reduce&lt;/a&gt; came out.)&lt;/p&gt;

&lt;p&gt;Without Hive, it would have been hard for us to provide the same functionality, probably at all, let alone in the same time frame.&lt;/p&gt;

&lt;p&gt;That said, it has gotten to the point where Hive is more frequently invoked in negative contexts (&amp;#8220;damn it, Hive&amp;#8221;) than positive.&lt;/p&gt;

&lt;p&gt;Personally, I admittedly even try to avoid tasks that involve working with Hive. I find it to be frustrating and, well, just not a lot of fun. Why? Two primary reasons:&lt;/p&gt;

&lt;h4 id='1_hive_jobs_are_hard_to_test'&gt;1. Hive jobs are hard to test&lt;/h4&gt;

&lt;p&gt;Bizo has a culture of excellence, and for engineering one of the things this means is testing. We really like tests. Especially unit tests, which are quick to run and enable a fast TDD cycle.&lt;/p&gt;

&lt;p&gt;Unfortunately, Hive makes unit testing basically impossible. For several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Hive scripts must be run in a local Hadoop/Hive installation.&lt;/p&gt;

&lt;p&gt;Ironically, very few developers at Bizo have local Hadoop installations. We are admittedly spoiled by Elastic Map Reduce, such that most of us (myself anyway) wouldn&amp;#8217;t even know how to set up Hadoop off the top of our heads. We just fire up an EMR cluster.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Hive scripts have production locations embedded in them.&lt;/p&gt;

&lt;p&gt;Both our log files and report output are stored in S3, so our Hive scripts end up with lots of &amp;#8220;s3://&amp;#8221; paths scattered throughout them.&lt;/p&gt;

&lt;p&gt;While we do run dev versions of reports with &amp;#8220;-dev&amp;#8221; S3 buckets, still relying on S3 and raw log files (that are usually in a compressed/binary-ish format) is not conducive to setting up lots of really small, simplified scenarios to unit test each boundary case.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Hive scripts do not provide any abstraction&amp;#8211;they are just one big HiveQL file. This means it&amp;#8217;s hard to break up a large report into small, individually testable steps.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite these limitations, about a year ago we had a developer dedicate some effort to prototyping an approach that would run Hive scripts within our CI workflow. In the end, while his prototype worked, the workflow was wonky enough that we never adopted it for production projects.&lt;/p&gt;

&lt;p&gt;The result? Our Hive reports are basically untested. This sucks.&lt;/p&gt;

&lt;h4 id='2_hive_is_hard_to_extend'&gt;2. Hive is hard to extend&lt;/h4&gt;

&lt;p&gt;Extending Hive via custom functions (UDFs and UDAFs) is possible, and we do it all the time&amp;#8211;but it&amp;#8217;s a pain in the ass.&lt;/p&gt;

&lt;p&gt;Perhaps this is not Hive&amp;#8217;s fault, and it&amp;#8217;s some Hadoop internals leaking into Hive, but the various &lt;a href='http://hive.apache.org/docs/r0.5.0/api/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspector.html'&gt;ObjectInspector&lt;/a&gt; hoops, to me, always seemed annoying to deal with.&lt;/p&gt;

&lt;p&gt;Given these shortcomings, Bizo has been looking for a Hive-successor for a while, even going so far as to prototype &lt;a href='https://github.com/aboisvert/revolute'&gt;revolute&lt;/a&gt;, a Scala DSL on top of &lt;a href='http://www.cascading.org/'&gt;Cascading&lt;/a&gt;, but had not yet found something we were really excited about.&lt;/p&gt;

&lt;h2 id='enter_spark'&gt;Enter Spark!&lt;/h2&gt;

&lt;p&gt;We had heard about Spark, but did not start trying it until the Spark presentation at AWS re:Invent (the talk received &lt;a href='https://amplab.cs.berkeley.edu/news/sparkshark-a-big-hit-at-aws-reinvent/'&gt;the highest rating of all non-keynote sessions&lt;/a&gt;) impressed us so much that we wanted to learn more.&lt;/p&gt;

&lt;p&gt;One of Spark&amp;#8217;s touted strengths is being able to load and keep data in memory, so your queries aren&amp;#8217;t always I/O bound.&lt;/p&gt;

&lt;p&gt;That is great, but the exciting aspect for us at Bizo is how Spark, either intentionally or serendipitously, addresses both of Hive&amp;#8217;s primary shortcomings, and turns them into huge strengths. Specifically:&lt;/p&gt;

&lt;h4 id='1_spark_jobs_are_amazingly_easy_to_test'&gt;1. Spark jobs are amazingly easy to test&lt;/h4&gt;

&lt;p&gt;Writing a test in Spark is as easy as:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;class SparkTest {
  @Test
  def test() {
    // this is real code...
    val sc = new SparkContext(&amp;quot;local&amp;quot;, &amp;quot;MyUnitTest&amp;quot;)
    // and now some pseudo code...
    val output = runYourCodeThatUsesSpark(sc)
    assertAgainst(output)
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;(I will go into more detail about &lt;code&gt;runYourCodeThatUsesSpark&lt;/code&gt; in a future post.)&lt;/p&gt;

&lt;p&gt;This one liner starts up a new &lt;a href='http://spark-project.org/docs/latest/api/core/index.html#spark.SparkContext'&gt;SparkContext&lt;/a&gt;, which is all your program needs to execute Spark jobs. There is no local installation required (just have the Spark jar on your classpath, e.g. via Maven or Ivy), no local server to start/stop. It just works.&lt;/p&gt;

&lt;p&gt;As a technical aside, this &amp;#8220;local&amp;#8221; mode starts up an in-process Spark instance, backed by a thread-pool, and actually opens up a few ports and temp directories, because it&amp;#8217;s a real, live Spark instance.&lt;/p&gt;

&lt;p&gt;Granted, this is usually more work than you want done in a unit test (which ideally would not hit any file or network I/O), but the redeeming quality is that it&amp;#8217;s &lt;em&gt;fast&lt;/em&gt;. Tests run in ~2 seconds.&lt;/p&gt;

&lt;p&gt;Okay, yes, this is slow compared to pure, traditional unit tests, but is such a huge revolution compared to Hive that we&amp;#8217;ll gladly take it.&lt;/p&gt;

&lt;h4 id='2_spark_is_easy_to_extend'&gt;2. Spark is easy to extend&lt;/h4&gt;

&lt;p&gt;Spark&amp;#8217;s primary API is a Scala DSL, oriented around what they call an &lt;a href='http://www.spark-project.org/docs/0.6.0/api/core/#spark.RDD'&gt;&lt;code&gt;RDD&lt;/code&gt;&lt;/a&gt;, or Resilient Distributed Dataset, which is basically a collection that only supports bulk/aggregate transforms (so methods like &lt;code&gt;map&lt;/code&gt;, &lt;code&gt;filter&lt;/code&gt;, and &lt;code&gt;groupBy&lt;/code&gt;, which can be seen as transforming the entire collection, but no indexed lookups like &lt;code&gt;get(i)&lt;/code&gt;, which assume in-memory/random access).&lt;/p&gt;

&lt;p&gt;Some really short, made up example code is:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;// RDD[String] is like a collection of lines
val in: RDD[String] = sc.textFile(&amp;quot;s3://bucket/path/&amp;quot;)
// perform some operation on each line
val suffixed = in.map { line =&amp;gt; line + &amp;quot;some suffix&amp;quot; }
// now save the new lines back out
suffixed.saveAsTextFile(&amp;quot;s3://bucket/path2&amp;quot;)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Spark&amp;#8217;s job is to package up your &lt;code&gt;map&lt;/code&gt; closure, and run it against that extra large text file across your cluster. And it does so by, after shuffling the code and data around, &lt;em&gt;actually calling your closure&lt;/em&gt; (i.e. there is no &lt;a href='http://msdn.microsoft.com/en-us/library/vstudio/bb397926.aspx'&gt;LINQ&lt;/a&gt;-like introspection of the closure&amp;#8217;s AST).&lt;/p&gt;

&lt;p&gt;This may seem minor, but it&amp;#8217;s huge, because it means there is no framework code or APIs standing between your running closure and any custom functions you&amp;#8217;d want to run. Let&amp;#8217;s say you want to use &lt;code&gt;SomeUtilityClass&lt;/code&gt; (or the venerable &lt;a href='http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringUtils.html'&gt;&lt;code&gt;StringUtils&lt;/code&gt;&lt;/a&gt;), just do:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;import com.company.SomeUtilityClass
val in: RDD[String] = sc.textFile(&amp;quot;s3://bucket/path/&amp;quot;)
val processed = in.map { line =&amp;gt;
  // just call it, it&amp;#39;s a normal method call
  SomeUtilityClass.process(line) 
}
processed.saveAsTextFile(&amp;quot;s3://bucket/path2&amp;quot;)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Notice how &lt;code&gt;SomeUtilityClass&lt;/code&gt; doesn&amp;#8217;t have to know it&amp;#8217;s running within a Spark RDD in the cluster. It just takes a String. Done.&lt;/p&gt;
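
&lt;p&gt;As a rough sketch of why this matters, the same function can be exercised against a plain in-memory &lt;code&gt;List&lt;/code&gt;, with no Spark involved. The body of &lt;code&gt;process&lt;/code&gt; below is made up for illustration (the real &lt;code&gt;SomeUtilityClass&lt;/code&gt; is not shown here), as is the &lt;code&gt;PlainScalaUsage&lt;/code&gt; name:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object SomeUtilityClass {
  // hypothetical implementation, assumed only for this sketch
  def process(line: String): String = line.trim.toUpperCase
}

object PlainScalaUsage {
  def main(args: Array[String]): Unit = {
    val lines = List(&amp;quot; a &amp;quot;, &amp;quot;b&amp;quot;)
    // same shape as the RDD map above, but on an in-memory List
    val processed = lines.map { line =&amp;gt; SomeUtilityClass.process(line) }
    println(processed) // prints List(A, B)
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the closure body is an ordinary method call, the utility code can be unit tested on its own, without a &lt;code&gt;SparkContext&lt;/code&gt;.&lt;/p&gt;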

&lt;p&gt;Similarly, Spark doesn&amp;#8217;t need to know anything about the code you use within the closure; it just needs to be available on the classpath of each machine in the cluster (which is easy to do as part of your cluster/job setup: you just copy some jars around).&lt;/p&gt;

&lt;p&gt;This seamless hop between the RDD and custom Java/Scala code is very nice, and means your Spark jobs end up reading just like regular, normal Scala code (which to us is a good thing!).&lt;/p&gt;

&lt;h2 id='is_spark_perfect'&gt;Is Spark Perfect?&lt;/h2&gt;

&lt;p&gt;As full disclosure, we&amp;#8217;re still in the early stages of testing Spark, so we can&amp;#8217;t yet say whether Spark will be a wholesale replacement for Hive within Bizo. We haven&amp;#8217;t gotten to any serious performance comparisons or written large, complex reports to see if Spark can take whatever we throw at it.&lt;/p&gt;

&lt;p&gt;Personally, I am also admittedly somewhat infatuated with Spark at this point, so that could be clouding my judgement about the pros/cons and the tradeoffs with Hive.&lt;/p&gt;

&lt;p&gt;One Spark con so far is that Spark is pre-1.0, and it can show. I&amp;#8217;ve seen some stack traces that shouldn&amp;#8217;t happen, and some usability warts, that hopefully will be cleared up by 1.0. (That said, even as a newbie I find the codebase small and very easy to read, such that I&amp;#8217;ve had &lt;a href='https://github.com/mesos/spark/pull/352'&gt;several&lt;/a&gt; &lt;a href='https://github.com/mesos/spark/pull/351'&gt;small&lt;/a&gt; &lt;a href='https://github.com/mesos/spark/pull/362'&gt;pull requests&lt;/a&gt; accepted already&amp;#8211;which is a nice consolation compared to the daunting codebases of Hadoop and Hive.)&lt;/p&gt;

&lt;p&gt;We have also seen that, for our first Spark job, moving from &amp;#8220;Spark job written&amp;#8221; to &amp;#8220;Spark job running in production&amp;#8221; is taking longer than expected. But given that Spark is a new tool to us, we expect this to be a one-time cost.&lt;/p&gt;

&lt;h2 id='more_to_come'&gt;More to Come&lt;/h2&gt;

&lt;p&gt;I have a few more posts coming up which explain our approach to Spark in more detail, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Testing best practices&lt;/li&gt;

&lt;li&gt;Running Spark in EMR&lt;/li&gt;

&lt;li&gt;Accessing partitioned S3 logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To see those when they come out, make sure to subscribe to the blog, or, better yet, &lt;a href='http://bizo.theresumator.com/'&gt;come work at Bizo&lt;/a&gt; and help us out!&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">DHH vs. Fowler</title>
   <link href="http://draconianoverlord.com/2013/01/07/dhh-vs-fowler.html" />
   <updated>2013-01-07T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2013/01/07/dhh-vs-fowler</id>
   <content type="html">&lt;h1 id='dhh_vs_fowler'&gt;DHH vs. Fowler&lt;/h1&gt;

&lt;p&gt;Just a quick note on a topic going around recently; David Heinemeier Hansson first asserted &lt;a href='http://david.heinemeierhansson.com/2012/dependency-injection-is-not-a-virtue.html'&gt;Dependency injection is not a virtue&lt;/a&gt;, calling into question a practice that is in some circles sacrosanct.&lt;/p&gt;

&lt;p&gt;Marcel Weiher responded that, no, &lt;a href='http://blog.metaobject.com/2013/01/dependency-injection-is-virtue.html'&gt;Dependency Injection is a Virtue&lt;/a&gt;. And HN commenter smoyer &lt;a href='http://news.ycombinator.com/item?id=5020525'&gt;commented&lt;/a&gt; &amp;#8220;I&amp;#8217;ll choose MF (Martin Fowler) over DHH anytime both have an opinion&amp;#8221;.&lt;/p&gt;

&lt;p&gt;So, the discussion has devolved to &amp;#8220;DI sucks! Nu-huh! Uh-huh!&amp;#8221;&lt;/p&gt;

&lt;p&gt;What I believe is missing in the conversation is context. From my perspective, DHH/Rails and Fowler address two very separate types of systems.&lt;/p&gt;

&lt;p&gt;DHH and Rails are all about pragmatic choices for fairly simple, maybe a little complex webapps built by 1-5-ish programmers (I&amp;#8217;m sure there are large Rails projects, but I think they&amp;#8217;re a minority).&lt;/p&gt;

&lt;p&gt;The whole point of Rails is that it makes choices for you, guided by the assumption you&amp;#8217;re making &amp;#8220;just yet another webapp&amp;#8221;. One of these choices, IMO, is playing a little fast and loose with sacred practices like decoupling (which is fine, &lt;em&gt;all decoupling has a cost&lt;/em&gt;, which most people don&amp;#8217;t realize).&lt;/p&gt;

&lt;p&gt;DHH understands Rails&amp;#8217;s (admittedly very large) niche:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&amp;#8220;There’s nothing wrong or shameful with nailing a single use case, like VB did for Windows desktop or PHP for web scripts. It’s beautiful!&amp;#8221; (&lt;a href='https://twitter.com/dhh/status/284952366317461504'&gt;on twitter&lt;/a&gt;)&lt;/li&gt;

&lt;li&gt;&amp;#8220;Rails is omakase&amp;#8221; (&lt;a href='http://david.heinemeierhansson.com/2012/rails-is-omakase.html'&gt;blog post&lt;/a&gt;)&lt;/li&gt;

&lt;li&gt;&amp;#8220;This nonsense (using transaction scripts) is what happens when you actually start believing that “Rails should be a detail” in your Rails app&amp;#8221; (&lt;a href='https://twitter.com/dhh/status/282965246547750912'&gt;on twitter&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DHH&amp;#8217;s opinions, and Rails&amp;#8217;s approach here, are perfectly fine&amp;#8211;they&amp;#8217;re directly responsible for the huge productivity wins that come from using Rails for your &amp;#8220;just yet another webapp&amp;#8221;. If your app fits that sweet spot, it&amp;#8217;s great.&lt;/p&gt;

&lt;p&gt;Fowler, on the other hand, is more old-school/enterprise, where applications are not &amp;#8220;just a webapp&amp;#8221; (but probably have a webapp in them), and are built by teams an order of magnitude larger (10-20 programmers) than most Rails projects.&lt;/p&gt;

&lt;p&gt;Fowler&amp;#8217;s realm is where decoupling, dependency injection, etc., are held sacred, and more rightly so (even if I do think &lt;a href='http://draconianoverlord.com/2011/03/17/frameworkless-di.html'&gt;DI frameworks are too complex&lt;/a&gt;, I appreciate the problem of global state). This is where having global &lt;code&gt;Time&lt;/code&gt; variables starts to suck, because there are so many a person can&amp;#8217;t keep them in their head.&lt;/p&gt;

&lt;p&gt;(And I do not think it&amp;#8217;s a language issue; Java can be, and is, used wrongly to build systems with overly complex abstractions that cost more than they are worth, but it can have a singleton &lt;code&gt;Time.now&lt;/code&gt; that defers to a stub-able implementation without DI frameworks or reams of XML.)&lt;/p&gt;

&lt;p&gt;So, I don&amp;#8217;t find it very surprising that both camps have different tools and tastes.&lt;/p&gt;

&lt;p&gt;It seems like DHH gets pissed off when people try to apply Fowler-/enterprise-patterns to Rails, which is fine, and I understand why&amp;#8211;but by the same token, I think he should understand why people get pissed off at him for trashing (without context, i.e. he portrays them as fundamentally flawed) the patterns they&amp;#8217;ve found to work well in their own contexts.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Spark Test</title>
   <link href="http://draconianoverlord.com/2012/12/29/spark-test.html" />
   <updated>2012-12-29T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/12/29/spark-test</id>
   <content type="html">&lt;h1 id='spark_test'&gt;Spark Test&lt;/h1&gt;

&lt;p&gt;At &lt;a href='http://www.bizo.com'&gt;Bizo&lt;/a&gt; we&amp;#8217;re currently evaluating &lt;a href='http://www.spark-project.org'&gt;Spark&lt;/a&gt; as a replacement for Hive for reporting.&lt;/p&gt;

&lt;p&gt;So far we&amp;#8217;re still in the prototype stage, but it&amp;#8217;s looking promising.&lt;/p&gt;

&lt;p&gt;As an example of some Spark code, I thought I&amp;#8217;d copy/paste one of my &amp;#8220;let&amp;#8217;s try Spark&amp;#8221; unit tests:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object SparkTest {
  val sc = new SparkContext(&amp;quot;local&amp;quot;, &amp;quot;unit test&amp;quot;)
}

class SparkTest extends ShouldMatchers {
  import SparkTest._
  import SparkContext._

  @Test
  def test() {
    // make collection (table) a, with 3 rows, each
    // row is a tuple of (key, value), or (Int, String)
    val a = sc.parallelize(List((1, &amp;quot;a&amp;quot;), (1, &amp;quot;b&amp;quot;), (2, &amp;quot;c&amp;quot;)))

    // make collection (table) b, also with 3 rows
    val b = sc.parallelize(List((1, 5.00), (1, 6.00), (3, 7.00)))

    // typical join, all that match in both a and b
    a.join(b).collect() should be === Array(
      (1, (&amp;quot;a&amp;quot;, 5.00)),
      (1, (&amp;quot;a&amp;quot;, 6.00)),
      (1, (&amp;quot;b&amp;quot;, 5.00)),
      (1, (&amp;quot;b&amp;quot;, 6.00)))

    // typical left join, include all from a
    a.leftOuterJoin(b).collect() should be === Array(
      (1, (&amp;quot;a&amp;quot;, Some(5.00))),
      (1, (&amp;quot;a&amp;quot;, Some(6.00))),
      (1, (&amp;quot;b&amp;quot;, Some(5.00))),
      (1, (&amp;quot;b&amp;quot;, Some(6.00))),
      (2, (&amp;quot;c&amp;quot;, None)))

    // typical right join, include all from b
    a.rightOuterJoin(b).collect() should be === Array(
      (3, (None, 7.00)),
      (1, (Some(&amp;quot;a&amp;quot;), 5.00)),
      (1, (Some(&amp;quot;a&amp;quot;), 6.00)),
      (1, (Some(&amp;quot;b&amp;quot;), 5.00)),
      (1, (Some(&amp;quot;b&amp;quot;), 6.00)))

    // cogroup, which is the primitive, and returns each
    // key with the key&amp;#39;s elements from both a and b
    a.cogroup(b).collect() should be === Array(
      (3, (Seq(), Seq(7.00))),
      (1, (Seq(&amp;quot;a&amp;quot;, &amp;quot;b&amp;quot;), Seq(5.00, 6.00))),
      (2, (Seq(&amp;quot;c&amp;quot;), Seq())))
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Spark&amp;#8217;s primary API is a Scala DSL, with a collection-like &lt;a href='http://www.spark-project.org/docs/0.6.0/api/core/index.html#spark.RDD'&gt;&lt;code&gt;RDD&lt;/code&gt;&lt;/a&gt; type, so our reports are mostly &lt;code&gt;map&lt;/code&gt;, &lt;code&gt;filter&lt;/code&gt;, and &lt;code&gt;join&lt;/code&gt; calls.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Holy shit this is a unit test&amp;#8211;completely unlike Hive, Spark can run locally (see the &lt;code&gt;&amp;quot;local&amp;quot;&lt;/code&gt; parameter to &lt;code&gt;SparkContext&lt;/code&gt;) very trivially, by being embedded in a unit test.&lt;/p&gt;

&lt;p&gt;The above test takes ~2 seconds to spin up Spark in local mode (which runs in-process, on a thread pool, but is still a real Spark instance) and execute the test. Amazing!&lt;/p&gt;

&lt;p&gt;This is a &lt;em&gt;huge&lt;/em&gt; reason as to why I am liking Spark much more than Hive, where our reports were basically not under test.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;The above test is just kicking the tires of &lt;code&gt;cogroup&lt;/code&gt;, &lt;code&gt;join&lt;/code&gt;, &lt;code&gt;leftOuterJoin&lt;/code&gt;, and &lt;code&gt;rightOuterJoin&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;While the &lt;code&gt;join&lt;/code&gt; flavors are recognizable from SQL, it is interesting that &lt;code&gt;cogroup&lt;/code&gt; is actually the primitive operation, and the &lt;code&gt;join&lt;/code&gt; operations are implemented as secondary reorganizations of the data (via &lt;code&gt;map&lt;/code&gt;/&lt;code&gt;flatMap&lt;/code&gt; calls) returned from &lt;code&gt;cogroup&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Refreshingly, you can read the Spark implementation of the various &lt;code&gt;join&lt;/code&gt; operations, and they really are just 4-5 lines of very readable Scala code.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
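
&lt;p&gt;To illustrate that last point, here is a minimal sketch of joins built on top of a cogroup, using plain Scala &lt;code&gt;Seq&lt;/code&gt;s instead of RDDs. This is not Spark&amp;#8217;s actual source; the &lt;code&gt;CogroupSketch&lt;/code&gt; object and its method bodies are assumptions made for illustration, and only the overall shape (cogroup first, then a &lt;code&gt;flatMap&lt;/code&gt; over each key&amp;#8217;s values) follows the description above:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object CogroupSketch {
  // for each distinct key, collect that key&amp;#39;s values from both sides
  def cogroup[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (Seq[V], Seq[W]))] = {
    val keys = (a.map(_._1) ++ b.map(_._1)).distinct
    keys.map { k =&amp;gt;
      (k, (a.filter(_._1 == k).map(_._2), b.filter(_._1 == k).map(_._2)))
    }
  }

  // inner join: every pairing of a&amp;#39;s and b&amp;#39;s values for a key
  def join[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (V, W))] =
    cogroup(a, b).flatMap { case (k, (vs, ws)) =&amp;gt;
      for (v &amp;lt;- vs; w &amp;lt;- ws) yield (k, (v, w))
    }

  // left outer join: keep a&amp;#39;s values even when b has none for the key
  def leftOuterJoin[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (V, Option[W]))] =
    cogroup(a, b).flatMap { case (k, (vs, ws)) =&amp;gt;
      val right: Seq[Option[W]] = if (ws.isEmpty) Seq(None) else ws.map(Some(_))
      for (v &amp;lt;- vs; w &amp;lt;- right) yield (k, (v, w))
    }

  def main(args: Array[String]): Unit = {
    // the same two tables as in the test above
    val a = Seq((1, &amp;quot;a&amp;quot;), (1, &amp;quot;b&amp;quot;), (2, &amp;quot;c&amp;quot;))
    val b = Seq((1, 5.00), (1, 6.00), (3, 7.00))
    println(join(a, b))
    println(leftOuterJoin(a, b))
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the same inputs as the test, &lt;code&gt;join&lt;/code&gt; pairs up the four rows for key 1, and &lt;code&gt;leftOuterJoin&lt;/code&gt; adds the unmatched row for key 2 with a &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;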

&lt;p&gt;This is obviously a small example, where I was just loading some very small, hard-coded data into Spark (via the &lt;code&gt;parallelize&lt;/code&gt; calls). Eventually I hope to post more Spark-related techniques/results as the report we&amp;#8217;re working on gets more fleshed out.&lt;/p&gt;
 </entry>
 
 <entry>
   <title type="html">High Level Assertions</title>
   <link href="http://draconianoverlord.com/2012/12/26/high-level-assertions.html" />
   <updated>2012-12-26T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/12/26/high-level-assertions</id>
   <content type="html">&lt;h1 id='high_level_assertions'&gt;High Level Assertions&lt;/h1&gt;

&lt;p&gt;It is often said that we should treat test code like production code, and have it be DRY, readable, well refactored, etc.&lt;/p&gt;

&lt;p&gt;But I think, compared to production code, there is generally less discussion about how to actually do this within test code. (Well, maybe not, if you count all of the TDD/BDD posts, but those tend to be more process oriented than code oriented.)&lt;/p&gt;

&lt;p&gt;Anyway, the pattern I strive for with most tests is (ha) BDD inspired:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;public void someTest() {
  // given &amp;lt;some business condition&amp;gt;
  setupTheBusinessCondition();

  // when &amp;lt;some business event happens&amp;gt;
  invokeTheEvent();

  // then &amp;lt;some business artifact is observable&amp;gt;
  assertSomething();
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Where ideally the comments of &amp;#8220;given, when, then&amp;#8221; document the why, the meaning of what&amp;#8217;s going on, so that 6 months from now, any programmer, yourself included, could glance at the comments and follow what is going on.&lt;/p&gt;

&lt;p&gt;(Unfortunately I don&amp;#8217;t know any programming language that is so fluent/high-level that comments are unnecessary; given that mainstream programming languages always instruct &amp;#8220;how&amp;#8221;, I think comments of &amp;#8220;why&amp;#8221; will always have their place.)&lt;/p&gt;

&lt;p&gt;Okay, to the point: within this idiom, it&amp;#8217;s important for each step (given, when, then) to be as small as possible, so that the reader only has to deal with &lt;a href='http://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two'&gt;7+/-2&lt;/a&gt; lines of code to comprehend the test.&lt;/p&gt;

&lt;p&gt;Of course, this 7+/-2 limitation is hard to accomplish.&lt;/p&gt;

&lt;h2 id='approach_for_assertions'&gt;Approach for Assertions&lt;/h2&gt;

&lt;p&gt;One trick I&amp;#8217;ve occasionally used, specifically within the &amp;#8220;then&amp;#8221; assertion section, is to make custom, higher-/business-level assertions.&lt;/p&gt;

&lt;p&gt;E.g. often you&amp;#8217;ll see code that wants to assert &amp;#8220;there are two $50 credit transactions&amp;#8221;, but it takes 5 lines of code to accomplish:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;assertThat(txns.getSize(), is(2));
assertThat(txns.get(0).getAmount(), is(Money.dollars(50.00)));
assertThat(txns.get(0).getType(), is(CREDIT));
assertThat(txns.get(1).getAmount(), is(Money.dollars(50.00)));
assertThat(txns.get(1).getType(), is(CREDIT));&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We just ate more than half of our 7 LOC budget.&lt;/p&gt;

&lt;p&gt;So, in thinking how we can make this assertion simpler, and more direct, I&amp;#8217;ve wound up occasionally using strings (gasp) to encode a high-/business-level meaning, e.g.:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;assertTxns(txns, &amp;quot;1/1 $50 CREDIT&amp;quot;, &amp;quot;1/1 $50 CREDIT&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, if you have attributes about an account (like overpaid or overdue), you might do:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;// then the 1st account is bad
assertAccount(account1, &amp;quot;#1234 CHECKING (overpaid, overdue)&amp;quot;);
// and the 2nd account is okay
assertAccount(account2, &amp;quot;#5678 CHECKING&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note how ideally information you normally wouldn&amp;#8217;t care about, like an account being overpaid (which is hopefully unusual), is not included in the default description, as it would be noise that most test cases don&amp;#8217;t care about.&lt;/p&gt;
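
&lt;p&gt;A minimal sketch of how such a string-based assertion could be implemented, written in Scala for brevity. The &lt;code&gt;Txn&lt;/code&gt; class, its fields, and the &lt;code&gt;describe&lt;/code&gt; helper are hypothetical stand-ins, not the real domain model:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object TxnAssertions {
  // hypothetical stand-in for the real transaction type
  case class Txn(date: String, dollars: Int, kind: String)

  // render only the attributes most tests care about
  def describe(t: Txn): String = t.date + &amp;quot; $&amp;quot; + t.dollars + &amp;quot; &amp;quot; + t.kind

  // compare rendered descriptions, so a failure prints readable strings
  def assertTxns(txns: Seq[Txn], expected: String*): Unit = {
    val actual = txns.map(describe)
    assert(actual == expected.toSeq,
      &amp;quot;expected &amp;quot; + expected.mkString(&amp;quot;[&amp;quot;, &amp;quot;, &amp;quot;, &amp;quot;]&amp;quot;) +
        &amp;quot; but was &amp;quot; + actual.mkString(&amp;quot;[&amp;quot;, &amp;quot;, &amp;quot;, &amp;quot;]&amp;quot;))
  }

  def main(args: Array[String]): Unit = {
    val txns = Seq(Txn(&amp;quot;1/1&amp;quot;, 50, &amp;quot;CREDIT&amp;quot;), Txn(&amp;quot;1/1&amp;quot;, 50, &amp;quot;CREDIT&amp;quot;))
    // passes: both transactions render as the expected descriptions
    assertTxns(txns, &amp;quot;1/1 $50 CREDIT&amp;quot;, &amp;quot;1/1 $50 CREDIT&amp;quot;)
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Keeping the rendering in one &lt;code&gt;describe&lt;/code&gt; function means the choice of which attributes appear by default is made in a single place.&lt;/p&gt;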

&lt;h2 id='pros'&gt;Pros&lt;/h2&gt;

&lt;p&gt;I think the benefit of this approach is that you&amp;#8217;re able to pack a lot of information into a single LOC. It&amp;#8217;s like making a mini-DSL for your assertions.&lt;/p&gt;

&lt;p&gt;It is similar to &lt;a href='http://fit.c2.com/'&gt;FIT&lt;/a&gt; table-based assertions, where the assertions are declarations of desired output that a business person would understand, and not manual, imperative checks.&lt;/p&gt;

&lt;p&gt;I think these high-level assertions can be very nice to read, which is useful especially if you&amp;#8217;re quickly scanning tests, trying to understand each boundary condition.&lt;/p&gt;

&lt;h2 id='cons'&gt;Cons&lt;/h2&gt;

&lt;p&gt;Granted, there are some things to watch out for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You have to carefully choose which information to include in the description.&lt;/p&gt;

&lt;p&gt;If you choose too little information, the &lt;code&gt;assertAccount&lt;/code&gt; will not be useful, and tests will have to fall back to low-level assertions.&lt;/p&gt;

&lt;p&gt;If you choose too much information, then each &lt;code&gt;assertAccount&lt;/code&gt; call has a lot of extra noise, which is distracting to read (&amp;#8220;Do we really care about attribute X for this test case? Or is it just in the description by default?&amp;#8221;).&lt;/p&gt;

&lt;p&gt;You also risk having a whole lot of assertions fail if they include information that they don&amp;#8217;t technically care about, but that somehow changes due to an otherwise unrelated change.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;These are strings, so are hidden from refactoring and &amp;#8220;find caller&amp;#8221; references, and can generally be a pita to update if they have to change.&lt;/p&gt;

&lt;p&gt;(As a minor justification for this, I can at least know that all of these strings are evaluated at test time.)&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Making a DSL has a cost that will only pay off if you have a lot of similar assertions.&lt;/p&gt;

&lt;p&gt;This likely means it&amp;#8217;s only worthwhile for larger, more complex projects that have many use cases for the same entities. E.g. a banking system would surely have an awful lot of assertions against account balances/attributes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id='potential_alternative'&gt;Potential Alternative&lt;/h2&gt;

&lt;p&gt;I haven&amp;#8217;t tried it yet, but I could see addressing the cons with an assertion method that used optional parameters, e.g. in Scala:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;assertAccount(account1, id = Some(1234), accountType = Some(Checking))

// here&amp;#39;s the custom assertion method; every check is optional
// (`type` is a reserved word in Scala, hence `accountType`)
def assertAccount(
  account: Account,
  id: Option[Int] = None,
  accountType: Option[AccountType] = None,
  overpaid: Option[Boolean] = None) {
  // only assert against parameters that were provided
  overpaid.foreach { expected =&amp;gt;
    assertThat(account.getOverpaid, is(expected))
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This way each test method could opt-in to only asserting the attributes it cares about. &amp;#8230;although this sounds very similar to the imperative approach, and is just kind of hacking it to be 1 line.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m also not sure how this would work with multiple entities (like asserting against a list of accounts), although maybe the compromise is that you just have to have 1 assertion per entity.&lt;/p&gt;

&lt;h2 id='conclusion'&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Anyway, my basic thought is that &amp;#8220;then&amp;#8221; assertion sections of tests can easily become too many lines of code, with too many repetitive, imperative assertions, and so it&amp;#8217;s worth keeping in mind how/when you could switch over to a more declarative DSL-/FIT-style approach, even within the confines of regular xUnit tests.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Good Test, Bad Test</title>
   <link href="http://draconianoverlord.com/2012/12/15/good-test-bad-test.html" />
   <updated>2012-12-15T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/12/15/good-test-bad-test</id>
   <content type="html">&lt;h1 id='good_test_bad_test'&gt;Good Test, Bad Test&lt;/h1&gt;

&lt;p&gt;I caught myself writing a bad test yesterday, but while setting up the code review thought, er, if I was reviewing this code, I would be less than thrilled.&lt;/p&gt;

&lt;p&gt;So, I refactored the test and updated the code review to include it.&lt;/p&gt;

&lt;p&gt;The improvement in test quality was large enough, IMO, that I thought I&amp;#8217;d post it here as an example.&lt;/p&gt;

&lt;h2 id='the_code'&gt;The Code&lt;/h2&gt;

&lt;p&gt;So, what I am testing is a &lt;code&gt;replaceBetween&lt;/code&gt; utility method, which just finds two tokens and replaces what is between them with a new string.&lt;/p&gt;

&lt;p&gt;Note that the places we use this are very limited/special, so we&amp;#8217;re assuming the tokens will always exist, and so ignoring some boundary conditions that would otherwise need to be handled.&lt;/p&gt;

&lt;p&gt;Anyway, here&amp;#8217;s the code:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;public static String replaceBetween(
    final String s,
    final String startToken,
    final String endToken,
    final String replacement) {
  final int i = s.indexOf(startToken);
  final int j = s.indexOf(endToken, i + startToken.length());
  return s.substring(0, i + startToken.length())
    + replacement
    + s.substring(j);
}&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id='bad_test'&gt;Bad Test&lt;/h2&gt;

&lt;p&gt;So, this is the test I started out with:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;@Test
public void testReplaceBetween() {
  assertThat(
    replaceBetween(&amp;quot;one two three four&amp;quot;, &amp;quot;two&amp;quot;, &amp;quot;three&amp;quot;, &amp;quot;!&amp;quot;),
    is(&amp;quot;one two!three four&amp;quot;));
  assertThat(
    replaceBetween(&amp;quot;aaaabccaa&amp;quot;, &amp;quot;b&amp;quot;, &amp;quot;a&amp;quot;, &amp;quot;CC&amp;quot;),
    is(&amp;quot;aaaabCCaa&amp;quot;));
  assertThat(
    replaceBetween(&amp;quot;starttoken ... tend&amp;quot;, &amp;quot;starttoken&amp;quot;, &amp;quot;t&amp;quot;, &amp;quot; !!! &amp;quot;),
    is(&amp;quot;starttoken !!! tend&amp;quot;));
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This test covers 3 boundary cases that I cared about, but while setting up the review, I found several things confusing about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The strings/tokens are nonsensical in each case&lt;/li&gt;

&lt;li&gt;The strings/tokens change arbitrarily between each case&lt;/li&gt;

&lt;li&gt;The strings/tokens are arbitrarily longer/shorter between each case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is that it&amp;#8217;s very hard to quickly glance at these tests and tell what is actually being tested.&lt;/p&gt;

&lt;h2 id='good_test'&gt;Good Test&lt;/h2&gt;

&lt;p&gt;This is what I ended up refactoring the test to:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;@Test
public void testReplaceBetween() {
  assertThat(
    replaceBetween(&amp;quot;start middle end&amp;quot;, &amp;quot;start&amp;quot;, &amp;quot;end&amp;quot;, &amp;quot;!&amp;quot;),
    is(&amp;quot;start!end&amp;quot;));
}

@Test
public void testWhenEndTokenIsAlsoBeforeTheStartToken() {
  assertThat(
    replaceBetween(&amp;quot;end start middle end&amp;quot;, &amp;quot;start&amp;quot;, &amp;quot;end&amp;quot;, &amp;quot;!&amp;quot;),
    is(&amp;quot;end start!end&amp;quot;));
}

@Test
public void testWhenEndTokenIsWithinTheStartToken() {
  assertThat(
    replaceBetween(&amp;quot;startendstart middle end&amp;quot;, &amp;quot;startendstart&amp;quot;, &amp;quot;end&amp;quot;, &amp;quot;!&amp;quot;),
    is(&amp;quot;startendstart!end&amp;quot;));
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I think this version is demonstrably better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The boundary cases are highlighted in the method names&lt;/li&gt;

&lt;li&gt;The strings under test and tokens are very simple and align with the concepts they are testing&lt;/li&gt;

&lt;li&gt;The strings under test contain minimal noise that is not directly related to the boundary cases being tested&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A big improvement.&lt;/p&gt;

&lt;p&gt;I think this is important because the function of unit tests is not just to put boundary cases under test, it&amp;#8217;s also to explain these boundary cases to future maintainers so they can understand why the code (and tests) work the way they do.&lt;/p&gt;

&lt;p&gt;Perhaps the bad test, for this small/simple code snippet, is not going to ruin our codebase. But as the code being tested starts getting larger/more complex, and the tests start getting larger/more complex, I think the time taken to make the tests just as clean and readable as the main codebase is time well spent.&lt;/p&gt;

&lt;p&gt;(Obviously this is not a new concept; hopefully I am preaching to the choir.)&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Scala Implicit Conversion with Tuples</title>
   <link href="http://draconianoverlord.com/2012/12/14/scala-implicit-conversion-with-tuples.html" />
   <updated>2012-12-14T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/12/14/scala-implicit-conversion-with-tuples</id>
   <content type="html">&lt;h1 id='scala_implicit_conversion_with_tuples'&gt;Scala Implicit Conversion with Tuples&lt;/h1&gt;

&lt;p&gt;I had not seen this before, but while reading a blog post about this &lt;a href='http://spray.io/blog/2012-12-13-the-magnet-pattern/'&gt;Magnet Pattern&lt;/a&gt;, they showed how Scala&amp;#8217;s implicit conversions will convert a parameter list into a tuple while searching for implicit conversions. E.g.:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object ImplicitTest {
  // we have some type Foo
  case class Foo(o: AnyRef)

  // and a method that only takes Foos
  def needsAFoo(foo: Foo) = foo.toString

  // and we can convert tuples to Foo
  implicit def tupleToFoo(t: (Int, Int)) = Foo(t)

  def main(args: Array[String]) {
    // obviously, calling needsAFoo with a tuple works, as the
    // Scala compiler looks for an implicit def that takes
    // the arg that doesn&amp;#39;t compile (Tuple2[Int, Int])
    // and returns a type that does compile, i.e. Foo
    println(needsAFoo((1, 2)))
    // becomes:
    // needsAFoo(tupleToFoo(new Tuple2(1, 2)))

    // but we can also pretend we&amp;#39;re passing the arguments
    // directly to needsAFoo, and the Scala compiler will
    // group &amp;quot;1, 2&amp;quot; together into a Tuple2[Int, Int], and
    // find the same implicit def as before
    println(needsAFoo(1, 2))
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is pretty neat, and something I&amp;#8217;ll try and remember.&lt;/p&gt;

&lt;p&gt;The Magnet pattern looks pretty cool, too, but I haven&amp;#8217;t had a lot of time to study it yet.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">One Click VNC</title>
   <link href="http://draconianoverlord.com/2012/12/01/one-click-vnc.html" />
   <updated>2012-12-01T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/12/01/one-click-vnc</id>
   <content type="html">&lt;h1 id='one_click_vnc'&gt;One Click VNC&lt;/h1&gt;

&lt;p&gt;I did some of the usual family tech support over the Thanksgiving holiday, this time for a family member who lives in another state.&lt;/p&gt;

&lt;p&gt;While the &amp;#8220;issue&amp;#8221; (too many icons in the IE favorites bar) was taken care of in short order, it seemed inevitable that something else would come up, and I was not looking forward to remote phone support.&lt;/p&gt;

&lt;p&gt;Initially, I was quite willing to sign up for Fog Creek&amp;#8217;s Copilot program, but ran into two issues:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It seems geared towards on-the-fly support, with the &amp;#8220;here, we&amp;#8217;ll email them a code and they can download it&amp;#8221; approach.&lt;/p&gt;

&lt;p&gt;For my situation, that was too many steps&amp;#8211;instead, I wanted to set it up on the remote computer ahead of time, for free, and then be able to use it for a nominal daily fee if ever needed.&lt;/p&gt;

&lt;p&gt;(Perhaps Copilot actually allows this, and I didn&amp;#8217;t RTFM, but&amp;#8230;)&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;They don&amp;#8217;t support Linux.&lt;/p&gt;

&lt;p&gt;Um&amp;#8230;wtf? All Fog Creek did was pay some intern to bundle a VNC client/server into Windows EXEs, hook up some admittedly very useful firewall poking scheme via their remote servers, and hey, they have a product.&lt;/p&gt;

&lt;p&gt;Which, fair enough, smart business execution&amp;#8230;but Copilot &lt;em&gt;is&lt;/em&gt; VNC and VNC is from the Linux ecosystem&amp;#8211;so, seriously, just let me fire up TightVNC, even if on my end I have to, oh no, type in a crazy host name and port number. Whatever.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So, that led to an hour or so of poking around, before coming across exactly what I was looking for: &lt;a href='http://www.vncscan.com/vs/oneclickVNC.htm'&gt;One-Click VNC&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;With One-Click VNC, you can set up the relative&amp;#8217;s computer ahead of time, with a simple &amp;#8220;here, when I tell you to, just click on this desktop icon&amp;#8221;.&lt;/p&gt;

&lt;p&gt;When they click on it, One-Click starts a VNC server on their local computer, and then, via a config file, calls out to your computer&amp;#8217;s IP address, so your waiting VNC client can accept the incoming connection.&lt;/p&gt;

&lt;p&gt;And, tada!, you see their screen. Awesome.&lt;/p&gt;

&lt;p&gt;Note that the server establishing the connection is not how VNC normally works, but it&amp;#8217;s great as it means One-Click will poke a hole through the remote computer&amp;#8217;s firewall, NAT, ISP, etc.&lt;/p&gt;

&lt;p&gt;Which leads to the only wrinkle&amp;#8211;your VNC client won&amp;#8217;t have poked a hole in your firewall, it&amp;#8217;s just waiting for an incoming connection, so you pretty much have to have a static IP (since it is hard-coded in the One-Click config file) which can forward the One-Click 5500 port back to your machine.&lt;/p&gt;

&lt;p&gt;Which, hey, I have a static IP, so good enough.&lt;/p&gt;

&lt;p&gt;Getting this to work if you have a dynamic IP is left as an exercise to the reader, but leave a comment if you figure it out (other than &amp;#8220;just use Copilot on a Mac&amp;#8221;).&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Reinvent 2012 Bezos Notes</title>
   <link href="http://draconianoverlord.com/2012/11/30/reinvent-bezos.html" />
   <updated>2012-11-30T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/11/30/reinvent-bezos</id>
   <content type="html">&lt;h1 id='reinvent_2012_bezos_notes'&gt;Reinvent 2012 Bezos Notes&lt;/h1&gt;

&lt;p&gt;As a continuation of my &lt;a href='https://reinvent.awsevents.com/'&gt;AWS re:Invent&lt;/a&gt; keynote notes, here are some from Werner Vogels chatting with Jeff Bezos.&lt;/p&gt;

&lt;p&gt;Bezos: we only win when our customers win, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kindle hardware is break even, so we only make money when our customers actually &lt;em&gt;use&lt;/em&gt; it, not just have it sit around in a drawer.&lt;/li&gt;

&lt;li&gt;This aligns our goals with the customer&amp;#8217;s&amp;#8211;enjoy it as much as possible, by encouraging them with useful content, even if it&amp;#8217;s an ancient 1st gen device.&lt;/li&gt;

&lt;li&gt;(My comment: this is a less evil view than &amp;#8220;oh, they just want to make more money on the consumption of content&amp;#8221;, but if the customer is choosing to consume the content anyway, and this alignment means amazon.com is incentivized to line up more content for the consumer to consume, vs. letting their device rot away un-updated on Android 3.x &lt;em&gt;cough&lt;/em&gt; carriers &lt;em&gt;cough&lt;/em&gt;, then perhaps the customer is actually getting a better deal, and more realized value, out of it than an up-front cost of the device anyway, even if the content cost ends up being higher over time.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS utility pricing is the same way, aligned with customers. Helps customers drive their own costs down&amp;#8211;that will lead to more usage, more adoption, more revenue in the long run.&lt;/p&gt;

&lt;p&gt;Flywheels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don&amp;#8217;t ask &amp;#8220;what&amp;#8217;s going to change in 10 years&amp;#8221;&amp;#8230;ask &amp;#8220;what&amp;#8217;s &lt;em&gt;not&lt;/em&gt; going to change in the next 10 years&amp;#8221;?&lt;/li&gt;

&lt;li&gt;That&amp;#8217;s what you can build a business around: stability.&lt;/li&gt;

&lt;li&gt;For retail this is stuff like low prices, a huge catalog.&lt;/li&gt;

&lt;li&gt;Customers will always want low cost goods. Impossible to imagine customers wishing for slower delivery.&lt;/li&gt;

&lt;li&gt;So spinning up, investing in, these sorts of flywheels is worth the long-term investment.&lt;/li&gt;

&lt;li&gt;No one will say AWS should be less reliable, or more expensive, or have fewer APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good ideas are not obvious, and it&amp;#8217;s hard to see the obvious all of the time.&lt;/p&gt;

&lt;p&gt;Principles of innovation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It&amp;#8217;s a point of view.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Select people who want to explore, want to invent (vs. people whose viewpoint is, say, destroying the competition&amp;#8211;that can be successful too, but is different).&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Work backwards from customers&amp;#8211;which customers can we please today.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;What is not as fun:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Willingness to fail, to be misunderstood.&lt;/li&gt;

&lt;li&gt;Both well-meaning and self-interested critics will question you&amp;#8211;that&amp;#8217;s okay and expected.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Ramp up rate of experimentation&amp;#8211;try until you find things that customers care about.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;How do you organize your life and time to increase your rate of experiments?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Biggest surprise about AWS: early adoption within government, enterprise, and education. Knew startups would adopt it (although much quicker than expected), but thought government would be a hard sell.&lt;/p&gt;

&lt;p&gt;On business models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assume customers have access to perfect information.&lt;/li&gt;

&lt;li&gt;Because they do: the Internet empowers users with better information each year (e.g. price comparison&amp;#8211;used to require driving store to store, then phone calls, now just Google)&lt;/li&gt;

&lt;li&gt;Balance of power is shifting from producers of goods to consumers.&lt;/li&gt;

&lt;li&gt;If your business model depends on customers being misinformed, you need a new model.&lt;/li&gt;

&lt;li&gt;AWS is a huge success, with no salesforce like HP/IBM/Oracle&amp;#8211;consumers find new options that work by themselves.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Defect reduction, don&amp;#8217;t let defects move downstream, muda&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;But requires information to make decisions.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Data centers historically have been information free, no visibility&amp;#8211;AWS changes that.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;About line workers &amp;#8220;stopping the line&amp;#8221;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stopping the line was just not done in Detroit; it was seen as insane, since it stopped the output.&lt;/li&gt;

&lt;li&gt;But defects were still being made, so instead of being fixed, they just got skipped over and passed along.&lt;/li&gt;

&lt;li&gt;An amazon.com customer service rep knew a customer was going to want to return a table (&amp;#8220;everyone returns this table&amp;#8221;)&amp;#8211;but no communication was happening with distribution. This sort of communication doesn&amp;#8217;t happen automatically. Now customer service reps can pull products from the website until customer complaints are addressed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Kaizen training: fiery &amp;#8220;insultant&amp;#8221;. Don&amp;#8217;t sweep, eliminate the source of the dirt.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;High margins cover a lot of sins, so you don&amp;#8217;t need to be efficient. And it&amp;#8217;s more fun to align with customers anyway.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How has entrepreneurship changed since founding Amazon:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Rate of change is even faster.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;The heart of entrepreneurship is still risk taking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Entrepreneurs are all about taking and then reducing risk&lt;/li&gt;

&lt;li&gt;Take a business concept and drive the risk out of it until it&amp;#8217;s sustainable.&lt;/li&gt;

&lt;li&gt;Systematically identifying risk.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Empowering people with new tools can lead to new interesting results (AWS utility model enabled many, many unforeseen uses and adoption.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Can AWS serve startups and enterprise at the same time? Of course, it already does.&lt;/p&gt;

&lt;p&gt;The 10,000 year clock:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In a mountain in Texas, it should run for 10k years.&lt;/li&gt;

&lt;li&gt;The symbol is about long term thinking.&lt;/li&gt;

&lt;li&gt;Allows solving things that you wouldn&amp;#8217;t even consider otherwise. Time horizons matter.&lt;/li&gt;

&lt;li&gt;If asked to solve world hunger in 5 years, it&amp;#8217;s impossible. But in 100 years? You can start envisioning gradual, long-term change.&lt;/li&gt;

&lt;li&gt;Ask amazon.com team: could you deliver a book to Mars in 8-10 years? Super-excited to answer. What about Mongolia? &amp;#8220;Oh right.&amp;#8221;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blue Origin: you can&amp;#8217;t practice with non-reusable vehicles, they will never be low cost.&lt;/p&gt;

&lt;p&gt;Advice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never chase the hot thing, you need to be there already to catch the wave when it starts.&lt;/li&gt;

&lt;li&gt;So make sure you do what you are passionate about, so you can do it anyway, before it&amp;#8217;s hot, and even if it never does get hot.&lt;/li&gt;

&lt;li&gt;Prefer missionaries (doing something they love anyway) over mercenaries (chasing the cash). Ironically, missionaries usually end up with more money anyway.&lt;/li&gt;

&lt;li&gt;Start with the customer and work backwards.&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Reinvent 2012 Keynote Notes</title>
   <link href="http://draconianoverlord.com/2012/11/29/reinvent-keynotes.html" />
   <updated>2012-11-29T00:00:00-08:00</updated>
   <id>http://draconianoverlord.com/2012/11/29/reinvent-keynotes</id>
   <content type="html">&lt;h1 id='reinvent_2012_keynote_notes'&gt;Reinvent 2012 Keynote Notes&lt;/h1&gt;

&lt;p&gt;The &lt;a href='http://www.bizo.com'&gt;Bizo&lt;/a&gt; engineering team is attending &lt;a href='https://reinvent.awsevents.com/'&gt;AWS re:Invent&lt;/a&gt; this year, which has generally been awesome.&lt;/p&gt;

&lt;p&gt;I thought I&amp;#8217;d post the meandering notes I took during the keynotes. My disclaimer is that I&amp;#8217;m erring on the side of posting incomplete/off-the-cuff notes vs. a more curated, edited version, as the latter just wouldn&amp;#8217;t happen.&lt;/p&gt;

&lt;h2 id='andy_jassy_senior_vp_keynote'&gt;Andy Jassy, Senior VP, Keynote&lt;/h2&gt;

&lt;p&gt;Every day AWS is adding enough capacity to run all of amazon.com circa 2003, which was then a $5 billion business.&lt;/p&gt;

&lt;p&gt;Segment showing how NASA&amp;#8217;s Mars Curiosity lander used AWS to stream video from the lander to JPL into AWS through Simple Workflow, S3, EC2, to their website. Self-hosted NASA website went down, so they redirected all traffic directly to the AWS-based Curiosity video feed (which is where all the users wanted to go anyway), which scaled and stayed up.&lt;/p&gt;

&lt;p&gt;Pitching AWS/the cloud:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows trading CapEx (large, up-front server costs) for OpEx (on-going costs), with a smaller OpEx than up-front servers anyway.&lt;/li&gt;

&lt;li&gt;Economies of scale (within cloud provider) as adoption increases.&lt;/li&gt;

&lt;li&gt;AWS has lowered prices 23 times, 25% off S3. Utility pricing. No contracts.&lt;/li&gt;

&lt;li&gt;AWS is extremely disruptive to the economics of traditional hardware companies (Oracle, HP, IBM) which are used to 60-80% margins. AWS is low margin, high volume.&lt;/li&gt;

&lt;li&gt;Easy to experiment (new apps, new services) with low risk (not stuck with later-unneeded, wasted infrastructure).&lt;/li&gt;

&lt;li&gt;Don&amp;#8217;t waste time on non-differentiating infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conversation with Reed Hastings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Aside about Nicholas Carr: there used to be VPs of Electricity, as each company generated its own electricity. VPs of Data Centers/etc. will go the same way.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Netflix went from 1 million to 1 billion hours streamed/month (1000x increase) in 4 years.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Reed&amp;#8217;s future trends:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Picking instance types should go away, just like picking register allocation went away with compilers&lt;/li&gt;

&lt;li&gt;Moving running instances at scale is hard, but will happen&lt;/li&gt;

&lt;li&gt;Trend of cloud-assisted personal computers, e.g. Siri, taking input and using the cloud to make it useful.&lt;/li&gt;

&lt;li&gt;Netflix&amp;#8217;s UI has a 10&amp;#8221; window through which customers must choose from 50k videos&amp;#8211;ranking and suggestions (first two home pages) are extremely important to their business&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New service: Redshift, data warehouse.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces costs from ~$19k/TB/year for traditional setup to $1k/TB/year.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id='werner_vogel_keynote'&gt;Werner Vogels Keynote&lt;/h2&gt;

&lt;p&gt;Amazon.com being a retail, low margin business meant having 75% unused capacity was too expensive. (E.g. handling Black Friday spike of 4x traffic with traditional hardware means other 364 days of the year, only 1/4th of the traditional server capacity would be used.)&lt;/p&gt;

&lt;p&gt;Led to avoiding business decisions, like time-based promotions (millions of users hitting F5 at 11:59pm), because hardware capacity wasn&amp;#8217;t available.&lt;/p&gt;

&lt;p&gt;Research for fault tolerance was done, complete, 15 years ago, but only people at scale could implement it. So no one did.&lt;/p&gt;

&lt;p&gt;Werner asserts that dynamic resources will reduce risk, complexity, etc., and increase the historically bad (30% etc.) software project completion rates. (I am skeptical.)&lt;/p&gt;

&lt;p&gt;Commandments for 21st century architectures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Controllable: small, loosely coupled, stateless building blocks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;These blocks are the basis of your scale and recovery.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Example of architecture inversion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;amazon.com had been calling IMDB for DVD stats with a sync call&lt;/li&gt;

&lt;li&gt;Meant amazon.com load affects IMDB load&lt;/li&gt;

&lt;li&gt;Change to IMDB pushing HTML into S3 buckets, amazon.com pulls from S3, now decoupled&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Architect with cost in mind&amp;#8211;not big O, but real dollars. Utility pricing. Cost grows in line with business.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Demo graphing cost back to a business goal like &amp;#8220;cost per image uploaded&amp;#8221;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Resilient&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Protecting customers is the first priority (privacy, encryption, HTTPS all the time).&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Use availability zones.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Integrate security from the ground up (firewalls are not enough anymore, we don&amp;#8217;t use moats), ports are closed by default.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Build, test, deploy continuously&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;amazon.com deploys every 10 seconds, 30k instances receiving updates&lt;/li&gt;

&lt;li&gt;amazon.com used to do phased deployments, roll a new version gradually through each availability zone&lt;/li&gt;

&lt;li&gt;Now flips back/forth. Rollback is possible as the old machines are still around.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Don&amp;#8217;t think about single failures. Don&amp;#8217;t treat failure as an exception.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Adaptive&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&amp;#8220;Use Occam&amp;#8217;s razor&amp;#8221; and &amp;#8220;Assume nothing&amp;#8221;&amp;#8211;what?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Data driven: instrument everything, all the time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track business metrics, not just JVM heap size.&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Average sucks&amp;#8230;it means half of your customers are getting something worse.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look at the whole distribution, look at the 99th percentile. It will raise all boats.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;Put everything in logs, including business information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
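
&lt;p&gt;The point about averages in the &amp;#8220;data driven&amp;#8221; item is easy to see with numbers. This is only a small sketch: the latencies are made up, the &lt;code&gt;mean&lt;/code&gt; and &lt;code&gt;percentile&lt;/code&gt; helpers are hypothetical names, and the nearest-rank percentile definition is an assumption chosen for simplicity.&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object PercentileSketch {
  // arithmetic mean of the samples
  def mean(samples: Seq[Double]): Double = samples.sum / samples.size

  // nearest-rank percentile: the smallest sample such that at
  // least p percent of all samples are at or below it
  def percentile(samples: Seq[Double], p: Double): Double = {
    val sorted = samples.sorted
    val rank = math.ceil(p * sorted.size / 100.0).toInt
    sorted(math.max(rank, 1) - 1)
  }

  def main(args: Array[String]): Unit = {
    // made-up latencies in ms: 98 fast requests and 2 very slow ones
    val latencies = Seq.fill(98)(10.0) ++ Seq(2000.0, 3000.0)
    println(&amp;quot;mean: &amp;quot; + mean(latencies))
    println(&amp;quot;p50:  &amp;quot; + percentile(latencies, 50))
    println(&amp;quot;p99:  &amp;quot; + percentile(latencies, 99))
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With these samples the mean is 59.8 ms, which describes none of the requests: the median is 10 ms and the 99th percentile is 2000 ms.&lt;/p&gt;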

&lt;p&gt;Some other pithy quotes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thou shalt use new concepts to build new applications.&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Thou shalt automate your applications and processes.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you have to log in to an instance, you&amp;#8217;re doing it wrong. Business metrics (latency) should drive scaling, not ops.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Thou shalt turn off the lights.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SimpleGeo graphed total cost of running clusters (prod, test, qa) on a dashboard&amp;#8211;meant devs actually shut down test clusters when they went home.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Segment on Amazon S3, the &amp;#8220;10th world wonder&amp;#8221;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At large scale even low probability events happen a lot.&lt;/li&gt;

&lt;li&gt;1 in a billion means a few 1000 times per day.&lt;/li&gt;

&lt;li&gt;Runs per AZ, and PUTs handle syncing across regions?&lt;/li&gt;

&lt;li&gt;Rolled out new storage back end incrementally, everyone got it for free. No &amp;#8220;S3 v2&amp;#8221;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New instance types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;240GB of RAM, 240GB of SSDs&lt;/li&gt;

&lt;li&gt;48TB local storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;New service: AWS Data Pipeline, automates moving data around. Scheduling, retry.&lt;/p&gt;
 </entry>
 
 <entry>
   <title type="html">Joist Execute Class</title>
   <link href="http://draconianoverlord.com/2012/11/02/joist-execute.html" />
   <updated>2012-11-02T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/11/02/joist-execute</id>
   <content type="html">&lt;h1 id='joist_execute_class'&gt;Joist Execute Class&lt;/h1&gt;

&lt;p&gt;I was porting a Ruby script to Scala today, and was reintroduced to how shockingly low-level the Java process APIs are.&lt;/p&gt;

&lt;p&gt;For example, things you probably take for granted, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resolving a simple executable name to an absolute path via &lt;code&gt;PATH&lt;/code&gt; variable&lt;/li&gt;

&lt;li&gt;Passing the current process&amp;#8217;s environment variables on to the child process&lt;/li&gt;

&lt;li&gt;Managing the stdin, stdout, stderr buffers to ensure they don&amp;#8217;t get full (which blocks the child process)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Are not done by Java&amp;#8217;s &lt;code&gt;java.lang.Runtime&lt;/code&gt; class out-of-the-box. (Which I suppose is understandable, given the APIs must be cross-platform, so cannot make per-platform assumptions.)&lt;/p&gt;

&lt;p&gt;This was nonetheless surprising for me, as some of these things I assumed were so basic to process management that it was the OS performing the logic, so they would just magically happen. But no.&lt;/p&gt;
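
&lt;p&gt;To make the buffer point concrete, here is a minimal sketch of running a command with just the JDK from Scala. It uses &lt;code&gt;java.lang.ProcessBuilder&lt;/code&gt; rather than &lt;code&gt;Runtime.exec&lt;/code&gt;, and &lt;code&gt;RawProcessSketch&lt;/code&gt; and its &lt;code&gt;run&lt;/code&gt; method are hypothetical names for illustration, not part of Joist. The thing to notice is that the caller has to drain the output pipe itself before waiting on the exit code:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;import java.io.{BufferedReader, InputStreamReader}

object RawProcessSketch {
  // Runs a command, drains its combined stdout/stderr, and
  // returns the exit code along with the captured output.
  def run(command: String*): (Int, String) = {
    val builder = new ProcessBuilder(command: _*)
    // merge stderr into stdout so there is only one pipe to drain
    builder.redirectErrorStream(true)
    val process = builder.start()
    val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
    val output = new StringBuilder
    try {
      // keep reading until the child closes its end of the pipe;
      // if nobody reads, a chatty child can block on a full buffer
      var line = reader.readLine()
      while (line != null) {
        output.append(line).append(&amp;quot;\n&amp;quot;)
        line = reader.readLine()
      }
    } finally {
      reader.close()
    }
    // only ask for the exit code once the output has been drained
    (process.waitFor(), output.toString)
  }

  def main(args: Array[String]): Unit = {
    val (exitCode, output) = run(&amp;quot;ls&amp;quot;, &amp;quot;-l&amp;quot;, &amp;quot;/tmp&amp;quot;)
    print(output)
    println(&amp;quot;exit code: &amp;quot; + exitCode)
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even this sketch skips stdin and keeps stderr only by folding it into stdout, which is the sort of bookkeeping a helper class can hide.&lt;/p&gt;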

&lt;p&gt;So, to the point, &lt;a href='http://www.joist.ws'&gt;Joist&lt;/a&gt; has a utility class that is perfect for this: &lt;a href='https://github.com/stephenh/joist/blob/master/util/src/main/java/joist/util/Execute.java'&gt;Execute&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I originally wrote it to invoke PgAdmin/MySQL commands to back up/restore the developer&amp;#8217;s local database, but it has since come in handy for a number of other things.&lt;/p&gt;

&lt;p&gt;The usage is very straightforward:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;val result = new Execute(&amp;quot;ls&amp;quot;)
  .arg(&amp;quot;-l&amp;quot;)
  .arg(&amp;quot;/some/path&amp;quot;)
  .toSystemOut()
println(result.exitCode)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So, if you&amp;#8217;re doing similar things, you might find it useful. You can check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href='https://github.com/stephenh/joist/blob/master/util/src/main/java/joist/util/Execute.java'&gt;Execute.java&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='https://github.com/stephenh/joist/blob/master/util/src/test/java/joist/util/ExecuteTest.java'&gt;ExecuteTest.java&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;a href='http://repo.joist.ws/joist/joist-util/'&gt;joist-util&lt;/a&gt; in the Joist maven repo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other news, the ~70 lines of Ruby code I was porting ended up being ~50 lines of Scala code. Nice!&lt;/p&gt;

&lt;p&gt;Admittedly, I cheated and put some abstractions that were used by another script into a trait, but that&amp;#8217;s even better anyway.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Strangeloop 2012--VoltDB</title>
   <link href="http://draconianoverlord.com/2012/10/14/strangeloop-voltdb.html" />
   <updated>2012-10-14T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/10/14/strangeloop-voltdb</id>
   <content type="html">&lt;h1 id='strangeloop_2012voltdb'&gt;Strangeloop 2012&amp;#8211;VoltDB&lt;/h1&gt;

&lt;p&gt;I really enjoyed going to Strangeloop this year. The signal-to-noise ratio of the presentations and conversations with other attendees was great.&lt;/p&gt;

&lt;p&gt;Embarrassingly, I&amp;#8217;m finally getting around to writing some blog posts about it. I had wanted to do a single review post, but that turned out to be unrealistic, so I&amp;#8217;ll do a series of smaller ones about the specific topics/presentations I enjoyed.&lt;/p&gt;

&lt;p&gt;The opening keynote was &lt;a href='http://www.voltdb.com'&gt;VoltDB&lt;/a&gt;, a next-generation &amp;#8220;NewSQL&amp;#8221; relational database built by database industry guru &lt;a href='http://en.wikipedia.org/wiki/Michael_Stonebraker'&gt;Michael Stonebraker&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id='a_look_at_oldsql_architecture'&gt;A Look at &amp;#8220;OldSQL&amp;#8221; Architecture&lt;/h2&gt;

&lt;p&gt;I was initially skeptical of this keynote, as keynotes are not usually product-based, but most of Stonebraker&amp;#8217;s talk ended up as a convincing argument that the fundamental architecture of current-generation databases has hit its scalability limits.&lt;/p&gt;

&lt;p&gt;While this is not really news in terms of NoSQL/BigData products these days, it was nonetheless interesting to hear, from a relational database expert, a logical explanation of exactly what parts of current relational database architecture are slow and why.&lt;/p&gt;

&lt;p&gt;For example, based on profiling (covered in &lt;a href='http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf'&gt;OLTP Through the Looking Glass&lt;/a&gt;), Stonebraker&amp;#8217;s assertion is that only 10% of time is spent doing real work, like updating the core data structures and indexes, and the other 90% is secondary concerns like waiting for disk IO, locking, etc.&lt;/p&gt;

&lt;p&gt;Which means that traditional ways to speed up databases (novel B-tree algorithms/etc.) really can&amp;#8217;t do much if the rest of the incidental cruft remains.&lt;/p&gt;

&lt;h2 id='deadpans_nosql'&gt;Deadpans NoSQL&lt;/h2&gt;

&lt;p&gt;Given Stonebraker is an old-school RDBMS guy, he also had a unique perspective on NoSQL.&lt;/p&gt;

&lt;p&gt;As he tells it, he lived through a phase in the industry (late 60s/70s?), before the relational model &amp;#8220;won&amp;#8221;, and before ACID was taken for granted, where there were competing technologies and data models, each vying for prominence.&lt;/p&gt;

&lt;p&gt;His point was that, looking back, relational- and ACID-based models won for a reason&amp;#8211;they greatly simplify life for the application programmer.&lt;/p&gt;

&lt;p&gt;So, he asserts that NoSQL-style eventual consistency isn&amp;#8217;t the answer, it&amp;#8217;s actually a step backwards, with a cheeky quote that &amp;#8220;eventual consistency means &amp;#8216;creates garbage&amp;#8217;&amp;#8221; (ha!).&lt;/p&gt;

&lt;p&gt;I am inclined to agree, and have hoped for a while that eventual consistency will be a passing fad until database technology catches up to today&amp;#8217;s operational constraints. Products like VoltDB and Google&amp;#8217;s &lt;a href='http://research.google.com/archive/spanner.html'&gt;Spanner&lt;/a&gt; make me optimistic that this will be the case.&lt;/p&gt;

&lt;h2 id='voltdb_as_newsql'&gt;VoltDB as &amp;#8220;NewSQL&amp;#8221;&lt;/h2&gt;

&lt;p&gt;So, finally, Stonebraker talked about VoltDB as a &amp;#8220;NewSQL&amp;#8221; approach, which forgoes the traditional architecture by being single-threaded within a partition (no locking), in-memory (no disk, network-based replicas), and moving the computation to the data (stored procedures).&lt;/p&gt;

&lt;p&gt;By doing this, VoltDB still provides ACID, but with impressive vertical scalability improvements, and, via partitioning, horizontal scalability as well.&lt;/p&gt;

&lt;p&gt;I have to admit that the architecture sounds pretty sexy&amp;#8211;it has all of the things you look for in a system that can scale&amp;#8211;in-memory, little contention, horizontal scaling.&lt;/p&gt;

&lt;h2 id='wheres_the_middleware'&gt;Where&amp;#8217;s the Middleware?&lt;/h2&gt;

&lt;p&gt;My biggest concern about VoltDB is that it requires a huge change to how applications are currently built&amp;#8211;you can only invoke stored procedures.&lt;/p&gt;

&lt;p&gt;This is because you no longer get cross-wire call transactions, like in SQL/JDBC where you can do &amp;#8220;begin transaction, select &amp;#8230;, select &amp;#8230;, update &amp;#8230;, commit&amp;#8221;, interspersing business logic with your SQL calls, and still have it all complete transactionally and with some amount of read isolation.&lt;/p&gt;

&lt;p&gt;Instead, with VoltDB, every wire-call to the database is its own transaction. That&amp;#8217;s it. This is where VoltDB gets some of its big wins, because it doesn&amp;#8217;t have to do any locking/versioning to keep a transaction &amp;#8220;in flight&amp;#8221; while waiting for your application&amp;#8217;s next wire call (which may take a while or never come back).&lt;/p&gt;

&lt;p&gt;Which makes sense&amp;#8211;but it&amp;#8217;s a huge change to today&amp;#8217;s N-tier/middleware-based architectures, as you have to move any logic that must be transactional into VoltDB&amp;#8217;s stored procedures.&lt;/p&gt;

&lt;p&gt;This is a tough pill to swallow; any sort of domain model/ORM-based architecture goes away, as they are usually predicated on a chatty SQL connection that still provides cross-wire call transactions.&lt;/p&gt;

&lt;h2 id='moving_computation_to_data'&gt;Moving Computation to Data?&lt;/h2&gt;

&lt;p&gt;Stonebraker asserts this requirement for stored procedures is &amp;#8220;moving the computation to the data&amp;#8221;, which is a popular approach to BigData; instead of moving TBs of data to your client/middleware machine, you ship your code directly to the database machine.&lt;/p&gt;

&lt;p&gt;But VoltDB seems different&amp;#8211;the types of computation stored procedures allow you to do are not general purpose computations, e.g. you can&amp;#8217;t make calls to other systems (you can&amp;#8217;t do anything that will block), nor do any real heavy calculations (which again would block); it&amp;#8217;s just a way to batch a few SQL operations together.&lt;/p&gt;

&lt;p&gt;I suppose this is similar to Hive, but the limitation seems more natural for Hive because you&amp;#8217;re almost always just doing various SQL-ish transformations on your data, and not real business logic.&lt;/p&gt;

&lt;p&gt;Anyway, perhaps it is just my bias, but I&amp;#8217;d prefer to keep computation at the middleware layer.&lt;/p&gt;

&lt;h2 id='potential_compromise'&gt;Potential Compromise&lt;/h2&gt;

&lt;p&gt;If I were to pick up VoltDB, I think I would try to keep a traditional domain model/ORM-ish architecture, and just use optimistic locking to enforce transaction isolation.&lt;/p&gt;

&lt;p&gt;So, if my middleware did something like:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;orm.beginTxn();

// one call to VoltDB
val b1 = BankAccount.load(1);
b1.balance += 10;

// another call to VoltDB
val b2 = BankAccount.load(2);
b2.balance -= 10;

// sends update b1 and b2 as 1 call/transaction
orm.commitTxn();&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The UPDATEs for &lt;code&gt;b1&lt;/code&gt; and &lt;code&gt;b2&lt;/code&gt; would happen atomically.&lt;/p&gt;

&lt;p&gt;But what about read isolation? I think optimistic locking would work for this, e.g. the SQL on the wire would really be:&lt;/p&gt;

&lt;pre class='brush:sql'&gt;&lt;code&gt;-- b1 = BankAccount.load(1)
SELECT id, balance, version FROM bank_account WHERE id = 1;
-- b1.balance += 10

-- b2 = BankAccount.load(2)
SELECT id, balance, version FROM bank_account WHERE id = 2;
-- b2.balance -= 10

-- orm.commitTxn
UPDATE bank_account SET balance = 20, version = 2
  WHERE id = 1 AND version = 1;
UPDATE bank_account SET balance = 0, version = 3
  WHERE id = 2 AND version = 2;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So, now if anyone else has touched either &lt;code&gt;bank_account&lt;/code&gt; row in between my read and my write, the corresponding &lt;code&gt;AND version = ...&lt;/code&gt; clause will no longer match, that &lt;code&gt;UPDATE&lt;/code&gt; will modify zero rows, and I&amp;#8217;d know the data is stale.&lt;/p&gt;

&lt;p&gt;The trick would be that I&amp;#8217;d need VoltDB to fail the whole transaction if the &lt;code&gt;UPDATE&lt;/code&gt; modified count for any of the statements was zero.&lt;/p&gt;

&lt;p&gt;This is basically moving isolation enforcement to the client, meaning it would have to fail or retry if the optimistic lock failed. I would be fine with that though, as optimistic locking is easy to build into an ORM.&lt;/p&gt;
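
&lt;p&gt;A rough sketch of the client-side half of this idea, with an in-memory map standing in for the database. All of the names here (&lt;code&gt;OptimisticLockSketch&lt;/code&gt;, &lt;code&gt;Row&lt;/code&gt;, &lt;code&gt;commit&lt;/code&gt;, &lt;code&gt;transfer&lt;/code&gt;) are hypothetical and are not VoltDB or Joist APIs, and the all-or-nothing check in &lt;code&gt;commit&lt;/code&gt; is the behavior I would want from the database, assumed here rather than confirmed:&lt;/p&gt;

&lt;pre class='brush:scala'&gt;&lt;code&gt;object OptimisticLockSketch {
  // one row of the bank_account table, including its version column
  case class Row(id: Int, balance: Int, version: Int)

  // in-memory stand-in for the database (an assumption for this sketch)
  private var table: Map[Int, Row] =
    Map((1, Row(1, 10, 1)), (2, Row(2, 10, 2)))

  // the SELECT: returns the row along with the version that was read
  def load(id: Int): Row = synchronized { table(id) }

  // true if the stored row still has the version the client read
  private def isFresh(update: Row): Boolean =
    table.get(update.id).exists(_.version == update.version)

  // the UPDATE: stores the new balance and bumps the version
  private def write(update: Row): Unit =
    table = table.updated(update.id, update.copy(version = update.version + 1))

  // all-or-nothing commit: if any row is stale, nothing is written
  def commit(updates: Seq[Row]): Boolean = synchronized {
    val allFresh = updates.forall(isFresh)
    if (allFresh) updates.foreach(write)
    allFresh
  }

  // re-reads both rows and retries when the optimistic lock fails
  def transfer(from: Int, to: Int, amount: Int, retriesLeft: Int = 3): Boolean = {
    if (retriesLeft == 0) false
    else {
      val a = load(from)
      val b = load(to)
      val updates = Seq(
        a.copy(balance = a.balance - amount),
        b.copy(balance = b.balance + amount))
      if (commit(updates)) true
      else transfer(from, to, amount, retriesLeft - 1)
    }
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the starting rows in the sketch, &lt;code&gt;OptimisticLockSketch.transfer(2, 1, 10)&lt;/code&gt; mirrors the SQL example: account 1 ends at balance 20, version 2, and account 2 at balance 0, version 3. A later commit that still carries version 1 for account 1 is rejected without writing anything.&lt;/p&gt;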

&lt;p&gt;With a bit of work, I could see an ORM like &lt;a href='http://joist.ws'&gt;Joist&lt;/a&gt; supporting VoltDB as a backend just like the traditional MySQL/Postgres backends.&lt;/p&gt;

&lt;p&gt;Unfortunately, I don&amp;#8217;t think VoltDB can do this today&amp;#8211;the client API can only invoke stored procedures, so it would require a sort of meta-stored procedure that took a list of tables/values to update and iteratively &lt;code&gt;eval&lt;/code&gt;&amp;#8216;d them.&lt;/p&gt;

&lt;h2 id='operational_concerns'&gt;Operational Concerns&lt;/h2&gt;

&lt;p&gt;My only other concern with VoltDB is that it&amp;#8217;s a new piece of infrastructure software that requires learning the ins/outs of. And since it owns your data, you want to make especially sure you don&amp;#8217;t mess something up.&lt;/p&gt;

&lt;p&gt;This is not VoltDB&amp;#8217;s fault; it&amp;#8217;s just the reality of relying on a new software package. I&amp;#8217;ve seen a few bad things happen before when deploying new software that were not the software&amp;#8217;s fault, but a configuration misunderstanding or error. You just hope that you catch these sorts of things before your data is gone.&lt;/p&gt;

&lt;p&gt;In that regard, I&amp;#8217;d enjoy seeing an RDS-style offering for VoltDB&amp;#8211;I don&amp;#8217;t want to log into servers, configure clusters or logging, or whatever, I just want a GUI that says &amp;#8220;give me X many servers, go!&amp;#8221;.&lt;/p&gt;

&lt;p&gt;Tangentially, it would be awesome if Amazon RDS let vendors build their own integrations, so you really could have a &amp;#8220;VoltDB Engine&amp;#8221; drop-down option in RDS, but supported by VoltDB instead of Amazon staff.&lt;/p&gt;

&lt;p&gt;It is not realistic for Amazon RDS themselves to support/integrate all of the myriad new databases available these days, so it would be great if RDS was more of an open marketplace through which vendors themselves could make their databases available as a PaaS.&lt;/p&gt;

&lt;h2 id='conclusion'&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I&amp;#8217;ve really enjoyed looking into VoltDB.&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t had time to play with the community edition locally yet, but I&amp;#8217;m going to try that soon and see how it goes. I&amp;#8217;m not really sure what the application development/test/deploy cycle will look like, which I&amp;#8217;m sure one of their tutorials will cover.&lt;/p&gt;

&lt;p&gt;So, in retrospect I was surprised, but I thought Stonebraker&amp;#8217;s VoltDB keynote was very good, and it has me checking out their product. I definitely recommend watching the video, which I&amp;#8217;ll link to here in a few weeks when Strangeloop makes it available online.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Brief Dependency Injection Skeptic</title>
   <link href="http://draconianoverlord.com/2012/10/07/brief-di-skeptic.html" />
   <updated>2012-10-07T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/10/07/brief-di-skeptic</id>
   <content type="html">&lt;h1 id='brief_dependency_injection_skeptic'&gt;Brief Dependency Injection Skeptic&lt;/h1&gt;

&lt;p&gt;I saw this go by on Twitter the other day:&lt;/p&gt;

&lt;p&gt;&amp;#8220;Someone needs to know how my collaborators get wired up, but it doesn&amp;#8217;t need to be me. Now you&amp;#8217;re ready for Dependency Injection.&amp;#8221; (&lt;a href='https://twitter.com/dws/status/253917357829935105'&gt;link&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;This leads to a very succinct description of why I&amp;#8217;m a DI-skeptic: I actually do care about how my collaborators get wired up.&lt;/p&gt;

&lt;p&gt;Perhaps I&amp;#8217;m naive, or haven&amp;#8217;t worked in a system large enough, or complicated enough, or whatever, but so far my opinion and experience have been that any overhead or tedium that comes from manual wiring (which is little if &lt;a href='http://www.draconianoverlord.com/2011/03/17/frameworkless-di.html'&gt;done well&lt;/a&gt;, IMO) is a wash compared to fighting a framework to get it to do the same thing for you.&lt;/p&gt;
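
&lt;p&gt;To make &amp;#8220;manual wiring&amp;#8221; concrete, here is a minimal sketch. All of the class names are hypothetical and chosen only for illustration; this is not necessarily the approach from the linked post, just plain constructor injection with one class that knows how the collaborators fit together.&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;// All class names here are hypothetical, chosen only for illustration.
class Clock {
  long now() {
    return System.currentTimeMillis();
  }
}

class AuditLog {
  private final Clock clock;

  AuditLog(Clock clock) {
    this.clock = clock;
  }

  Clock clock() {
    return clock;
  }
}

class SignupService {
  private final AuditLog log;

  SignupService(AuditLog log) {
    this.log = log;
  }

  AuditLog log() {
    return log;
  }
}

// The one place that knows how the collaborators get wired up.
class AppWiring {
  final Clock clock = new Clock();
  final AuditLog auditLog = new AuditLog(clock);
  final SignupService signupService = new SignupService(auditLog);
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The wiring is ordinary code, so the compiler checks it and a debugger can step through it.&lt;/p&gt;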

&lt;p&gt;This is especially true of legacy DI containers, e.g. Spring.&lt;/p&gt;

&lt;p&gt;Perhaps Guice is better, but I am still skeptical.&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/square/dagger'&gt;Dagger&lt;/a&gt; has potential though. The biggest reason I think so is that code generation makes the magic visible. And statically verified at build-time. Then both myself and the compiler can see what is happening. And trust it.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Aggregate Roots in SQL</title>
   <link href="http://draconianoverlord.com/2012/09/16/aggregate-roots-in-sql.html" />
   <updated>2012-09-16T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/09/16/aggregate-roots-in-sql</id>
   <content type="html">&lt;h1 id='aggregate_roots_in_sql'&gt;Aggregate Roots in SQL&lt;/h1&gt;

&lt;p&gt;I&amp;#8217;m reading through a paper on Google&amp;#8217;s &lt;a href='http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/spanner-osdi2012.pdf'&gt;Spanner&lt;/a&gt;, a distributed database that maintains transactions.&lt;/p&gt;

&lt;p&gt;One interesting snippet is how they&amp;#8217;ve added aggregate roots as a first-class concern in the schema, which is still SQL-esque, e.g.:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE Users {
  uid INT64 NOT NULL,
  email STRING
} PRIMARY KEY (uid), DIRECTORY;

CREATE TABLE Albums {
  uid INT64 NOT NULL,
  aid INT64 NOT NULL,
  name STRING
} PRIMARY KEY (uid, aid),
INTERLEAVE IN PARENT Users ON DELETE CASCADE;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Where &lt;code&gt;DIRECTORY&lt;/code&gt; declares &lt;code&gt;Users&lt;/code&gt; as a top-level aggregate root, and &lt;code&gt;INTERLEAVE IN&lt;/code&gt; declares &lt;code&gt;Albums&lt;/code&gt; as a member of the &lt;code&gt;Users&lt;/code&gt; aggregate root.&lt;/p&gt;

&lt;p&gt;(I&amp;#8217;m using &amp;#8220;aggregate root&amp;#8221; as terminology from Eric Evans&amp;#8217;s &lt;a href='http://en.wikipedia.org/wiki/Domain-driven_design'&gt;Domain Driven Design&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;I think this is a pretty exciting, albeit somewhat obvious, evolution of relational schemas to handle NoSQL-scale datasets.&lt;/p&gt;

&lt;p&gt;Obviously people have been doing this by hand for a long time, but moving it into the database just makes sense. (It&amp;#8217;s likely other systems I&amp;#8217;m not aware of had already done this; this is just the first I&amp;#8217;ve seen, and especially one that builds it into the SQL schema).&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t finished the paper yet, so I don&amp;#8217;t know if they support cross-aggregate root joins, but they seem to definitely support cross-root transactions.&lt;/p&gt;

&lt;p&gt;So, this has me wondering: if giving the database locality hints works so well, why wasn&amp;#8217;t this done before? E.g. for just regular/pre-NoSQL/on-spinning-disks databases?&lt;/p&gt;

&lt;p&gt;Even without being distributed, it seems like it&amp;#8217;d make sense for a database to be able to lay out related data (all of the records within an aggregate root) close together on disk, to minimize IO.&lt;/p&gt;

&lt;p&gt;E.g. if all of an employee&amp;#8217;s transactions were in basically-sequential order on disk, retrieving them would be cache-/IO-friendly and so very fast.&lt;/p&gt;
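
&lt;p&gt;A toy sketch of the locality idea, under the assumption that rows are simply stored in composite-key order (the &lt;code&gt;InterleavedLayoutSketch&lt;/code&gt; class and its methods are hypothetical, and this is not a description of how Spanner is implemented): sorting album keys by (uid, aid) turns each user&amp;#8217;s albums into one contiguous run.&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;import java.util.Arrays;

// Hypothetical sketch; this is not how Spanner is implemented.
class InterleavedLayoutSketch {

  // Sorts {uid, aid} album keys so each user's albums become one contiguous run.
  static long[][] layOut(long[][] albumKeys) {
    long[][] sorted = albumKeys.clone();
    Arrays.sort(sorted, (a, b) -> a[0] != b[0]
        ? Long.compare(a[0], b[0])
        : Long.compare(a[1], b[1]));
    return sorted;
  }

  // Counts how many separate runs must be read to fetch every album of one user.
  static int runsFor(long[][] keys, long uid) {
    int runs = 0;
    boolean inRun = false;
    for (long[] key : keys) {
      boolean match = key[0] == uid;
      if (match) {
        if (!inRun) {
          runs++; // a new run starts here
        }
      }
      inRun = match;
    }
    return runs;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With keys stored in arrival order, one user&amp;#8217;s albums can be scattered across several runs; after &lt;code&gt;layOut&lt;/code&gt; they always form a single run, which is the sequential read the paragraph above is hoping for.&lt;/p&gt;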

&lt;p&gt;A few thoughts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;With SSDs removing the spinning disk, I suppose physical layout on disk isn&amp;#8217;t that important anymore, at least for single-machine databases. Although that doesn&amp;#8217;t answer why it wasn&amp;#8217;t done before&amp;#8230;&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;It would require the database to constantly reorder data to fit the aggregate root grouping; this would likely be expensive, especially to do in real time.&lt;/p&gt;

&lt;p&gt;However, this reminds me of Google&amp;#8217;s &lt;a href='http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf'&gt;BigTable&lt;/a&gt;, which, after committing the data to a redo log to ensure consistency, delays the larger on-disk data reordering/compaction so that it&amp;#8217;s only done infrequently and not as part of a client request.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Perhaps this is actually doable and pretty routine/old-hat with various databases out there?&lt;/p&gt;

&lt;p&gt;I haven&amp;#8217;t noticed it before, but that does not mean much.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Perhaps it actually would not help that much.&lt;/p&gt;

&lt;p&gt;I&amp;#8217;m not an expert on RDBMS literature, or a wide variety of systems, so perhaps this was tried already and deemed not worth it. Seems unlikely to my current intuition, but I suppose I should trust 30 years of RDBMS research over my 5 minutes of musing.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Seems like a good question for an RDBMS implementation expert, not that I know any to go bug about it&amp;#8230;&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Building Sane Rich User Interfaces</title>
   <link href="http://draconianoverlord.com/2012/09/15/sane-rich-uis.html" />
   <updated>2012-09-15T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/09/15/sane-rich-uis</id>
   <content type="html">&lt;h1 id='building_sane_rich_user_interfaces'&gt;Building Sane Rich User Interfaces&lt;/h1&gt;

&lt;p&gt;I&amp;#8217;ve been meaning to write a &amp;#8220;lessons learned&amp;#8221; or &amp;#8220;best practice&amp;#8221; sort of post on how I think rich UI applications can be made simpler and less buggy by changing your mindset from an imperative one to a declarative one.&lt;/p&gt;

&lt;p&gt;But, until I do that, I thought it might be better to start with a concrete example. Sometimes it&amp;#8217;s easier to think about something small and concrete before abstracting into the theoretical.&lt;/p&gt;

&lt;p&gt;So, to do that, I thought I&amp;#8217;d show an example refactoring that changes some UI code from an imperative approach to a declarative approach.&lt;/p&gt;

&lt;h1 id='the_example_problem'&gt;The Example Problem&lt;/h1&gt;

&lt;p&gt;We&amp;#8217;ll use a relatively small problem that I think is still a good illustration: managing a list of tabs.&lt;/p&gt;

&lt;p&gt;E.g. the tabs might be &amp;#8220;Tab A&amp;#8221;, &amp;#8220;Tab B&amp;#8221;, and &amp;#8220;Tab C&amp;#8221;, and clicking on each tab hides the previous tab&amp;#8217;s content and shows the new tab&amp;#8217;s content.&lt;/p&gt;

&lt;p&gt;Instead of just building this functionality into a larger page, we&amp;#8217;ll break it out into a separate component, &lt;code&gt;Tabs&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id='the_imperative_approach'&gt;The Imperative Approach&lt;/h1&gt;

&lt;p&gt;Jumping straight to the code, this is a slightly simplified version of an imperative approach to building the tabs:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;public class Tabs extends CompositeIsWidget {

  private IsTabsView view = newTabsView();
  private ArrayList&amp;lt;IsWidget&amp;gt; panels = new ArrayList&amp;lt;IsWidget&amp;gt;();

  public Tabs() {
    setWidget(view);
  }

  public void addTab(String tabName, final IsWidget panel) {
    // final so the anonymous ClickHandler below can capture them
    final IsTabView itemView = newTabView();
    panels.add(panel);
    if (panels.size() == 1) {
      itemView.listItem().addStyleName(&amp;quot;active&amp;quot;);
      show(panel);
    } else {
      hide(panel);
    }
    itemView.anchor().setText(tabName);
    itemView.anchor().addClickHandler(new ClickHandler() {
      public void onClick(ClickEvent event) {
        for (IsWidget p : panels) {
          hide(p);
        }
        for (int i = 0; i &amp;lt; view.list().getWidgetCount(); i++) {
          view.list().getIsWidget(i).removeStyleName(&amp;quot;active&amp;quot;);
        }
        show(panel);
        itemView.listItem().addStyleName(&amp;quot;active&amp;quot;);
      }
    });
    view.list().add(itemView);
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code might be a little foreign if you&amp;#8217;re not used to GWT/Tessell development, but I think in general it&amp;#8217;s pretty easy to follow.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;newXxxView&lt;/code&gt; calls are instantiating templates from GWT&amp;#8217;s &lt;code&gt;ui.xml&lt;/code&gt; UiBinder files, so &lt;code&gt;newTabsView()&lt;/code&gt; returns a styled &lt;code&gt;ul&lt;/code&gt; tag, and &lt;code&gt;newTabView()&lt;/code&gt; returns a styled &lt;code&gt;li&lt;/code&gt; tag for each tab.&lt;/p&gt;

&lt;p&gt;In general, I don&amp;#8217;t think there is anything terribly wrong with this code; the variable names are good, the methods aren&amp;#8217;t egregiously long, and it&amp;#8217;s bug free.&lt;/p&gt;

&lt;p&gt;However, it is very imperative, in that it&amp;#8217;s reactive to user behavior instead of being proactive. For example, what makes it seem imperative is code like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether a panel is hidden or shown is done reactively in 4 places (show+hide on initial add, show+hide on each click).&lt;/li&gt;

&lt;li&gt;The &lt;code&gt;active&lt;/code&gt; CSS class is added in two places, and removed in one place.&lt;/li&gt;

&lt;li&gt;We have an anonymous inner class that does multiple things in response to a click.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is actually not that bad in this case, but I think the approach, when used in larger components and applications, eventually leads to spaghetti code.&lt;/p&gt;

&lt;p&gt;You end up with lots of &amp;#8220;oh, remember to do this here and there and there&amp;#8221; lines in each place that responds to user input or logic changes. It becomes easy to miss places where updates need to happen, or not make the updates consistent, and results in bugs.&lt;/p&gt;

&lt;h1 id='the_declarative_approach'&gt;The Declarative Approach&lt;/h1&gt;

&lt;p&gt;So, now let&amp;#8217;s try to refactor this code to be more declarative.&lt;/p&gt;

&lt;p&gt;In taking on a declarative mindset, we want to think, without reacting to user input, how can we declare that some behavior of our UI should just happen?&lt;/p&gt;

&lt;p&gt;For example, let&amp;#8217;s think of the behaviors of our tabs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When I am the current tab, show my panel&lt;/li&gt;

&lt;li&gt;When I am the current tab, set &lt;code&gt;active&lt;/code&gt; on my &lt;code&gt;li&lt;/code&gt; tag&lt;/li&gt;

&lt;li&gt;When I am clicked, make myself the current tab&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at these behaviors, the notion of a &amp;#8220;current tab&amp;#8221; is pretty apparent. Really all of our behaviors are based around which tab is the current tab.&lt;/p&gt;

&lt;p&gt;So, let&amp;#8217;s pull out that notion into an abstraction; let&amp;#8217;s make a &lt;code&gt;Tab&lt;/code&gt; and a &lt;code&gt;currentTab&lt;/code&gt;:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;public class Tabs extends CompositeIsWidget {

  private IsTabsView view = newTabsView();
  private Tab currentTab;

  public void addTab(String tabName, IsWidget panel) {
    Tab tab = new Tab(tabName, panel);
    if (currentTab == null) {
      setCurrentTab(tab);
    }
    view.list().add(tab.view);
  }

  private void setCurrentTab(Tab tab) {
    if (currentTab != null) {
      // unstyle old active tab
    }
    // style new tab
    currentTab = tab;
  }

  private class Tab {
    private IsTabView view = newTabView();
    private Tab(String tabName, IsWidget panel) {
      view.addClickHandler(new ClickHandler() {
        public void onClick(ClickEvent e) {
          setCurrentTab(Tab.this);
        }
      });
    }
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is better. We&amp;#8217;ve moved the styling updates into one place, &lt;code&gt;setCurrentTab&lt;/code&gt;, so things are not as spread out.&lt;/p&gt;

&lt;p&gt;However, we&amp;#8217;re still stuck in the reactive model; we&amp;#8217;re having to remember to call &lt;code&gt;setCurrentTab&lt;/code&gt; in all the right places. (Which, granted, in this small example is only two places).&lt;/p&gt;

&lt;p&gt;The stumbling block is that &lt;code&gt;currentTab&lt;/code&gt; is just a field&amp;#8211;we can&amp;#8217;t react to its changes unless we manually add code before/after our own setting of the field.&lt;/p&gt;

&lt;p&gt;Rich UI frameworks, Tessell included, solve this by promoting simple fields into stateful properties, which can watch their current value, and call observers when it is changed. The property is the foundation for models in traditional MVC.&lt;/p&gt;
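
&lt;p&gt;Stripped to its core, such a property can be sketched as below. This is a hypothetical &lt;code&gt;ObservableString&lt;/code&gt; class written only for illustration, holding a string instead of a &lt;code&gt;Tab&lt;/code&gt; to stay short; it is not Tessell&amp;#8217;s actual implementation or API.&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;import java.util.Arrays;
import java.util.Objects;

// Hypothetical minimal property; not Tessell's actual implementation or API.
class ObservableString {
  private String value;
  private Runnable[] observers = new Runnable[0];

  String get() {
    return value;
  }

  // Stores the new value and notifies observers, but only on a real change.
  void set(String newValue) {
    if (Objects.equals(value, newValue)) {
      return; // unchanged, so nobody is notified
    }
    value = newValue;
    for (Runnable observer : observers) {
      observer.run();
    }
  }

  // Registers an observer to run after every future change.
  void onChange(Runnable observer) {
    observers = Arrays.copyOf(observers, observers.length + 1);
    observers[observers.length - 1] = observer;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The point is that callers only ever call &lt;code&gt;set&lt;/code&gt;; whatever has to happen in response is registered once, up front, via &lt;code&gt;onChange&lt;/code&gt;.&lt;/p&gt;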

&lt;p&gt;So, let&amp;#8217;s change &lt;code&gt;currentTab&lt;/code&gt; to a property:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;private BasicProperty&amp;lt;Tab&amp;gt; currentTab =
  basicProperty(&amp;quot;currentTab&amp;quot;);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Since we have an abstraction around the value instead of just the value, we can now set up declarations around the abstraction, the property, and not just the value itself.&lt;/p&gt;

&lt;p&gt;To see how well this works out, we can re-examine our 3 behaviors, and see how they can be translated into declarations. (We&amp;#8217;ll use Tessell&amp;#8217;s DSL, but the same abstractions could be done in any MVC framework.)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&amp;#8220;When I am the current tab, show my panel&amp;#8221; can look like:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;binder.when(currentTab).is(this).show(panel);&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&amp;#8220;When I am the current tab, set &lt;code&gt;active&lt;/code&gt; on my &lt;code&gt;li&lt;/code&gt; tag&amp;#8221; can look like:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;binder.when(currentTab).is(this).set(&amp;quot;active&amp;quot;).on(view.listItem());&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;&amp;#8220;When I am clicked, make myself the current tab&amp;#8221; can look like:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;binder.onClick(view.anchor()).set(currentTab).to(this);&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And that&amp;#8217;s it.&lt;/p&gt;

&lt;p&gt;With a property-based DSL, like in Tessell, we have a 1-to-1 mapping between a behavior and 1 line of code.&lt;/p&gt;

&lt;p&gt;This behavior should then &amp;#8220;just work&amp;#8221; as the program runs. Whether in response to user input, or other business logic code changing the model, we don&amp;#8217;t care; our behavior should implicitly stay correctly applied.&lt;/p&gt;

&lt;h1 id='the_final_code'&gt;The Final Code&lt;/h1&gt;

&lt;p&gt;So, here&amp;#8217;s the full refactored code example:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;public class Tabs extends CompositeIsWidget {

  private IsTabsView view = newTabsView();
  private Binder binder = new Binder();
  private BasicProperty&amp;lt;Tab&amp;gt; currentTab = basicProperty(&amp;quot;currentTab&amp;quot;);

  public Tabs() {
    setWidget(view);
  }

  public void addTab(String tabName, IsWidget panel) {
    Tab tab = new Tab(tabName, panel);
    currentTab.setIfNull(tab);
    view.list().add(tab.view);
  }

  private class Tab {
    private IsTabView view = newTabView();

    private Tab(String tabName, IsWidget panel) {
      view.anchor().setText(tabName);
      binder.when(currentTab).is(this).show(panel);
      binder.when(currentTab).is(this).set(&amp;quot;active&amp;quot;).on(view.listItem());
      binder.onClick(view.anchor()).set(currentTab).to(this);
    }
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;h1 id='when_this_works'&gt;When This Works&lt;/h1&gt;

&lt;p&gt;Obviously the success of this approach depends on the robustness of the binding DSL, as the DSL has to support the various behaviors you want to perform in a generic way.&lt;/p&gt;

&lt;p&gt;Specifically for Tessell, I won&amp;#8217;t assert that Tessell&amp;#8217;s DSL is as refined as something like Hamcrest, which I think is the prototypical DSL in the Java world. But it supports a pretty wide array of behaviors, and gains more as they are needed.&lt;/p&gt;

&lt;p&gt;Besides the DSL, you also have to represent your problem domain as these stateful properties, basically models, to allow them to be passed declaratively into the DSL and for the DSL to be able to register observers and react to them.&lt;/p&gt;

&lt;p&gt;I don&amp;#8217;t think either of these is a large stumbling block, but they do require massaging your codebase to fit the MVC/property/DSL approach, instead of just using raw DTOs.&lt;/p&gt;

&lt;h1 id='rules_of_thumb'&gt;Rules of Thumb&lt;/h1&gt;

&lt;p&gt;I have a few rules of thumb for thinking declaratively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you find yourself both &lt;em&gt;doing&lt;/em&gt; a behavior and later &lt;em&gt;undoing&lt;/em&gt; that behavior (like applying then unapplying a style), this is a smell that there&amp;#8217;s some higher level abstraction that you could set up declaratively instead of performing it imperatively in multiple places.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Manual event listeners (e.g. anonymous inner classes for &lt;code&gt;addClickHandler&lt;/code&gt; in GWT, or anonymous functions for &lt;code&gt;element.onclick&lt;/code&gt; in JavaScript) should be used sparingly, and, if really needed, be as tiny as possible.&lt;/p&gt;

&lt;p&gt;Anonymous inner classes (or callback functions in JavaScript) are a smell that you&amp;#8217;re doing something reactively, e.g. responding to specific UI input and events, that you&amp;#8217;ll probably have to react to elsewhere in a different way, and so there might be an abstraction you could pull out.&lt;/p&gt;

&lt;p&gt;(Tangentially, I think this minimizes the downside of using the Java language, which lacks closures, for rich UI development. Even if you had closures for reacting to user input, it&amp;#8217;s still a reactive model, and so will lead to the same spaghetti code, albeit with less anonymous inner class boilerplate.)&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Try and think &amp;#8220;1 behavior == 1 line of code&amp;#8221;, and then extend your DSL as necessary to fit that.&lt;/p&gt;

&lt;p&gt;Even if your behavior is not terribly generic, I think having it in your DSL and out of your controller/presenter code is a better separation of concerns anyway.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id='conclusion'&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;So, those are my thoughts on building a rich UI in a sane way.&lt;/p&gt;

&lt;p&gt;I don&amp;#8217;t think this is terribly novel, as MVC is old-hat 1980s/90s stuff, albeit reinvented in Web 2.0 apps, and I think everyone generally acknowledges that declarative programming can be, for the right problems, much more succinct than imperative programming.&lt;/p&gt;

&lt;p&gt;But I think, even if it&amp;#8217;s not novel, it&amp;#8217;s easy for developers to stay in an imperative mindset if they&amp;#8217;re not consciously looking for abstractions and pulling out declarations as they become apparent.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Screencasts in Linux</title>
   <link href="http://draconianoverlord.com/2012/09/14/screencasts-in-linux.html" />
   <updated>2012-09-14T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/09/14/screencasts-in-linux</id>
   <content type="html">&lt;h1 id='screencasts_in_linux'&gt;Screencasts in Linux&lt;/h1&gt;

&lt;p&gt;A few times a year, I try and record a screencast for one of my open source projects, and I always waste the first 15-30 minutes re-learning how to do screencasts in Linux.&lt;/p&gt;

&lt;h2 id='using_ffmpeg'&gt;Using ffmpeg&lt;/h2&gt;

&lt;p&gt;After trying a variety of GUI programs (like &lt;code&gt;gtk-recordMyDesktop&lt;/code&gt;) with various degrees of success, I&amp;#8217;ve had the best results just running &lt;code&gt;ffmpeg&lt;/code&gt; from the command line. Which, given this is Linux, I guess shouldn&amp;#8217;t be too surprising.&lt;/p&gt;

&lt;p&gt;For my own future reference, the magic incantation I&amp;#8217;ve been using is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ffmpeg \
  -f alsa -ac 2 -ar 44100 -i pulse \
  -f x11grab -r 15 -s 1920x1080 -i :0.0 \
  -qscale 4 output.flv&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This records directly to Flash, which, yeah, yeah, isn&amp;#8217;t HTML5, but means, for now, it just works in all browsers.&lt;/p&gt;

&lt;p&gt;Two tips for getting high quality screencasts with &lt;code&gt;ffmpeg&lt;/code&gt; are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Ensure you have a recent version of ffmpeg.&lt;/p&gt;

&lt;p&gt;I have had good luck with &lt;code&gt;0.10.4-6:0.10.4-0ubuntu0jon2~oneiric1&lt;/code&gt;, which, IIRC, is newer than the one in the default Ubuntu repos, which I was originally having quality issues with.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Ensure your mic isn&amp;#8217;t over-amplified.&lt;/p&gt;

&lt;p&gt;I was getting terrible sounding audio for a long time before I realized my mic was over-amplified. I&amp;#8217;m not sure how it got that way; it seems like some applications take it upon themselves to adjust it.&lt;/p&gt;

&lt;p&gt;You can check your mic settings in Sound Settings &amp;gt; Input, where there is a slider. I find &amp;#8220;Unamplified&amp;#8221; is too low, but &amp;#8220;100%&amp;#8221; is way too high&amp;#8211;you get a constant buzz throughout the screencast.&lt;/p&gt;

&lt;p&gt;Putting the slider at ~20% works best for me, but it should be easy to find the ideal setting for you by doing a few test screencasts.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id='using_flowplayer'&gt;Using FlowPlayer&lt;/h2&gt;

&lt;p&gt;&lt;a href='http://flowplayer.org/'&gt;FlowPlayer&lt;/a&gt; is a free Flash movie player that I admittedly struggled to set up, especially in pages that are generated by jekyll, as FlowPlayer is very finicky about the HTML being exactly what it expects.&lt;/p&gt;

&lt;p&gt;So, along with the regular setup instructions of including their JavaScript file/etc., my current incantation that works in a jekyll/Markdown page is:&lt;/p&gt;

&lt;pre class='brush:html'&gt;&lt;code&gt;&amp;lt;p&amp;gt;
  &amp;lt;a href=&amp;quot;http://.../screencast.flv&amp;quot; style=&amp;quot;display:block;width:520px;height:330px;margin-left:1em;&amp;quot; id=&amp;quot;player&amp;quot;&amp;gt; &amp;lt;/a&amp;gt;
  &amp;lt;script type=&amp;quot;text/javascript&amp;quot;&amp;gt;&amp;lt;!-- 
    flowplayer(&amp;quot;player&amp;quot;, &amp;quot;casts/flowplayer-3.2.7.swf&amp;quot;, { clip: { autoPlay: false } });
  --&amp;gt;&amp;lt;/script&amp;gt;
&amp;lt;/p&amp;gt;&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The subtle but important thing to note here is the space before the closing &lt;code&gt;&amp;lt;/a&amp;gt;&lt;/code&gt; tag; this keeps jekyll from munging the HTML into something that the FlowPlayer JavaScript doesn&amp;#8217;t like.&lt;/p&gt;

&lt;h2 id='good_luck'&gt;Good Luck&lt;/h2&gt;

&lt;p&gt;So, that is what works for me. YMMV. If you end up recording any screencasts of your programming projects with this, post a link in the comments; I&amp;#8217;d enjoy checking them out.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title type="html">Tessell Gets Member Changed Events</title>
   <link href="http://draconianoverlord.com/2012/08/30/tessell-gets-member-changed-events.html" />
   <updated>2012-08-30T00:00:00-07:00</updated>
   <id>http://draconianoverlord.com/2012/08/30/tessell-gets-member-changed-events</id>
   <content type="html">&lt;h1 id='tessell_gets_member_changed_events'&gt;Tessell Gets Member Changed Events&lt;/h1&gt;

&lt;p&gt;Tessell got a feature I&amp;#8217;d been meaning to add for a while: member changed events.&lt;/p&gt;

&lt;p&gt;Tessell has always had property changed events, e.g. &lt;code&gt;StringProperty&lt;/code&gt; gets a new value, so it fires a change event, which, via data-binding, auto-updates the view:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;binder.bind(someStringProperty).to(view.someNameField);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which has handled a surprising number of use cases over Tessell&amp;#8217;s lifetime.&lt;/p&gt;

&lt;p&gt;But the other hip JavaScript MVC frameworks usually have this notion of &amp;#8220;member changed&amp;#8221; (my made-up term), where if you have a model:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;class EmployeeModel {
  StringProperty name;
  StringProperty city;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can, besides listening to &lt;code&gt;name&lt;/code&gt;/&lt;code&gt;city&lt;/code&gt; change events, listen to change events on the model, and get notified when any of the properties of the model changes.&lt;/p&gt;

&lt;p&gt;This also typically works recursively up a model tree, e.g. if you add a parent/child model:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;class EmployeeModel {
  StringProperty name;
  List&amp;lt;AddressModel&amp;gt; addresses;
}

class AddressModel {
  StringProperty city;
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Then any change of an address&amp;#8217;s &lt;code&gt;city&lt;/code&gt; property will not only fire the event on &lt;code&gt;AddressModel&lt;/code&gt;, but also on the parent &lt;code&gt;EmployeeModel&lt;/code&gt;.&lt;/p&gt;
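
&lt;p&gt;The percolation itself can be sketched in a few lines. This uses a hypothetical &lt;code&gt;ModelNode&lt;/code&gt; class that only counts events instead of dispatching them to listeners; it illustrates the idea and is not the code from Tessell.&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;// Hypothetical sketch of change percolation; not Tessell's actual classes.
class ModelNode {
  private final ModelNode parent; // null for the root model
  private int memberChanges;

  ModelNode(ModelNode parent) {
    this.parent = parent;
  }

  // Called when one of this model's own properties changes.
  void propertyChanged() {
    fireMemberChanged();
  }

  // Records the event here, then passes it up to each ancestor in turn.
  private void fireMemberChanged() {
    memberChanges++;
    if (parent != null) {
      parent.fireMemberChanged();
    }
  }

  // How many member changed events this model has seen so far.
  int memberChanges() {
    return memberChanges;
  }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A change on a child is seen by the child and every ancestor, while a change on the parent never travels down to its children.&lt;/p&gt;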

&lt;p&gt;Well, after thinking Tessell should have that for a while, I finally got around to it:&lt;/p&gt;

&lt;p&gt;&lt;a href='https://github.com/stephenh/tessell/commit/47b0c12fe864a7ba1848fece486a13d8d01242f5'&gt;Add MemberChangedEvent to percolate changes up the model tree.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Coincidentally, I had a chance to use this today, with some data-binding:&lt;/p&gt;

&lt;pre class='brush:java'&gt;&lt;code&gt;binder.onMemberChange(employeeModel).execute(saveCommand);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So that any time any property on the &lt;code&gt;EmployeeModel&lt;/code&gt; changes, we send it to the server (yes, doing this manually is terribly unhip, compared to transparent persistence frameworks like Meteor).&lt;/p&gt;

&lt;p&gt;&amp;#8230;in retrospect, &lt;code&gt;ModelChangedEvent&lt;/code&gt; probably would have been a better name. Although &lt;code&gt;ListProperty&lt;/code&gt; also fires these &amp;#8220;something I own changed&amp;#8221; events, so that is probably why I shied away from &amp;#8220;model&amp;#8221;.&lt;/p&gt;

&lt;p&gt;I had also thought of reusing &lt;code&gt;PropertyChangedEvent&lt;/code&gt;, instead of creating a new event type. This seems to be what some of the other frameworks do, e.g. Backbone, which just has &lt;code&gt;change&lt;/code&gt;, which is used to denote either a property itself changing or the model implicitly changing. Perhaps this would have been better, but for now I thought the semantic distinction seemed worth making it a separate event type.&lt;/p&gt;</content>
 </entry>
 
 
</feed>
