Whether we're working with code we just wrote or opening up a project for the first time, being able to listen to the hints the code is trying to give you about how it wants to be constructed is the most direct path toward successful refactoring. What the code is telling you is nuanced, however: no code smell is absolute and none in itself is an indication of a problem. How do we know we need to refactor? What are the code smells telling us to do? How do we balance our short-term needs of shipping our software with the long-term maintainability of the code base? In this talk we'll talk through some of the classical code smells and dive into examples of how to put these smells to work for you.
Ruby developers love continuous deployment. Don't believe me? Just find one and ask them, “say, how often do you deploy a day?” They won't answer, though, since they'll be too busy kicking off another deploy from their phone. The trend is a great thing – many teams measure the time between features and fixes being completed and being live in front of real users in minutes or hours instead of days or weeks.
Flickr talked extensively about using feature flippers to eliminate long running branches while still ensuring the deploy pipeline isn't blocked by work in progress. Using feature flags, Flickr continued to deploy multiple times a day while all work was continuously integrated in the same main branch that was being continuously deployed.
In the intervening years we've gotten more sophisticated in our approach to feature flags. We gate not just on the running code's environment but also down to request-specific parameters such as the current user. In our Ruby projects, using tools like rollout or flipper we can turn a feature on to a specific set of users or to some arbitrary percentage of users. Using these tactics we can conditionally expose a feature to a small subset of users. This can allow us to both get feedback on the feature and see it perform under real-world conditions before going fully live. We can iterate and tune and optimize our feature before lifting the curtain to our full population of users.
Since we often have many application server instances running, these projects will use a backend to coordinate shared feature switch state, and each feature switch check will result in some kind of query. We read this data far more often than we write it, so it feels like we should be aggressively caching on our application servers. This introduces a trade-off: if we cache aggressively and expire the cache after some time-to-live period, it's possible the nodes won't agree on the state of a feature flag. This could cause a user to see a feature appear and disappear between requests. Ideally the application server nodes would be lazy in fetching new configuration data, but when we make a change it should take effect globally and as close to immediately as possible.
I've published an experimental gem to manage feature switches in a Ruby application. Switches works much the same as existing projects but has the explicit design goal of ensuring the least possible chatter between application server instances and the shared backend. Instead of querying Redis or Mongo for each feature switch we add to a given execution path, switches uses in-memory structures for storing this data. Whenever this data is changed a change notification is delivered via a pub/sub bus which triggers a refresh of its in-memory cache.
# config/initializers/switches.rb
$switches = Switches do |config|
  config.backend = "redis://localhost:6379/0"
end

# app/controllers/posts_controller.rb
def index
  if $switches.feature(:websockets_chat).on?(current_user.id)
    @chat = ChatConnection.new(current_user)
  end

  if $switches.feature(:redesign).on?(current_user.id)
    render :index, layout: :redesign
  end
end

# In an IRB session; once run, a change notification will be broadcast
# to all listening nodes. Each node will then refresh its data for the
# "redesign" feature.

# On for 10% of users
$switches.feature(:websockets_chat).on(10)

# Add user IDs 5, 454, and 2021 to the power_users cohort
$switches.cohort(:power_users).add(5).add(454).add(2021)

# On for users in the power_users cohort
$switches.feature(:redesign).add(:power_users)
Switches uses either Redis or Postgres for storage and coordination. I'm hoping to experiment with other backends soon (ZooKeeper and CouchDB's _changes feed both seem promising).
I haven't gotten this gem production-ready yet, but will continue working on it over the next few weeks. If you're using a feature switch library and are concerned about network chatter, get in touch.
In 1964, Doug McIlroy, an engineer at Bell Labs, wrote an internal memo describing some of his ideas for the Multics operating system. The surviving tenth page summarizes the four items he felt were most important. The first item on the list reads:
“We should have some ways of coupling programs like [a] garden hose – screw in another segment when it becomes necessary to massage data in another way.”
This sentence describes what ultimately became the Unix pipeline: the chaining together of a set of programs such that the output of one is fed into the next as input.
Sometimes it feels like we're supposed to fear concurrency as terrifying complexity, but with the right patterns and practices concurrent programming is more than manageable. We'll look at how a future could be implemented in Ruby and then dig into some examples to illustrate where futures could be useful.
Let's scratch out a basic implementation of a Future proxy in Ruby. In its most basic form, it could look something like this:
class Future < BasicObject
  def initialize(callable)
    @thread = ::Thread.new { callable.call }
  end

  def value
    @thread.value
  end

  def inspect
    if @thread.alive?
      "#<Future running>"
    else
      value.inspect
    end
  end

  def method_missing(method, *args, &block)
    value.send(method, *args, &block)
  end

  def respond_to_missing?(method, include_private = false)
    value.respond_to?(method, include_private)
  end
end
We start with an object that derives from BasicObject and is instantiated with a Proc or other object that responds to call. In the initializer a background thread is created and the callable is called within it. Any methods received by Future will be proxied to the object returned by Thread#value, which is the last value returned from the thread. If the Thread is still working on calculating the value, this call will block until the Thread is finished. This ensures that a caller can always retrieve the value when it's needed.
Future also has an inspect method which will return a static string if the Thread is still running or defer to the value if the Thread is finished.
We'll also add a convenience method to Kernel so we can use it anywhere we want.
module Kernel
  def future(&block)
    Future.new(block)
  end
end
This method instantiates a new Future which runs the given block and returns the proxy back to the caller. Now we can use the future method to dispatch arbitrary tasks for background execution and use the returned proxy to access the computed values later in execution.
>> calculation = future { 4 * 4 }
=> #<Future running>
>> calculation.value
=> 16
>> calculation
=> 16
If we tried to access the result of a long-running calculation, it'd block until that value was available.
Because Future calls the block in a background thread, execution of multiple futures happens concurrently:
futures = [
  future { sleep 2 },
  future { sleep 2 },
  future { sleep 2 }
]

futures.each(&:value)
We're building three futures here, each with a block that sleeps for two seconds. If we were executing these serially, we'd expect about six seconds of execution time as each block is called and sleeps in turn.
jp@oeuf:~/workspace/tmp$ time ruby futures.rb
real 0m2.032s
user 0m0.024s
sys 0m0.006s
Each future started executing as soon as it was created in the background. Since the three blocks were running at the same time, our dummy script executes in about two seconds.
A practical example of why concurrency is important and where a pattern like this might apply is within service oriented systems. As we continue to break down our monolithic applications into services and daemons, within a web request we may need to make some number of remote service calls in order to render a page of content. A user might be authenticated via a user service, the page's content might be stored in a remote CMS service, and recommendations for a user might be stored within a recommender service. Of those three service calls at least two can be run independently from the output of any other. When you don't use concurrency to make these calls, it's like cooking dinner with only one burner on your stove-top. You're now cooking one dish at a time so it takes longer to get the food on the table.
In an oft-referenced ACM Queue article, Werner Vogels asserted that Amazon could make up to something like 100 service calls to assemble a page for a visitor. In the same year, Amazon published their findings about the relationship between their service's latency and customers' purchasing habits. For every 100ms delay they were able to slice off of page load times, sales increased by 1%. So even if each request took 1 millisecond, if they were all done serially that would cost 100 milliseconds and possibly one percent of sales. Considering the complexity of what these services likely do, it seems reasonable that many take longer to execute than that. Running these requests concurrently is one way to ensure that response time doesn't bloat in line with the number of services that are required to assemble a response.
Using concurrent requests when distributing responsibilities across network-available services will make the most efficient use of our resources, reduce wall clock running time for requests, and allow our systems to handle more transactions over a given amount of time.
Let's look at a contrived example. How would RMS read Hacker News? Probably from the command line. Maybe even while eating breakfast, but I wouldn't bring it up. Let's build a simple script using Nokogiri to grab some URLs from the first page of Hacker News, fetch each page, and print each article's title and content onto STDOUT.
First, we'll define a simple Page object to represent an HTML document.
require "open-uri"
require "nokogiri"
class Page
def initialize(url)
@url = url
end
def links
document.css("a").map { |anchor| anchor["href"] }
end
def paragraphs
document.css("p").map { |paragraph| paragraph.text }
end
def title
node = document.css("title")
node && node.text
end
def get
document
self
end
private
def document
@document ||= Nokogiri::HTML(content)
end
def content
open(@url)
end
end
A Page is instantiated with a URL, and a call to get will pre-load the document. There are a couple of accessor methods for what we suspect we'll need, such as the title, all of the links on a page, and all of the content stored in paragraphs.
We'll create an object to represent the Hacker News homepage called Index:
class Index
  URL = "http://news.ycombinator.com"

  def initialize
    @page = Page.new(URL)
  end

  def urls
    links = @page.links.select { |link| link.start_with?("http") }
    links[1..25]
  end
end
This object is composed of a Page and has a method to return 25 of the absolute urls scraped from the HTML. This is a small cheat to exclude internal navigation links from the header.
Next we'll put together a simple Crawler to use these objects to get the content from Hacker News. Our first stab at this will be synchronous, so each page will be fetched in serial:
class Crawler
  def initialize(index)
    @index = index
  end

  def crawl
    pages.each do |page|
      Outputter.new(page).output
    end
  end

  private

  def pages
    @index.urls.map do |url|
      Page.new(url).get
    end
  end
end
The Crawler expects an Index object which responds to urls to be passed into its initializer. Once it gets each page, it'll pass it to Outputter: a pretty printer used to display our contents neatly in the terminal. We'll use the HighLine gem to handle most of the real work:
require "highline"
class Outputter
OUTPUT_WIDTH = 79
def initialize(page)
@page = page
end
def output
highline.say("-" * OUTPUT_WIDTH)
highline.say(@page.title)
highline.say("-" * OUTPUT_WIDTH)
highline.say(@page.paragraphs.join("\n\n"))
highline.say("\n\n")
end
private
def highline
@highline ||= HighLine.new($stdin, $stdout, OUTPUT_WIDTH)
end
end
Finally, we'll glue together a small script to introduce our Crawler and Index and drive our program:
require "future"
require "crawler"
require "index"
require "page"
require "outputter"
Crawler.new(Index.new).crawl
Since we're doing this synchronously, let's use time to figure out how long this takes to run:
jp@oeuf:~/workspace/crawler$ time ruby ./crawl
-------------------------------------------------------------------------------
The Quiet Ones - NYTimes.com
-------------------------------------------------------------------------------
EVER since I quit hanging out in Baltimore dive bars, the only place where I
still regularly find myself in hostile confrontations with my fellow man is
Amtrak’s Quiet Car. The Quiet Car, in case you don’t know, is usually the first
car in Amtrak’s coach section, right behind business class. Loud talking is
...
real 0m32.405s
user 0m2.403s
sys 0m0.161s
Pretty pokey. Now let's use our Future object to run the requests. We'll modify Crawler to wrap each Page#get call in a future block:
class Crawler
  ...

  private

  def pages
    @index.urls.map do |url|
      future { Page.new(url).get }
    end
  end
end
Let's run it again using time:
jp@oeuf:~/workspace/crawler$ time ruby ./crawl
...
real 0m6.942s
user 0m2.296s
sys 0m0.164s
It's 4.5x faster because each request starts when we call future and the requests happen concurrently. It's like 25 people making a phone call at the same time versus one person making 25 phone calls one at a time.
This finished command line application is on GitHub.
The continued emphasis on service-oriented systems and the reality that we're likely to keep getting more cores rather than faster processors in our computers will make concurrency tools even more important when building our applications. Patterns like futures allow us to more easily reason about what's actually happening in a concurrent program. While our naive implementation isn't suitable for real world use due to its lack of any error handling or the absence of a pool of threads, the Celluloid library has a futures implementation that is ready to be used in your production applications.
As a developer on your project you are in the best possible position to empathize with and anticipate the needs of Future Developer. Every good decision we make for our project will have ripple effects on his or her productivity. Why is this important? As Bob Martin asks in Clean Code, “Have you ever been significantly impeded by bad code? So then – why did you write it?” The same strategies that improve the conditions for future generations of teams working on your project will serve your team well in the present. When you come back to some obscure corner of the codebase that you cobbled together six months ago, you're likely to have only a little more context than Future Developer will when he or she sees it for the first time. The clues and polish you've left for other developers will benefit your future self. Projects that are poorly maintained are draining to contribute to and lead to team attrition. Investing in the quality and future maintainability of the software you're creating is an investment in a happy, productive workplace for the present and future.
I'm going to pick a few practices in no particular order that we can use to set Future Developer up for success.
As projects age and requirements become more complex, we tend to introduce new patterns and designs to manage this complexity. It's hard to tell if a pattern or approach is pulling its weight immediately. Most of the time the feedback that proves or disproves its value comes when another developer has to make a change to that area of the codebase. Sometimes these patterns grow into conventions that we begin to reach for to solve problems.
There's immense benefit to that: conventions communicate intent. If we tend to solve problems in the same sorts of ways in a codebase, Future Developer can start to predict how pieces of the codebase work together reducing the amount of time necessary to diagnose problems and implement changes.
Often what we leave behind is a hodgepodge of patterns which never quite became conventions or have been ignored in the codebase as old cruft. This happens for a variety of reasons: the conventions introduced didn't work well enough to make it into other areas of the codebase or maybe new developers didn't know there was a convention or pattern for handling a given requirement.
Rails' opinions and conventions are powerful. They allow developers to join a project and quickly be productive if they've had any exposure to projects that have used the framework. Sometimes we muddy these conventions and dilute their power. For example, in Rails systems we sometimes see controllers built in many different styles. Some are composed using a project like resource_controller, others follow the standard Rails resources convention, while others are junk drawers of random actions. Another common anti-pattern is having configuration data sprinkled and initialized all throughout your system.
Don't have half a dozen different ways of configuring aspects of your system and make it clear how a controller should be built in your system. Once you've experimented for a while and have settled on an approach, take the time to go back to previous work and refactor into the new pattern. This doesn't mean that you should add arbitrary constraints. There's good reason, for example, to have some configuration stored with the project and some stored in the environment to aid in deployment, but there should be one common structure and access pattern for using configuration data.
Add conventions to your README or selected documentation repository. This will give Future Developer a head start on adding functionality to the system and in understanding how its components are constructed.
Another common characteristic of systems that have existed for some time is the collection of barnacles in the form of dead code. These components in your project may have at one time been providing business value but they've been deprecated and hidden from production for months. There's probably even a slew of full-stack acceptance tests validating those parts of the system are functioning and slowing down your test suite.
Sometimes we're reluctant to delete this code because we're not sure if the feature will be resurrected. Your product manager, when asked, might say “no, leave it, we may reuse that one day.” This is a false dilemma – carrying around a slowly rotting section of code for possible future reuse assumes that reusing those parts of the codebase involves just flicking a switch. If we're ignoring it because it's not actually live, it's not likely to be something we can just “turn on” without significant work. You're carrying that old code around like a boat anchor, wasting cycles maintaining it because there's a small chance you may possibly one day need part of it. Maybe. You don't know, but you spent a lot of time building it so rather than deleting it you allow the code to slowly rot in your repository.
What's even more costly is that the continued existence of this code is a possible trap for Future Developer. It detracts attention from the components of your system that are actually live and is a possible red herring when he or she is trying to understand or troubleshoot some aspect of the system. In a system down emergency, old, dead code is noise waiting to waste valuable time. Keeping the amount of code present in your code repository synchronized to the amount of code actually functioning in your live system will reduce overall maintenance costs and allow Future Developer to more quickly understand your entire system.
Delete code that isn't in use with abandon. It'll still be under source control if you need to refer to it later. Don't fall onto the wrong side of the fallacy that you might be able to “turn it back on again later.” If it had any value then why did you turn it off to begin with?
Aside from our project itself, some of the tools we use in support of writing code have their own artifacts. For example, there are commonly accepted practices about what constitutes good git commit message hygiene, and yet projects continue to accumulate commit histories like this contrived example:
jp@oeuf:~/workspace/blog(master*)$ git log --oneline app/controllers/application_controller.rb
8ec7f99 fuck i dunno lol
ffa919a shut up, atom parser
a33e9fa fixing again
cecc9dc one more time
968a28f fixing
3e3aeb2 ws
1fc597e pagination
edea155 adding dumb feature
When Future Developer inevitably ends up using git blame to get context about a given feature, leave him or her the details they need to understand the churn in the files in question. Use merge --squash, commit --amend, rebase, and friends to massage your commits into a coherent set before integrating your topic branch. Reword your commits after you're done – take a moment to include anything that seems relevant and summarize. Proofread for grammar and spelling; you're publishing something that somebody else will need to read and understand. Do Future Developer a favor and ensure you're leaving behind an intelligible paper trail that contains the right amount of detail.
jp@oeuf:~/workspace/blog(master*)$ git log app/controllers/application_controller.rb
commit 8ec7f998fb74a80886ece47f0a51bd03b0460c7a
Author: John Pignata <john@pignata.com>
Date: Sat Nov 3 14:11:12 2012 -0400
Add Google Analytics helper
commit 968a28f366e959081307e65253118a65301466f2
Author: John Pignata <john@pignata.com>
Date: Sat Nov 3 13:49:50 2012 -0400
Correct ATOM feed validation issues
Using the W3C Validator (http://validator.w3.org/appc/), a few trivial
errors were reported:
* <author> should have a <name> node within it and not text
* Timestamps should be in ISO8601 format
This change fixes these issues and improves the spec coverage for the XML
document.
commit 3e3aeb27ea99ecd612c436814c5a2b0dab69c2c3
Author: John Pignata <john@pignata.com>
Date: Sat Nov 3 13:46:24 2012 -0400
Fixing whitespace
We're no longer indenting methods after `private` or `protected` directives
as a matter of style. This commit fixes whitespace in all remaining
classes.
commit 1fc597e788442e8cc774c6d11e7ac5e77b6c6e14
Author: John Pignata <john@pignata.com>
Date: Sat Nov 3 12:34:50 2012 -0400
Implement Kaminari pagination
Move from will_paginate to kaminari in all controllers. The
motivation is to be able to paginate simple Array collections
without the monkey patching that will_paginate uses.
* Consolidate helpers
* Clean up whitespace
commit edea15560595bab044143149a7d6e528e8ae65d2
Author: John Pignata <john@pignata.com>
Date: Sat Nov 3 12:27:16 2012 -0400
Add ATOM feed for RSS readers
* Include Nokogiri in Gemfile for its builder
* Add AtomFieldBuilder model
* Add link to feed from index page
Some Ruby developers eschew method visibility for the methods in their objects. What's the point? Any method is really callable using send anyway. Why bother putting shackles around some methods? Just add that internal method to the pile and if Future Developer wants to use it he or she can! We're all adults, amirite?
If every object in your system is just a junk drawer of methods, it becomes very difficult for anyone (including you) to understand how each object was intended to be used and what messages it's intended to receive. The design of the public interface of an object should make it absolutely obvious how other objects in the system can interact with it. When each object's role and the interactions between the objects in your system are not obvious, it increases the amount of time it takes to understand not only each object but the system in toto.
Hide as much of a component's internals as possible to keep its interface small and focused. Put extra energy into making sure your objects' public interfaces are obvious, well named, and consistent. This gives Future Developer clear signals about how you intend each object to be used and will highlight how each can be reused. Use explicit method visibility to communicate this intent and to enforce the surface area of the object's public interface.
As developers our feelings about code comments can be best described as ambivalent. On one hand comments are extremely helpful in assisting a reader in understanding how a given piece of code works. On the other hand as nothing enforces their correctness, code comments are lies waiting to be told to the future. When asked developers will say they value documentation but often projects have very little beyond a mostly-out-of-date README and maybe a graveyard wiki somewhere. What's more, when working with open source libraries we'll often expect thorough RDoc documentation, an up-to-date README, and good example code and when not present we'll complain bitterly. Scumbag developer: doesn't maintain documentation, expects it from others.
As we pay more attention to things like the Single Responsibility Principle and use patterns to loosen the coupling between objects we start to see systems composed of many small objects wired together at runtime. While this makes systems more pliable and objects more reusable there's a trade-off: understanding an object's place within the larger system may be less obvious and as such take more effort. You can use all of the usual refactorings to eliminate pesky inline comments and make your object as readable as possible but it still might baffle Future Developer as to how the object fits into the system.
RDoc-style documentation can be found in many open source projects. When you're using Google to figure out whether update_attribute fires callbacks or what the signature for select_tag is, you'll likely land on the extracted RDoc for Ruby on Rails. Writing similar documentation as part of your project will give Future Developer more context when he or she is trying to understand the role of an object in the larger context of your system. Adding a short, declarative sentence to the top of a class and/or method indicating what it does could have substantial value for future readers of the code. That said, without a strong shared culture of keeping these comments up to date they could have negative value and mislead a future reader. The only thing worse than no documentation is incorrect documentation.
One way we provide documentation to a project is through the tests we leave behind. These tests not only describe the behavior of a given component but also enforce that this documentation is correct, since it's executable. Unlike a comment, we can't leave future lies in the test suite; it's either green or it isn't. Tools like RSpec and minitest/spec assist us in generating this by-product documentation by encouraging prose within the defining block of the example. Unfortunately, we sometimes look past the English words we're typing in our rush to get to the actual code in the red-green-refactor cycle. The result of neglecting these descriptions is that our tests may not reflect our objects' behavior as well as we think they do.
Almost as painful as finding a project with no test suite is finding a project whose test suite doesn't help in understanding how the system works. Tests are code which also needs to be maintained and as such they need to very clearly assert why they exist to a future reader.
it "works" do
data = File.open("fixtures/projects.txt").read
index = ProjectIndex.new(data)
index.should have(40).projects
last_project = projects.last
last_project.title.should eq("ORCA")
last_project.customer.should eq("Romney-Ryan 2012")
last_project.deploy_date.should eq(Date.parse("2012-11-06"))
end
Well, what works? That one-word description is meaningless, and the example has multiple assertions which don't provide any context.
In building spec-style tests you should keep the English language descriptions you're writing front and center. One way to do this is to run RSpec with the documentation format:
jp@oeuf:~/workspace/project-manager(master)$ be rspec --format documentation spec

ProjectIndex
  .new
    instantiates an index given the contents of a project CSV file
  #projects
    returns a collection of projects from the index

Project
  #title
    returns the project title
  #customer
    returns the Customer record for the project
  #deploy_date
    calculates the deploy date from the latest project status
Instead of a field of green dots, the documentation format outputs the nested descriptions, contexts, and example titles you've been typing. This allows you to skim through and see whether your tests actually reveal how the object is intended to behave. Focusing on the output of the documentation formatter can help improve the communicative value of a test suite. Use the refactor step of red-green-refactor to make your tests a coherent explanation of why each object exists and how it behaves.
These are just a few of the ways we can optimize for change with the reasonable assumption that somebody else will be charged with making those changes. Think about the next sets of eyes that will be responsible for building and operating your current project when you're working on it. We've all felt pangs of guilt about the maintainability or quality of something we've shipped. Instead of feeling sympathy for all of the challenges you've left in the codebase, begin to tally all of the drinks Future Developer will owe you for all of the tidy work you've left behind.
Thanks to Dave Yeu from whom I've co-opted (read: stolen) the term “future developer.”
Services that use multicasting are not often found on the public internet due to the complexities involved in sharing this subscription state between neighboring external networks and the lack of incentive for ISPs to support it. You probably don't use multicast directly day-to-day, but if you're using an OS X or Linux system it's likely a member of a couple of multicast groups by default.
jp@oeuf:~$ netstat -g
...
IPv4 Multicast Group Memberships
Group Link-layer Address Netif
224.0.0.1 1:0:5e:0:0:1 en0
224.0.0.251 1:0:5e:0:0:fb en0
224.0.0.1 is the All Hosts multicast group. RFC 1122 dictates that all hosts that fully support multicasting must always maintain a membership for it. 224.0.0.251 is the mDNS multicast group, which OS X uses for DNS resolution of the .local domain.
If we send an ICMP echo request to either of these addresses, we'll get back an ICMP echo reply for each member host:
jp@oeuf:~$ ping 224.0.0.251
PING 224.0.0.251 (224.0.0.251): 56 data bytes
64 bytes from 192.168.1.5: icmp_seq=0 ttl=64 time=71.531 ms
64 bytes from 192.168.1.6: icmp_seq=0 ttl=64 time=75.006 ms
Using tcpdump, we can see that while we send only one request, we get two replies with the same sequence number:
jp@oeuf:~$ sudo tcpdump -i en0 icmp
20:46:43.659398 IP oeuf.home > 224.0.0.251: ICMP echo request, id 37572, seq 0, length 64
20:46:43.744414 IP ipad.home > oeuf.home: ICMP echo reply, id 37572, seq 0, length 64
20:46:43.744425 IP apple-tv.home > oeuf.home: ICMP echo reply, id 37572, seq 0, length 64
Ruby's socket library exposes a wrapper around the underlying operating system socket implementation. Normally we'd be working with abstractions well above socket. It's pretty low-level and isn't the friendliest library to work with, but it allows us to manipulate sockets directly to properly bind to the multicast address group.
Here's a basic send/receive example. The first script, send.rb, opens a UDP socket, sets the multicast TTL of the datagram to 1 to prevent it from being forwarded beyond our local network, and sends the script's first command line argument across the socket.
require "socket"
MULTICAST_ADDR = "224.0.0.1"
PORT = 3000
socket = UDPSocket.open
socket.setsockopt(:IPPROTO_IP, :IP_MULTICAST_TTL, 1)
socket.send(ARGV[0], 0, MULTICAST_ADDR, PORT)
socket.close
receive.rb
also opens a UDP socket but does a little more work to set itself
up to receive messages from the multicast address group. It sets two options
on the socket: one to add the membership to the IP multicast group and one to
allow multiple receivers to bind to the same port. The second option allows
two or more programs on the same host to receive messages from the same
multicast group. Lastly, it binds to the address and port and then sets up a
small loop to block, wait for a message, and print its contents to the
terminal.
require "socket"
require "ipaddr"
MULTICAST_ADDR = "224.0.0.1"
BIND_ADDR = "0.0.0.0"
PORT = 3000
socket = UDPSocket.new
membership = IPAddr.new(MULTICAST_ADDR).hton + IPAddr.new(BIND_ADDR).hton
socket.setsockopt(:IPPROTO_IP, :IP_ADD_MEMBERSHIP, membership)
socket.setsockopt(:SOL_SOCKET, :SO_REUSEPORT, 1)
socket.bind(BIND_ADDR, PORT)
loop do
message, _ = socket.recvfrom(255)
puts message
end
The socket
library is not the easiest to work with and usually involves a
lot of man page reading. Previous editions of the
pickaxe have a whole
appendix for the socket
library but pragprog decided to remove it from the
book in the latest update. Luckily, they have released its
contents for free in PDF
and e-reader formats.
Articles about building a chat server in a given toolset are a trope of programming writing. Let's embrace the cliche and take that example but implement it as a peer-to-peer service, using multicast to allow chat clients on different hosts on the same network to exchange messages.
We'll call the project backchannel. Its basic operations are:
When the client receives a message through the socket from another user, draw the message into the window
When the user types in a message and hits return, send that message to other listening clients over the socket
Clients will become a member of a multicast group and use the group to
exchange chat messages. I used ruby to pick a random number (rand(10_000)
)
and drew 6811 so I'll use 224.6.8.11 as the multicast address and bind to port
6811.
In our description we've mentioned three different nouns: a client, a window and a message. Let's start by doing some cocktail napkin design.
Client
is responsible for sending and receiving messages. It exposes a
listener interface to allow listeners to be alerted to new messages and a
transmit method for sending arbitrary content across the socket.
Window
is responsible for managing the UI which entails drawing messages into
the terminal and capturing our input and sending new messages. It'll require a
handle onto the client to allow us to transmit messages and it'll need to keep
a backlog of messages to be able to draw chat history.
Message
will be transmitted as a human-readable JSON object. It will have
three attributes: a client ID, the user's handle and some message content.
Let's start with Message
since it's a simple value object:
require "json"
class Message
attr_reader :client_id, :handle, :content
def self.inflate(json)
attributes = JSON.parse(json)
new(attributes)
end
def initialize(attributes={})
@client_id = attributes.fetch("client_id")
@handle = attributes.fetch("handle")
@content = attributes.fetch("content")
end
def to_json
{ client_id: client_id, handle: handle, content: content }.to_json
end
end
No surprises there. We define an attr_reader
for the properties we're bundling
together and some convenience methods for JSON serialization and deserialization.
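Before wiring Message into anything else, we can sanity-check the round trip. The class is repeated here so the snippet stands alone:

```ruby
require "json"

# The Message class from above, repeated so this example is self-contained.
class Message
  attr_reader :client_id, :handle, :content

  def self.inflate(json)
    new(JSON.parse(json))
  end

  def initialize(attributes = {})
    @client_id = attributes.fetch("client_id")
    @handle    = attributes.fetch("handle")
    @content   = attributes.fetch("content")
  end

  def to_json
    { client_id: client_id, handle: handle, content: content }.to_json
  end
end

original = Message.new(
  "client_id" => "abc123",
  "handle"    => "jp",
  "content"   => "oh, hi!"
)

# Serializing and inflating should yield an equivalent message.
copy = Message.inflate(original.to_json)
copy.handle  # => "jp"
```

Serialization emits symbol keys as JSON strings and deserialization parses them back as strings, which is why initialize fetches string keys.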
Next we'll look at Client
. It's the object that knows how to send and
receive messages from the multicast address group. It exposes a method for
sending messages and a hook for allowing another object to listen for new
messages. Since it's the object responsible for chat operations, it will also
generate and hold a random client_id
and hold the user's chosen handle
.
require "socket"
require "thread"
require "ipaddr"
require "securerandom"
class Client
MULTICAST_ADDR = "224.6.8.11"
BIND_ADDR = "0.0.0.0"
PORT = 6811
def initialize(handle)
@handle = handle
@client_id = SecureRandom.hex(5)
@listeners = []
end
def add_message_listener(listener)
listen unless listening?
@listeners << listener
end
def transmit(content)
message = Message.new(
"client_id" => @client_id,
"handle" => @handle,
"content" => content
)
socket.send(message.to_json, 0, MULTICAST_ADDR, PORT)
message
end
private
def listen
socket.bind(BIND_ADDR, PORT)
Thread.new do
loop do
attributes, _ = socket.recvfrom(1024)
message = Message.inflate(attributes)
unless message.client_id == @client_id
@listeners.each { |listener| listener.new_message(message) }
end
end
end
@listening = true
end
def listening?
@listening == true
end
def socket
@socket ||= UDPSocket.open.tap do |socket|
socket.setsockopt(:IPPROTO_IP, :IP_ADD_MEMBERSHIP, bind_address)
socket.setsockopt(:IPPROTO_IP, :IP_MULTICAST_TTL, 1)
socket.setsockopt(:SOL_SOCKET, :SO_REUSEPORT, 1)
end
end
def bind_address
IPAddr.new(MULTICAST_ADDR).hton + IPAddr.new(BIND_ADDR).hton
end
end
Much of this code was adapted from the send.rb
and receive.rb
scripts above
but it has some of its own characteristics worth discussing. listen
spins up
a new Thread
. This is necessary because in order to listen for new messages
we're using a blocking call. Spinning up a Thread
will allow our program to do
other work while waiting for new messages.
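The same shape can be seen with nothing but the standard library: a thread parks on a blocking Queue#pop while the main thread keeps working, just as our listener thread parks on the socket. A toy sketch, no sockets involved:

```ruby
queue    = Queue.new
received = []

# The listener thread blocks on Queue#pop, the way backchannel's
# listener thread blocks on Socket#recvfrom.
listener = Thread.new do
  loop { received << queue.pop }
end

# Meanwhile the main thread is free to do other work -- here,
# producing messages for the listener to consume.
3.times { |i| queue.push("message #{i}") }

# Wait for the listener to drain the queue, then stop it.
sleep 0.01 until received.size == 3
listener.kill

received  # => ["message 0", "message 1", "message 2"]
```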
We've decoupled any interested receivers of messages from Client
by
adding a hook to allow interested parties to subscribe to messages through the
add_message_listener
method. Now our Window
doesn't need to have any
concrete wiring to Client
but rather just has to register itself on
initialization and implement a new_message
method.
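The listener contract is duck-typed: anything that responds to new_message can subscribe. Here's a sketch with a stand-in for Client's notification plumbing (both classes are hypothetical, not part of backchannel):

```ruby
# Stand-in for Client's listener plumbing; the real class notifies
# listeners from its socket-reading thread instead.
class FakeClient
  def initialize
    @listeners = []
  end

  def add_message_listener(listener)
    @listeners << listener
  end

  def notify(message)
    @listeners.each { |listener| listener.new_message(message) }
  end
end

# Any object responding to new_message can subscribe -- no
# inheritance or concrete coupling to the client required.
class Transcript
  attr_reader :lines

  def initialize
    @lines = []
  end

  def new_message(message)
    @lines << message
  end
end

client     = FakeClient.new
transcript = Transcript.new
client.add_message_listener(transcript)
client.notify("oh, hi!")

transcript.lines  # => ["oh, hi!"]
```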
Window manages the UI through another dusty ruby wrapper – curses
.
I'm going to elide most of those details, as the incantations are obscure and
will be the subject of a future article.
require "curses"
class Window
include Curses
def initialize(client)
@client = client
@messages = []
end
def start
@client.add_message_listener(self)
loop do
capture_input
end
end
def new_message(message)
@messages << message
redraw
end
private
def capture_input
content = getstr
if content.length > 0
message = @client.transmit(content)
new_message(message)
end
end
def redraw
draw_text_field
draw_messages
cursor_to_input_line
refresh
end
end
This class is fairly simple when most of the presentation layer cruft is
set aside. On initialization a Client
is passed in and a new array is
initialized to store message history.
Once start
is called, Window
adds itself as a message listener.
new_message
will be called by Client
when a new message is available. That
method will add that message to the end of the array and call a redraw
method to do the dirty UI details.
User input is captured via a loop using curses' getstr
method. We pass the
content to Client
for transmission over the network. Client
passes us back a
Message
which we add to the collection and redraw the screen.
Finally, we have some glue code to introduce Client
and Window
and start
the program:
require "backchannel/client"
require "backchannel/window"
require "backchannel/message"
class Backchannel
def self.start(handle)
client = Client.new(handle)
window = Window.new(client)
window.start
end
end
The result of these three small classes is an IRC-like program that allows any
users connected over the same physical network to pass messages. Calling
Backchannel.start
will draw the screen and wire up the client to the
multicast address group.
The full source of the final
application is on GitHub and you can play with it by running gem install
backchannel
and starting backchannel with backchannel <HANDLE>
. Since we're
setting SO_REUSEPORT
, multiple programs on the same system can connect to
the same chat for demonstration purposes.
I've never used multicasting in a real-world application but will be keeping my eyes open for an opportunity. Since we're all carrying around computers in our pockets now, local, opt-in networks seem applicable to all kinds of things.
All of these redundant connections and chatter aren't free. As more traffic
moves to mobile clients these inefficiencies have real-world impact on users'
device battery life and data transfer costs. Keepalive and If-Modified-Since
or ETag
request headers might help but your server is still tied up on each
request doing redundant work for each client to figure out if there's new
content and your clients are still burning cycles spinning through this
process. Moreover, HTTP requests often become bloated with attributes like
cookies, locale preferences, tweet-sized user-agent strings, etc. that are
unnecessarily shoveled across the connection in each request.
An alternative to polling is the Server-Sent Events API. SSE provides a simplex connection between a server and a client that allows the server to trigger events in the browser. Web applications can bind a callback to these events via JavaScript.
WebSockets has gotten much more attention than Server-Sent Events. One good reason for this is that WebSockets is much more fully featured than SSE. It's essentially a completely separate protocol from HTTP that provides a full-duplex connection which makes it more attractive for applications that require low-latency bi-directional communication. The trade-off is that since it's not HTTP there's added complexity in implementing it. For example, much of the HTTP infrastructure deployed out in the wild isn't necessarily aware of WebSockets and consequently the protocol's traffic can't traverse it. As it stands today getting a WebSockets-speaking server propped up behind a traditional load balancer can prove to be somewhat painful.
SSE doesn't have any of this overhead since it uses traditional HTTP for transport.
It's designed for real-world network environments, so features like
automatic reconnection are baked in. It's exposed in the browser via the
EventSource
interface so it's trivial to write a shim for browsers that don't support it to fall back to long-polling.
A SSE event only has a couple of attributes and looks something like YAML:
id: 1
event: new-message
data: oh, hi!
event
is an optional custom name of the event to trigger. JavaScript
applications can bind to specific events or choose to bind to all messages.
data
is what's passed into the event when triggered. id
is an optional
unique identifier for the message. If provided, Last-Event-ID
will be sent
back to the server on reconnect for applications where messages can't be
dropped. SSE also allows a server to specify a retry
in milliseconds and
comments can be sent with a line starting with a colon.
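The format is simple enough that a serializer is only a few lines of Ruby. This is a sketch, not part of any library; the sse_event helper and its keyword arguments are my own invention:

```ruby
# Build the wire format for a single SSE event. Optional fields are
# emitted only when given, multi-line data becomes repeated data:
# lines, and a blank line terminates the event.
def sse_event(data, event: nil, id: nil, retry_ms: nil)
  out = ""
  out << "id: #{id}\n" if id
  out << "event: #{event}\n" if event
  out << "retry: #{retry_ms}\n" if retry_ms
  data.to_s.split("\n").each { |line| out << "data: #{line}\n" }
  out << "\n"
end

puts sse_event("oh, hi!", event: "new-message", id: 1)
# id: 1
# event: new-message
# data: oh, hi!
```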
Binding to these events using JavaScript is straight-forward:
var stream = new EventSource("/stream");
stream.addEventListener("new-message", function (event) {
console.log(event.data);
});
Any new-message
events that are transmitted through the connection will now
trigger this callback and log the message to the console.
The specification envisions the protocol to be extensible to serve other purposes outside the browser. For example, it's possible to extend it to be used as a transport to deliver push notifications to mobile devices over TCP/IP or SMS networks.
Let's build a small application to illustrate how Server-Sent Events works. In this example we'll put together a toy application that creates a shared picture frame. Any user can enter a search term which searches Flickr's API for that term, retrieves a random image from the results and broadcasts it via a Server-Sent Event. Each client subscribed will receive the event and update the background of the page.
We'll use small components to keep the example focused and forgo any specific framework. Since we want to service connections concurrently we'll use Thin as the application server. Our implementation of the picture frame will be a Rack application behind HTTP Router for routing between our actions and serving static content. We expect two actions, one to subscribe to the SSE stream and one to publish new content to it, plus a static HTML page to display the frame and a little JavaScript to act on events from the stream.
If you return a Deferrable
as the body of a response, Thin will keep the
connection open until the deferred object is complete. Deferrable
objects represent an
operation in flight and accept two callbacks: a callback
which is fired on
success and an errback
which is fired on failure. We'll create a deferred
body which can be used to write to the active connection and to signal to Thin
when we want to close the connection.
class Body
include EM::Deferrable
def each(&block)
@callback = block
end
def write(data)
@callback.call(data)
end
end
When Thin calls each
it'll pass a block that can be used to emit data to
the connection. We store that block and expose a write
method for
calling it.
We can use this to build an endpoint that can handle concurrent connections as long as we're careful not to block the reactor. Here's an example rackup file that mounts a small Rack application that returns a ping each second four times and then closes the connection.
app = ->(env) {
count = 0
body = Body.new
timer = EM.add_periodic_timer(1) {
body.write "ping\n\n"
count += 1
if count == 4
timer.cancel
body.write "ok, bye!\n\n"
body.succeed
end
}
[200, {"Content-Type" => "text/plain"}, body]
}
run app
Now when we start the server it'll accept connections on the specified port and only close that connection after four pings.
jp@oeuf:~/workspace/tmp$ thin start -R example.ru -p 4000
>> Thin web server (v1.5.0 codename Knife)
>> Maximum connections set to 1024
>> Listening on 0.0.0.0:4000, CTRL+C to stop
Now to the picture frame. Let's start with the subscribe action. This endpoint
will subscribe a user to the stream so it should keep the connection alive and
send events when they are triggered by another user. To start we'll build
a class that expects to be instantiated with a Rack env
and a channel object
which will be used to transmit messages to subscribers.
module Actions
class Subscribe
HEADERS = {
"Content-Type" => "text/event-stream",
"Connection" => "keepalive",
"Cache-Control" => "no-cache, no-store"
}
def initialize(env, channel)
@env = env
@body = Body.new
@channel = channel
end
def run
@channel.subscribe do |message|
@body.write("event: picture\n")
@body.write("data: #{message.to_json}\n\n")
end
[200, HEADERS, @body]
end
end
end
This is the entirety of the server-side code necessary to build a Server-Sent
Events stream. We set the Content-Type
of the response to text/event-stream
and setup our channel subscription to trigger a picture
event when a new
message is received from the channel.
Next we'll build an endpoint for a user to perform a search. The publish
endpoint expects to receive a POST with a keyword parameter. It takes that
keyword and uses a FlickrSearch
class to get the data to publish back to the
channel. For good measure it also sends the result to the original publisher
and succeeds the Deferrable
, which closes the connection. FlickrSearch
is a
Deferrable
that uses em-http-request
to fetch data from Flickr and return
the result asynchronously. The result is an object that responds to to_json
and returns a hash that includes the original keyword that was used for the
search and the URL to a random result.
module Actions
class Publish
def initialize(env, channel)
@env = env
@channel = channel
@body = Body.new
end
def run
search = FlickrSearch.new(params["keyword"]).get
search.callback do |result|
@channel.push(result)
@body.write(result.to_json)
@body.succeed
end
search.errback { @body.fail }
[200, {"Content-Type" => "application/json"}, @body]
end
private
def params
@params ||= Rack::Utils.parse_query(@env["rack.input"].read)
end
end
end
The only page of the application will be a small, static HTML page that sets up a form to allow searches.
<html>
<head>
<style type="text/css">
input {
position: absolute;
font-size: 48px;
bottom: 10px;
}
</style>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>
<script src="/js/application.js"></script>
</head>
<body>
<form>
<input type="text">
</form>
</body>
</html>
To wire up the page to our stream we have a little bit of CoffeeScript glue code.
source = new EventSource("/subscribe")
source.addEventListener "picture", (event) ->
data = JSON.parse(event.data)
$("body, input").trigger "changeBackground", [data.url, data.keyword]
jQuery ->
$("body").bind "changeBackground", (event, url, keyword) ->
$(this).css(
"background": "url(#{url}) no-repeat center center fixed"
"-webkit-background-size": "cover"
"-moz-background-size": "cover"
"background-size": "cover"
)
$("input").bind "changeBackground", (event, url, keyword) ->
$(this).val("").attr("placeholder", keyword)
$("form").submit (event) ->
event.preventDefault()
input = $(this).find("input")
$.post "/publish", keyword: input.val()
The call to EventSource
is all that is required to open up the stream. When
we receive a picture event, we trigger a changeBackground
event on our
<body>
and <input>
elements. The jQuery
block sets up the nodes to
respond to changeBackground
with their respective presentation logic. The
input clears what has been typed into it and sets a placeholder with the last
search and the body changes its background to the search's returned URL and
does some CSS incantations to make the background appear full screen.
We also bind a submit
callback to our form to wire it up to POST to our
publish endpoint as an XHR request rather than a postback. Since we don't have
full-duplex communication via SSE, we're cheating by using Ajax for upstream
communications. This SSE Down/ Ajax Up approach is completely acceptable for
this application, but if it wasn't for some reason we might consider
WebSockets instead.
The Rack application to make these pieces work together is quite small. We're
going to inject a memoized EventMachine::Channel
into each action to act as
the application's event bus and rely on HTTP Router
to route requests to our
actions and serve our static index page and compiled JavaScript.
class PictureFrame
def self.app
@routes ||= HttpRouter.new do
post("/publish").
to { |env| Actions::Publish.new(env, PictureFrame.channel).run }
get("/subscribe").
to { |env| Actions::Subscribe.new(env, PictureFrame.channel).run }
add("/").static("public/index.html")
add("/").static("public")
end
end
def self.channel
@channel ||= EventMachine::Channel.new
end
end
Now that everything's stitched together, all users on the page will see its background changed when another user enters a search term. It looks something like this:
event: picture
data: {"url":"//farm1.staticflickr.com/48/177506457_6da382ee6d_z.jpg","keyword":"sushi"}
event: picture
data: {"url":"//farm4.staticflickr.com/3611/3376337890_bdbd465a7b_z.jpg","keyword":"french fries"}
event: picture
data: {"url":"//farm5.staticflickr.com/4082/4861386139_e6d25a7b35_z.jpg","keyword":"staten island ferry"}
event: picture
data: {"url":"//farm4.staticflickr.com/3566/3683645594_f8f9ce7091_z.jpg","keyword":"flatiron building"}
The final application is deployed to Heroku and the source is on GitHub.
If you only need a simplex channel to a web client to update some data, Server-Sent Events is a viable alternative to crufty polling or complex WebSockets. Even with bi-directional requirements SSE Down/Ajax Up might be sufficient and save you the trouble of turning up a WebSockets connection.
Rails 4.0 can be used to build these endpoints. Goliath or Cramp can also be used to implement SSE with a Ruby server. Implementation from scratch with Node.js, Twisted and Python, or your particular weapon of choice is likely just as straight-forward.
On the client side of the wall it's already built into most browsers and easily reusable by your native applications. Since it's a simple DOM interface it's trivial to use SSE to make your dynamic functionality more efficient for sufficiently modern browsers while maintaining chattier long-polling for older browsers.
Why call methods on an explicit self
as a matter of style? Was this some unfortunate pattern
promulgated by some random Ruby on Rails tutorial in 2007? Why would you reach
for protected
when the semantics of private
seem sufficient to encapsulate
an object's behavior?
Let's look at some of the common applications of protected method visibility. Some common patterns I've noticed are: attributes for comparison operations, mutator methods for immutable objects, fulfilling an abstract class' contract, and framework hooks.
The most common employment of protected
is applying it to attributes or
methods that are necessary to compare two instances to each other without
exposing that information in the object's public interface. Since operators on
an object are actually method calls, we can override these methods and provide
our own comparison logic for operations on an object. protected
allows these
objects to expose data needed in a comparison to each other but continue to
hide it from the rest of the system.
For example, let's look at a simple Collection
object. This object is
responsible for management of a collection of items. The Collection
doesn't
expose the items directly but rather defines an interface for interacting with
the internal array.
class Collection
def initialize(items=[])
@items = items
end
end
Two Collection
instances are deemed to be equal if they hold the same number
of elements and the elements are in the same order. To expose this operation
we define a method ==
on Collection
:
class Collection
def initialize(items=[])
@items = items
end
def ==(other)
# TODO: Compare our items to the other collection's items
end
end
In order to compare the arrays we'll need to add an items
getter but
continue to hide this data from external callers. If we add a getter without
specifying access control, a caller could access the contents of the array
directly but if we set this method private, nobody – including sibling
objects – will be able to access the property. protected
does exactly what
we want.
class Collection
attr_reader :items
protected :items
def initialize(items=[])
@items = items
end
def ==(other)
items == other.items
end
end
Collection
instances can now compare themselves to each other while still
hiding their data from other callers. Collection
objects will only respond to
items for sibling Collection
instances; calls from other objects will raise a
NoMethodError
.
Collection.new([1, 2, 3]) == Collection.new([1, 2, 3])
# => true
Collection.new([1, 2, 3]) == Collection.new([3, 2, 1])
# => false
>> Collection.new([1, 2, 3]).items
# NoMethodError: protected method `items' called for #<Collection:0x007fdaf30450a8 @items=[1, 2, 3]>
The usual caveat here is that in Ruby access control is “just a suggestion” and a user of an object can still reach in and access anything regardless of its visibility. For example:
Collection.new([1, 2, 3]).instance_variable_get(:@items)
# => [1, 2, 3]
Collection.new([1, 2, 3]).send(:items)
# => [1, 2, 3]
While this is true, there's still value in signaling your intentions to users of the object. Setting explicit access controls guides users to our defined interface and discourages fiddling with internals.
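The distinction is easy to see side by side. In this sketch (both classes hypothetical), the same getter is reachable from a sibling instance when protected but raises when private:

```ruby
class ProtectedBox
  def initialize(value)
    @value = value
  end

  def same_as?(other)
    value == other.value  # legal: the receiver is a sibling instance
  end

  attr_reader :value
  protected :value
end

class PrivateBox
  def initialize(value)
    @value = value
  end

  def same_as?(other)
    value == other.value  # raises: private forbids an explicit receiver
  end

  attr_reader :value
  private :value
end

puts ProtectedBox.new(1).same_as?(ProtectedBox.new(1))  # => true

begin
  PrivateBox.new(1).same_as?(PrivateBox.new(1))
rescue NoMethodError => e
  puts e.class  # => NoMethodError
end
```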
The Ruby standard library class OpenStruct
uses this pattern. OpenStruct
allows a user to set arbitrary attributes that can be accessed with dot
notation.
require "ostruct"
book = OpenStruct.new
book.title = "The Art of Fielding"
book.author = "Chad Harbach"
book.title
# => "The Art of Fielding"
book.author
# => "Chad Harbach"
An OpenStruct
is considered equal to another OpenStruct
when they hold the
same attributes. Under the hood OpenStruct
stores these attributes in an
internal hash table. It exposes this table as a protected method which allows
other OpenStruct
instances to determine equivalence. This is implemented
similarly to the Collection
example:
# lib/ostruct.rb:224 (ruby 1.9.3-p286)
attr_reader :table # :nodoc:
protected :table
#
# Compares this object and +other+ for equality. An OpenStruct is equal to
# +other+ when +other+ is an OpenStruct and the two objects' Hash tables are
# equal.
#
def ==(other)
return false unless(other.kind_of?(OpenStruct))
return @table == other.table
end
Another use of protected
I found in the Ruby standard library is using
protected methods to maintain the immutability of a value object. Let's say
we've decided our Collection
is a value object and should be
immutable. New requirements call for operations that would seem to require
Collection
to change at runtime. Let's start with the first: the sum of two Collection
instances is a new Collection
which holds a superset of the summands'
arrays.
class Collection
attr_reader :items
protected :items
def initialize(items=[])
@items = items
end
def ==(other)
items == other.items
end
def +(other)
# TODO: Add our items to the other collection's items
# and return a new collection with the sum.
end
end
Since we've marked items
as protected
we can reach in from one instance
into another, grab these items, add them to our items and instantiate a new
collection.
class Collection
attr_reader :items
protected :items
def initialize(items=[])
@items = items
end
def ==(other)
items == other.items
end
def +(other)
self.class.new(items + other.items)
end
end
Collection.new([1]) + Collection.new([4])
# => #<Collection:0x007f809207e1c0 @items=[1, 4]>
This simple example is fairly similar to our last – it involves overriding an
operator method and using privileged data from the sibling instance in the
operations. While state in neither Collection
changed, they were able to
collaborate and return a new Collection
with the desired items
.
IPAddr
is a class in the Ruby standard library which is a value object that
represents an IPv4 or IPv6 address. Under the hood it makes extensive use of
this pattern for manipulating the IP address it represents.
Given an IP address (say, 192.168.0.77
) and a subnet mask
(255.255.255.248
), we can use bitwise operations to figure out the upper and
lower boundaries of the network of which this host is a member. The lower
boundary is referred to as the network address and the upper boundary as the
broadcast address.
# For more detail around IP addressing basics see TCP/IP
# Illustrated (Second Edition) pp. 31-43
address = IPAddr.new("192.168.0.77")
# #<IPAddr: IPv4:192.168.0.77/255.255.255.255>
# To calculate an IP address' network address, each bit
# in the address is bitwise ANDed with each corresponding
# bit in the subnet mask.
address & IPAddr.new("255.255.255.248")
# #<IPAddr: IPv4:192.168.0.72/255.255.255.255>
# To calculate an IP address' broadcast address, take
# the inverse of the subnet mask (flip ones to zeroes
# and vice-versa), and perform a bitwise OR against each
# bit in the address.
address | (~ IPAddr.new("255.255.255.248"))
# #<IPAddr: IPv4:192.168.0.79/255.255.255.255>
# So our IP address 192.168.0.77 with subnet mask
# 255.255.255.248 lies on a network whose network address
# is 192.168.0.72 and whose broadcast address is 192.168.0.79.
IPAddr
exposes these operations but maintains immutability by cloning itself
and calling protected methods on the new instance.
# lib/ipaddr.rb:108 (ruby 1.9.3-p286)
# Returns a new ipaddr built by bitwise AND.
def &(other)
return self.clone.set(@addr & coerce_other(other).to_i)
end
# Returns a new ipaddr built by bitwise OR.
def |(other)
return self.clone.set(@addr | coerce_other(other).to_i)
end
# lib/ipaddr.rb:128 (ruby 1.9.3-p286)
# Returns a new ipaddr built by bitwise negation.
def ~
return self.clone.set(addr_mask(~@addr))
end
# lib/ipaddr.rb:370 (ruby 1.9.3-p286)
protected
# Set +@addr+, the internal stored ip address, to given +addr+. The
# parameter +addr+ is validated using the first +family+ member,
# which is +Socket::AF_INET+ or +Socket::AF_INET6+.
def set(addr, *family)
case family[0] ? family[0] : @family
when Socket::AF_INET
if addr < 0 || addr > IN4MASK
raise ArgumentError, "invalid address"
end
when Socket::AF_INET6
if addr < 0 || addr > IN6MASK
raise ArgumentError, "invalid address"
end
else
raise ArgumentError, "unsupported address family"
end
@addr = addr
if family[0]
@family = family[0]
end
return self
end
Instead of changing its state during these operations, it creates a copy of
itself using clone
and calls protected methods like set
to mutate the
instance and return it to the caller.
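The clone-and-set shape distills to very little code. Here's a sketch with a hypothetical Counter value object standing in for IPAddr:

```ruby
# An immutable counter: operations clone the instance and use a
# protected setter to initialize the copy, IPAddr-style.
class Counter
  attr_reader :count

  def initialize(count = 0)
    @count = count
  end

  def +(other)
    clone.set(count + other.count)
  end

  protected

  # Only sibling instances (including freshly-made clones) may call this.
  def set(count)
    @count = count
    self
  end
end

a = Counter.new(2)
b = Counter.new(3)
sum = a + b

puts sum.count  # => 5
puts a.count    # => 2 (the original is untouched)
```

Because set is protected rather than public, no external caller can mutate a Counter, yet the object itself can finish constructing its clones.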
Another example of the protected
keyword is in ActiveSupport's caching
layer. ActiveSupport::Cache::Store
defines an abstract class that can be
inherited to implement a pluggable caching layer. A minimal viable
implementation of a cache store involves implementing three methods:
read_entry
, write_entry
and delete_entry
. These are called by the public
API of the abstract class and implement a specific storage strategy. This
separates the concerns of how the cache behaves from the specifics of how
its data is stored.
# activesupport/lib/active_support/cache.rb:441
protected
# activesupport/lib/active_support/cache.rb:461
# Read an entry from the cache implementation. Subclasses must implement
# this method.
def read_entry(key, options) # :nodoc:
raise NotImplementedError.new
end
# Write an entry to the cache implementation. Subclasses must implement
# this method.
def write_entry(key, entry, options) # :nodoc:
raise NotImplementedError.new
end
# Delete an entry from the cache implementation. Subclasses must
# implement this method.
def delete_entry(key, options) # :nodoc:
raise NotImplementedError.new
end
ActiveSupport ships with implementations to store cache data in memory, a file and memcached. Each implementation has its own methods for interacting with its respective store.
# activesupport/lib/active_support/cache/mem_cache_store.rb:121
protected
# Read an entry from the cache.
def read_entry(key, options) # :nodoc:
deserialize_entry(@data.get(escape_key(key), options))
rescue Dalli::DalliError => e
logger.error("DalliError (#{e}): #{e.message}") if logger
nil
end
# activesupport/lib/active_support/cache/mem_cache_store.rb:153
private
# Memcache keys are binaries. So we need to force their encoding to binary
# before applying the regular expression to ensure we are escaping all
# characters properly.
def escape_key(key)
key = key.to_s.dup
key = key.force_encoding("BINARY")
key = key.gsub(ESCAPE_KEY_CHARS){ |match| "%#{match.getbyte(0).to_s(16).upcase}" }
key = "#{key[0, 213]}:md5:#{Digest::MD5.hexdigest(key)}" if key.size > 250
key
end
def deserialize_entry(raw_value)
if raw_value
entry = Marshal.load(raw_value) rescue raw_value
entry.is_a?(Entry) ? entry : Entry.new(entry)
else
nil
end
end
By marking the abstract interface methods with protected
and the
implementation methods for the storage mechanism as private
there's a
demarcation between the concerns. There's no direct reason in the
implementation for using protected methods in this case. The calls to these
protected methods use an implicit self
which means private method calls
would work to encapsulate the object's behavior. Using the protected
keyword
is primarily a matter of convention to throw into relief which concerns belong
to which components, aiding maintenance.
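The same shape, stripped to its bones with hypothetical classes rather than the ActiveSupport code: the abstract store's public methods call protected hooks that subclasses fulfill.

```ruby
class AbstractStore
  # Public API: callers use fetch and store, never the hooks below.
  def fetch(key)
    read_entry(key)
  end

  def store(key, value)
    write_entry(key, value)
  end

  protected

  # The contract subclasses must fulfill.
  def read_entry(key)
    raise NotImplementedError
  end

  def write_entry(key, value)
    raise NotImplementedError
  end
end

class MemoryStore < AbstractStore
  def initialize
    @data = {}
  end

  protected

  def read_entry(key)
    @data[key]
  end

  def write_entry(key, value)
    @data[key] = value
  end
end

cache = MemoryStore.new
cache.store("greeting", "oh, hi!")
puts cache.fetch("greeting")  # => oh, hi!
```

External callers can only reach the public API; the protected hooks remain an interface between the abstract class and its subclasses.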
Another conventional use of protected methods is for methods within an object
that aren't called directly by the object but are callback hooks that a
framework is configured to call. For example, ActionController::Base
allows
an inheriting class to define filters that are called at specific moments in a
request's lifecycle. We'll contrive an example using a Blog application.
BlogApp::Application.routes.draw do
resources :blogs do
resources :posts
end
end
Let's add a `PostsController`. We want to use strong_parameters to prevent unauthorized mass assignment, and to add an authorization check verifying that the current user may create posts on the current blog.
```ruby
class PostsController < ApplicationController
  before_filter :authorize_user

  def create
    @post = blog.posts.build(post_parameters)
    if @post.save
      redirect_to action: :index, notice: "Post created."
    else
      render :new
    end
  end

  protected

  def authorize_user
    unless blog.authorized?(current_user)
      render nothing: true, status: :unauthorized
    end
  end

  private

  def blog
    @blog ||= Blog.find(params[:blog_id])
  end

  def post_parameters
    params.require(:post).permit(:title, :date, :content)
  end
end
```
The `protected` keyword denotes methods that are called by ActionController, while the `private` keyword marks methods that we call within the controller itself to complete our work. As in the previous example, there's no implementation reason for using protected methods here beyond calling attention to the fact that the protected and private methods are interfaces aimed at different consumers: the external framework and the internal object, respectively. It's a hint to future readers that while these methods aren't part of the object's public API, there are users of the interface beyond the object itself.
`protected` is an odd beast: it accomplishes much of what `private` does, but with some added nuanced complexity and the (dubious) benefit of allowing methods to be called on an explicit `self`. There are some conventions around what `protected` means, but they seem to vary from project to project. I could find no project with guidelines around method visibility, and in most of the code I read that used `protected`, it was not apparent why the original author had chosen it.
While writing this, I talked to several developers who had committed code using `protected` to open source projects. I received the same response from each: 1) I don't remember why I used `protected` there; 2) I wouldn't use `protected` if I were writing that code again, `private` or `public` would have been better; 3) I don't use `protected` at all today.
In searching ruby-core for conversations about protected methods, it's clear this feature confuses even core contributors. The `OpenStruct` example above was discussed on the list as a replacement for an `instance_eval`. The contributor who suggested it was tentative about using it:

“From my ruby life for now, here's the only place where protected method lives.”
Protected method visibility could make sense in workaday code for the cases above. It was designed for the object interaction patterns shown in the first two examples – attributes for comparison operations and mutator methods for immutable objects. The latter two examples – fulfilling an abstract class's contract and framework hooks – are not universally applied patterns and aren't enforced by the language. If you're going to use `protected` in this way, it's worth a quick discussion with your team to determine whether the pattern is useful, or at the very least worth leaving a paper trail in the commit message or the methods' RDoc documentation briefly explaining why they are marked `protected`.