ostinelli|net

Simple Procedural Walk for UE4

Roberto Ostinelli — Mon, 26 Apr 2021 08:13:00 +0000

In my spare time I love dedicating some energy to Unreal Engine 4. Today, I’m releasing my first UE4 plugin, Simple Procedural Walk.

I’ve had an idea for a game for quite a while (on Steam, Subtype Grounds), which basically takes you on a cooperative journey to destroy the A.I. responsible for robots taking over the world. That’s another story, but while I was animating the killer robots for this game, I found out how much procedural animations can help in making their movement realistic and interesting, so I decided to give that a try. But… What is procedural animation?

“Standard” animations are generally the result of pre-saved animations, whether these are designed by hand with tools such as Blender, or recorded by using motion capture systems. These animations are then applied to your models and played back, bringing your characters to life.

“Procedural” animations, on the contrary, are not pre-made or pre-recorded: everything is computed in real time based on the character’s position and environment. This has numerous advantages. For instance, in a walking cycle, the feet of your character will perfectly adapt to the terrain, and they will not slide in unexpected ways. They can even move apart to avoid holes in the ground, or look for places where they can properly get a foothold. The body can be moved so that it gets the proper inclination, and so on.

These two techniques (pre-saved and procedural) are often combined. For instance, a motion-captured human walking cycle might be coupled with procedural elements so that the feet adapt to the terrain. This gives the advantage of a more natural, organic-looking cycle, while still letting the character adapt to their surroundings. However, pre-saved animations can only be modified up to a certain point while, on the other hand, procedural animations have their own limitations (for instance, it is difficult to achieve convincingly organic movements).

For my robots, procedural animation is a perfect fit, since it looks particularly good on robotic or insectoid creatures.

Here’s the Simple Procedural Walk trailer video:

It is interesting to see how much control you can have over the movement itself with procedural animations.

For example, it is possible to considerably change the overall feel of the movement by having the character unplant its feet at different times. This video demonstrates how a single parameter in the plugin can change the look of a walk cycle:

If you are interested, you may check Simple Procedural Walk out on the Unreal Engine 4 marketplace.

The post Simple Procedural Walk for UE4 appeared first on ostinelli|net.

A journey to Syn v2, a better Erlang & Elixir Process Registry and Group manager

Roberto Ostinelli — Mon, 18 Nov 2019 14:44:09 +0000

For those who don’t know it, Syn (short for synonym) is a global Process Registry and Process Group manager for Erlang and Elixir that I wrote a few years back. It has been around for some time now, and it has served me well on many occasions. Through the years, I’ve built up a series of considerations and ideas for improvement, and now the time has come to put all of these back into Syn.

Syn v2 has been rewritten from the ground up, and I’d like to share the reasoning and the architectural choices behind this work. Since Syn is written in Erlang, the few portions of code referenced below use Erlang syntax. Don’t let that discourage you if you are an Elixir developer though, as the same principles apply.

The analysis

Things I wanted to keep

I’ve always treated a Process Registry as a specialized subset of a Key / Value store. Yes, the Key is the process registered name and the Value is the pid(), but the important bit here is that a process inherently belongs to a node – because it runs there. This changes the game quite significantly.

Contrary to standard Key / Value stores where you want to keep all of your data when a node leaves or crashes, why would you want to keep the name reference to a process, if that process runs on the node that left, or worse, crashed (and the process with it)? If a node gets added, why would you want to handoff a running process’ registration handling to this new node (so, not the node the process is running on)? In general, why would you want to decouple the node that handles the registered name of a process from the process itself?

If your answer is that in case of nodes leaving / crashing you need to keep the process’ information, then what you need is a persistent data storage, not a Process Registry. If your answer is that you want to spawn and load balance processes across a cluster (i.e. workers, that also happen to end up registered by name), then maybe what you need is a Job Queue, not necessarily a Process Registry.

Now, if what you want from your Process Registry is to register existing processes, then maybe you can agree that a process’s registration is strongly tied to the node the process runs on. If the node dies, then the process dies as well, and keeping its registration name around probably doesn’t make much sense. If you agree to this, then my suggestion is that in a distributed registry every node should be handling the registration of the processes that run on it.

In this scenario, we have ourselves a different paradigm from the ones of standard Key / Value stores. For instance, you do not need a Hash Ring to load balance the processes in your cluster and create replicas for fault tolerance of data: if a process or the node it runs on dies, you want to let its registration name go. You could still consider using a hash ring or consensus algorithms such as Raft to register a name, because you might experience race conditions for the same name being registered simultaneously on different nodes. However, for these cases Syn takes the approach of using a conflict resolution mechanism similar to the one used to resolve net splits.

This is even more true in the context where Syn was born. In IoT applications, you generally have a process that handles an external (TCP/UDP) socket to a physical device. It only makes sense that the process runs on the same node as the socket it handles, because in this scenario (maybe it’s just me) I cannot see the sense of having an external socket being managed by a process that runs on a different node:

If the node the socket runs on dies, you most probably want the process that handles it to die as well.
Reciprocally, if the socket handler process dies, you’d lose all of its state and you’d probably want to disconnect the device from the socket anyway.

To sum it up, I wanted to keep Syn v1’s paradigm, which is: every node is the authority for registering the processes that run on it. Load balancing is not part of this Process Registry, and it rather has to do with whatever causes processes to be spawned in the first place (e.g. external sockets get created on a specific node based on TCP Load Balancing mechanisms).

Finally, I wanted also to keep a full registry replica on every node of the cluster, so that Syn is optimized for read-intensive operations rather than write-intensive ones. This also seems to make sense, since if you do want to register a process by a name it’s probably because you want it to live long enough for it to have an alias and keep track of it system-wide.

Things I wanted to improve

1. Dynamic node membership

Contrary to what some forum users seem to think, Syn v1 does manage dynamic node addition pretty well. The only caveat is that Syn v1 needs to be initialized on a node after that node joins a cluster: your application needs to have the logic to connect to the other nodes, and only then issue a call to syn:init/0 which initializes the mnesia tables on the node. This is because Syn v1 uses mnesia’s replication features, and mnesia enforces a specific order of operations when creating / adding a node to existing replicated tables.

For instance, the call to mnesia:change_config/2 to configure extra_db_nodes needs to happen before the creation of a table (via mnesia:create_table/2) or the addition of a node to existing replicated tables (via mnesia:add_table_copy/3). If you’re curious about this process, you can head to syn_backbone.erl where you can get the gist on how Syn v1 sets up dynamic replication via mnesia.

This could also potentially lead to a rare race condition, which BTW never happened to me in all of these years. If two nodes of a cluster were to be started during a net split so that they do not see each other on boot, they would initialize mnesia separately which would result in them having their own version of Syn tables, with their own fingerprint, that cannot be merged afterwards (due to mnesia internals).

Finally, while mnesia does support node addition, it does not as easily support node removal. The equivalent command to remove a table copy, mnesia:del_table_copy/2, has some other caveats (for example it requires mnesia to be stopped). It is certainly doable though, but the question here would be when to do it in an automated way. For instance, a node could get notified that another node left the cluster, but should it then remove the missing node’s table copy? To do that, it would first have to stop mnesia – and lose all of the ram data – and then, what if the other node was simply down due to a temporary network failure that caused a net split? For these reasons, node removal was never implemented in Syn v1. If a node left, the local mnesia tables would still have it included in extra_db_nodes that mnesia uses for replication, which is not a big deal.

Therefore, even if Syn v1 does support dynamic node addition, there are these caveats to keep in mind – and I can see why some users might have misinterpreted them.

2. Net Splits

Even though these events are rare, they do happen. Moreover, I felt that a mechanism that would solve those would also basically implement a fully working dynamic addition / removal of nodes.

Mnesia does not handle net splits very well. To help with this issue, Ulf Wiger experimented and created his unsplit framework (he talks about it in this post on the Erlang questions mailing list). However, I’ve had mixed results with the mechanism that he uses in there.

Basically, Ulf’s solution works by first subscribing to mnesia events. If mnesia triggers an inconsistent database event for a remote node (so it is running a partitioned network), the unsplit code will check whether the remote node is already part of mnesia’s local running nodes. If it is not, it will manually connect the local to the remote node using the undocumented mnesia_monitor:connect_nodes/1 method; while doing so, it will inject some custom resolution code that performs dirty read & write operations to the local and to the remote node’s mnesia tables. I took this mechanism and made a specialized and simplified version for Syn v1, and if you’re up for it you can check this implementation in syn_consistency.erl.

While it works nicely in a 2-node cluster, I have unfortunately encountered inconsistent behavior in bigger clusters. In a cluster of 3 nodes I would see the inconsistent database events triggered correctly by mnesia, but unfortunately mnesia would randomly consider the remote node that triggered the event as already part of its local running nodes. Thus, the resolution part of the code that would perform the read & write operations and solve the split brain situation wouldn’t get a chance to run. My hunch is that mnesia is able to reconnect to parted nodes in a way that may happen before all of this mechanism gets the chance to be called, especially in partial net split scenarios. I’ve also tried forcing the resolution code to run without the checks, but I got the error not_merged as the result of the merge fun in mnesia_controller:connect_nodes/1, more precisely here.

I don’t know whether this intermittent failure to resolve net splits in bigger clusters results from using mnesia in a non-intended way by taking advantage of Ulf’s mechanism (it seems that Ulf wanted to test it with more than 2 nodes himself), or if I might have missed something in my own implementation. That said, handling net splits is not a trivial task, especially if you’re using a library that is not meant to do so. The feeling is that mnesia’s replication mechanisms are not intended to handle them, and hacking your way through might have unexpected results.

3. Behavior and customization

Syn v1 allows a process to be registered with only one name at a time, a behavior consistent with Erlang’s global module. However, I felt this was a limitation and I wanted to allow a process to be registered under multiple aliases.

Also, I wanted a cleaner approach to support customization callbacks, i.e. a single callback module with its syn_event_handler behavior that would be triggered depending on a developer’s choices.

Finally, in case of registry conflicts during a net split resolution (i.e. when two processes have registered the same name on different nodes during the net split), Syn v1 decides which process to keep and which one to discard. I wanted to provide developers with the ability to define their own conflict resolution method. The developers could, for instance, save vector clock data into the metadata of each process and use it to choose which process to keep in case of conflict.

The rewrite

Given all of the above, the choices were simple and clear – and so was the implementation.

Generic Register / Unregister operations’ flow

When a registration request comes in for a Name and Pid, this request is routed to the node that the Pid is running on. In most cases, a registration request will be done from a process itself, which means that communication stays on the same node.
Every registration request is treated by a single (gen server) process (the registry process) on every node. This registry process starts monitoring the newly registered process, and also guarantees that registration / unregistration requests are necessarily consistent – since there’s a per-node single registry process authority that sequentially treats them.
The registry process writes to a ~~local (not replicated) mnesia table~~ (Edit in v2.1) local ETS tables the Name and related Pid. This is an in-memory only table that gets created on application start and killed on application stop. ~~I still use mnesia only because of its secondary index feature, as I need to be able to search for table entries both by Name and by Pid~~.
The registry process then sends the registration / unregistration information to all of the other nodes in the cluster. This is not done by sending a message to the registry processes of the other nodes, rather it issues a remote procedure call (RPC) that directly writes in the local ETS tables of the remote nodes. This allows the other registry processes to be free from all intra-nodes syncing operations.
Edit in v2.0.1 —>
When a node receives registration information from another node, it checks locally whether there’s a name conflict. If one is found, the node will try to gain a global lock for the conflicting name to run a specific portion of code that merges the conflicting data, since other registry processes might see this conflict as well.
The registry process that succeeds in gaining the lock will compare the received registration data and will resolve the conflict between itself and the node that sent the registration information.
Once done, the registry process will free the lock, and all the other nodes that may have experienced the same name conflict will resolve it in turn.

Nodes’ addition

When a node joins a cluster, all the registry processes of the nodes in the cluster receive a NODE UP event for that node. Simultaneously, the joining node’s registry process receives a NODE UP event for all of the existing nodes in the cluster.
Every registry process that receives a NODE UP event for a remote node will try to gain a global lock to run a specific portion of code that deals with merging remote data.
The registry process that succeeds in gaining the lock will issue a RPC on the remote node and request the registry data for all of the processes that run on it, for which the remote node is the authority. It will then write this information on its local ETS tables. If this happens after a net split, some naming conflicts might happen at this moment, and a choice on which process to keep will be made depending on the specified logic. By default, Syn will keep the local process, kill the remote process and remove the latter from the ETS tables on the remote node (via a RPC).
Once done, the registry process will free the lock. It will not send its data, as it will be requested from the other registry processes when they get their turn to grab the lock.
The code for all of this at the time of writing can be seen here.
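
The merge step above can be pictured with a small toy model. It is written in Ruby for brevity, not in Erlang, and it does not use Syn’s code or API: Entry, merge_remote, KEEP_LOCAL and NEWEST_WINS are names made up for illustration, and the :registered_at metadata key is an assumption about what a developer might store.

```ruby
# Toy model only: this is NOT Syn's implementation or API.
# An entry ties a registered name to a pid, the node it runs on, and metadata.
Entry = Struct.new(:pid, :node, :meta)

# Default policy described in the text: keep the local process.
KEEP_LOCAL = ->(_name, local, _remote) { local }

# Hypothetical custom policy: keep the most recent registration, based on
# a :registered_at value that the developer stored in the metadata.
NEWEST_WINS = lambda do |_name, local, remote|
  remote.meta[:registered_at] > local.meta[:registered_at] ? remote : local
end

# Merge the entries a remote node is the authority for into the local table.
# Returns the entries that lost a conflict; a real registry would kill those
# processes and remove them from the tables of the losing node.
def merge_remote(local_table, remote_entries, resolver = KEEP_LOCAL)
  discarded = []
  remote_entries.each do |name, remote|
    local = local_table[name]
    if local.nil? || local.pid == remote.pid
      local_table[name] = remote # no conflict: just replicate the entry
    else
      winner = resolver.call(name, local, remote)
      discarded << (winner.equal?(local) ? remote : local)
      local_table[name] = winner
    end
  end
  discarded
end

# Node A's table and the data received from node B after a net split.
MERGED = { "device-1" => Entry.new("pid_a", :node_a, { registered_at: 10 }) }
FROM_B = {
  "device-1" => Entry.new("pid_b", :node_b, { registered_at: 20 }),
  "device-2" => Entry.new("pid_c", :node_b, { registered_at: 5 })
}

DISCARDED = merge_remote(MERGED, FROM_B, NEWEST_WINS)
```

With the NEWEST_WINS policy, node B’s registration of device-1 replaces the local one, and the local entry ends up in the discarded list. Calling merge_remote without a resolver falls back to keeping the local process.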

Nodes’ removal

When a node leaves a cluster (voluntarily, because it crashed or because of a net split), all the registry processes of the nodes in the cluster receive a NODE DOWN event for that node.
The registry processes that receive this event proceed to remove from their local tables (mnesia originally, ETS since v2.1) all of the processes that ran on the disconnected node.
This works well also in context of partial net splits (A <--> B <--> C where node A can see B, B can see A and C, and C can only see B), since every node keeps locally only the information of the nodes it can see.
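
The partial net split case can be sketched with a toy model as well. Again, this is Ruby rather than Erlang and it is not Syn’s code: ToyRegistry and its methods are hypothetical names, and the replication between nodes is simulated by writing to every registry by hand.

```ruby
# Toy model only: every node holds a full replica of the registry, and each
# entry remembers which node owns (runs) the registered process.
class ToyRegistry
  attr_reader :node, :table

  def initialize(node)
    @node = node
    @table = {} # registered name => owning node
  end

  # Used both for local registrations and for data replicated from peers.
  def write(name, owner)
    @table[name] = owner
  end

  # NODE DOWN handler: forget every process that ran on the lost node.
  def node_down(lost_node)
    @table.delete_if { |_name, owner| owner == lost_node }
  end
end

REG_A, REG_B, REG_C = %i[a b c].map { |node| ToyRegistry.new(node) }

# Fully connected cluster: every registration is replicated everywhere.
[REG_A, REG_B, REG_C].each do |registry|
  registry.write("dev-a", :a)
  registry.write("dev-b", :b)
  registry.write("dev-c", :c)
end

# Partial net split A <--> B <--> C: A and C lose sight of each other,
# while B keeps seeing both and therefore receives no NODE DOWN event.
REG_A.node_down(:c)
REG_C.node_down(:a)
```

After the split, A knows about dev-a and dev-b, C knows about dev-b and dev-c, and B still knows about all three, which matches the idea that every node only keeps the information of the nodes it can see.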

Results

These are the results of the rewrite.

Addition & removal of nodes in a cluster is done in a completely dynamic and transparent way, with no caveats.
Automated repairing from net splits has now been taken to a new level. You can check the existing test suites to see what is being covered.
Finally, there now is a single callback module with the syn_event_handler behavior, and the ability for the developer to use a custom function to resolve naming conflicts after net splits.

There still might be corner cases that I haven’t considered in terms of consistency; if some arise, I will do my best to tackle them in future improvements of Syn.

You can grab your copy of Syn v2 on Hex or Github. Happy registering!


Modern Erlang for Beginners: my course

Roberto Ostinelli — Wed, 13 Mar 2019 20:14:00 +0000

I remember when I first started learning Erlang, many years ago. There was the fundamental Programming Erlang book by the late Joe Armstrong, the official Erlang documentation and various other books, but what really helped me was a screencast from Kevin Smith sold on the Pragmatic Bookshelf’s website.

I am a visual learner, which means that it is much easier for me to learn with videos or by pairing with other people. So, after all those years I decided that I wanted to add my little contribution to the Erlang courses that are available out there.

I’m pleased to say that as of today my course is live on The Pragmatic Bookshelf. I hope it will help out those who are seeking to enter this fantastic world, or those Elixir programmers who want to understand Erlang better.

EDIT (Apr. 17, 2023): my course was retired from PragProg on Apr. 21, 2020 and moved to Udemy, but is now retired from Udemy as well.


Setting up multiple databases in Rails: the definitive guide

Roberto Ostinelli — Wed, 02 Dec 2015 18:00:34 +0000

There are different reasons why you might consider having multiple databases in your Ruby on Rails application. In my specific case scenario, I needed to store large quantities of data representing user behavior: clicks, pages visited, historical changes, and so on.

Databases of this kind are generally not mission critical, and grow much faster (and larger) than most databases. Their requirements are often different: for instance, they need more storage space, are more tolerant in the face of hardware or software failures, and are write-intensive. For these reasons, it can make sense to separate them from your application’s primary database. Often, non-RDBMS databases are chosen for these kinds of tasks, something which is however beyond the scope of this article.

I googled and read about many different solutions, but I couldn’t find one that fully covered how to:

Have different and isolated migrations and schemas for every database.
Use rails generators to create new migrations for every database, independently.
Offer database-specific rake tasks for the most common database operations (i.e. the same ones available for the primary database).
Integrate with RSpec’s default spec task.
Work with Database Cleaner.
Work on Heroku.

This is my take on how to solve all of these – and have a fully working multiple database solution for your Rails application.

Create the custom database files

For the purpose of this tutorial, we’re going to set up a second database called Stats. To do so, we’re going to duplicate how Rails handles the primary database, and stick to conventions.

First of all, create the file config/database_stats.yml and populate it as you do with the primary database’s config file. Your file will look something like this:

development:
  adapter: postgresql
  encoding: utf8
  host: localhost
  pool: 10
  database: myapp_stats_development
  username: postgres
  password:

test:
  adapter: postgresql
  encoding: utf8
  host: localhost
  pool: 10
  database: myapp_stats_test
  username: postgres
  password:

production:
  adapter: postgresql
  encoding: utf8
  url:  <%= ENV["DATABASE_STATS_URL"] %>
  pool: <%= ENV["DB_POOL"] || 5 %>

Note that I’ve given specific names to the databases, trying to follow as closely as possible Rails’ naming conventions. Also, I’ve set the production database url to an environment variable, DATABASE_STATS_URL. This will allow us to easily set this variable to a secondary database when deploying to Heroku.

We’re now going to create a directory that will hold the schema and all the migrations of the Stats database, so that it will have its own files clearly isolated from the primary database. We are basically going to duplicate Rails’ primary database db directory.

Create the directory db_stats in the Rails root and make sure to copy the structure and files of the primary database db directory into it. You will have something like:

-- db
   |-- migrate
   |-- schema.rb
   |-- seeds.rb
-- db_stats
   |-- migrate
   |-- schema.rb
   |-- seeds.rb

The created files schema.rb and seeds.rb, together with the migrate directory, should just be empty.

Add Rake tasks

To handle the Stats database, and allow for its creation, migrations, schema dumping and other functionalities we’re going to need custom Rake tasks. These tasks will provide us with the same functionalities that Rails provides us for the primary database.

Create a new file, lib/tasks/db_stats.rake, and paste the following:

task spec: ["stats:db:test:prepare"]

namespace :stats do

  namespace :db do |ns|

    task :drop do
      Rake::Task["db:drop"].invoke
    end

    task :create do
      Rake::Task["db:create"].invoke
    end

    task :setup do
      Rake::Task["db:setup"].invoke
    end

    task :migrate do
      Rake::Task["db:migrate"].invoke
    end

    task :rollback do
      Rake::Task["db:rollback"].invoke
    end

    task :seed do
      Rake::Task["db:seed"].invoke
    end

    task :version do
      Rake::Task["db:version"].invoke
    end

    namespace :schema do
      task :load do
        Rake::Task["db:schema:load"].invoke
      end

      task :dump do
        Rake::Task["db:schema:dump"].invoke
      end
    end

    namespace :test do
      task :prepare do
        Rake::Task["db:test:prepare"].invoke
      end
    end

    # append and prepend proper tasks to all the tasks defined here above
    ns.tasks.each do |task|
      task.enhance ["stats:set_custom_config"] do
        Rake::Task["stats:revert_to_original_config"].invoke
      end
    end
  end

  task :set_custom_config do
    # save current vars
    @original_config = {
      env_schema: ENV['SCHEMA'],
      config: Rails.application.config.dup
    }

    # set config variables for custom database
    ENV['SCHEMA'] = "db_stats/schema.rb"
    Rails.application.config.paths['db'] = ["db_stats"]
    Rails.application.config.paths['db/migrate'] = ["db_stats/migrate"]
    Rails.application.config.paths['db/seeds'] = ["db_stats/seeds.rb"]
    Rails.application.config.paths['config/database'] = ["config/database_stats.yml"]
  end

  task :revert_to_original_config do
    # reset config variables to original values
    ENV['SCHEMA'] = @original_config[:env_schema]
    Rails.application.config = @original_config[:config]
  end
end

This needs a little explanation: let’s break up this file in its main sections. First of all, we simply provide “proxies” to standard Rails database tasks, in a newly created Rake namespace, stats:db:

task :drop do
  Rake::Task["db:drop"].invoke
end

task :create do
  Rake::Task["db:create"].invoke
end

task :setup do
  Rake::Task["db:setup"].invoke
end

task :migrate do
  Rake::Task["db:migrate"].invoke
end

[...]

Then, we loop over all of these tasks, and ensure that the task stats:set_custom_config is run before and the task stats:revert_to_original_config after every one of the “proxy” tasks:

# append and prepend proper tasks to all tasks defined in stats:db namespace
ns.tasks.each do |task|
  task.enhance ["stats:set_custom_config"] do
    Rake::Task["stats:revert_to_original_config"].invoke
  end
end
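
To see the before / after ordering that enhance gives us, here is a minimal standalone sketch that runs outside Rails. The EnhanceDemo module, the demo namespace and the EVENTS array are made up for illustration; only the Rake calls themselves (namespace, task, enhance, invoke) are the ones used in the file above.

```ruby
require 'rake'

# Standalone illustration of Rake's enhance: prerequisites passed to enhance
# run before the task's own action, and the block passed to enhance is
# appended after it.
module EnhanceDemo
  extend Rake::DSL

  EVENTS = [] # records the order in which the tasks actually run

  namespace :demo do
    task(:set_custom_config)         { EVENTS << :set_custom_config }
    task(:revert_to_original_config) { EVENTS << :revert_to_original_config }

    namespace :db do |ns|
      # stand-in for one of the "proxy" tasks
      task(:migrate) { EVENTS << :migrate }

      # same wrapping loop as in the stats:db namespace
      ns.tasks.each do |t|
        t.enhance ["demo:set_custom_config"] do
          Rake::Task["demo:revert_to_original_config"].invoke
        end
      end
    end
  end

  Rake::Task["demo:db:migrate"].invoke
end

p EnhanceDemo::EVENTS
```

One thing to keep in mind with this approach: Rake invokes a given task at most once per run unless it is re-enabled, so the set / revert pair is meant to wrap a single proxy task per rake invocation.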

We have to do this since, unfortunately, Rails support for multiple databases isn’t that great, hence we need to provide minor hacks to make everything work. For this reason we have to set specific environment and configuration variables to custom values which match our Stats database before we run the “proxy” tasks, and then ensure that the original values are set back once those tasks have been run. The following two tasks do just that:

task :set_custom_config do
  # save current vars
  @original_config = {
    env_schema: ENV['SCHEMA'],
    config: Rails.application.config.dup
  }

  # set config variables for custom database
  ENV['SCHEMA'] = "db_stats/schema.rb"
  Rails.application.config.paths['db'] = ["db_stats"]
  Rails.application.config.paths['db/migrate'] = ["db_stats/migrate"]
  Rails.application.config.paths['db/seeds'] = ["db_stats/seeds.rb"]
  Rails.application.config.paths['config/database'] = ["config/database_stats.yml"]
end

task :revert_to_original_config do
  # reset config variables to original values
  ENV['SCHEMA'] = @original_config[:env_schema]
  Rails.application.config = @original_config[:config]
end

Notice how lines 9-13 of the snippet above point to the files and directories we created in the previous steps.

Finally, if you’re using RSpec you can add one dependency to the spec task, to ensure that the Stats database is automatically prepared when tests are run:

task spec: ["stats:db:test:prepare"]

Once all of this is set up, we can create the Stats database and run its first migration:

$ rake stats:db:create
$ rake stats:db:migrate

This will generate the Stats database schema file in db_stats/schema.rb.

Add a custom generator

Unfortunately, we cannot simply use Rails’ generator ActiveRecord::Generators::MigrationGenerator because it hardcodes the parent directory of the migration (notice the path hardcoded to the directory db/migrate in line 4 here below):

def create_migration_file
  set_local_assigns!
  validate_file_name!
  migration_template @migration_template, "db/migrate/#{file_name}.rb"
end

Therefore, we need to have a custom generator to create migrations for the Stats database. However, we can still inherit from it and monkey patch this specific function. Create the following generator in lib/generators/stats_migration_generator.rb:

require 'rails/generators/active_record/migration/migration_generator'

class StatsMigrationGenerator < ActiveRecord::Generators::MigrationGenerator
  source_root File.join(File.dirname(ActiveRecord::Generators::MigrationGenerator.instance_method(:create_migration_file).source_location.first), "templates")

  def create_migration_file
    set_local_assigns!
    validate_file_name!
    migration_template @migration_template, "db_stats/migrate/#{file_name}.rb"
  end
end

In line 9 we set the directory base to the Stats database directory. Also, in line 4 we initialize the templates directory and point it at the original one used by the generator we’re inheriting from.
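
The source_root line is the least obvious part, so here is a tiny sketch of the underlying Ruby mechanism in isolation. ParentGenerator is a hypothetical stand-in for the Rails generator; the point is only that source_location tells us which file defines a method, and that the templates directory is then resolved relative to that file.

```ruby
# Hypothetical stand-in for the generator class we inherit from.
class ParentGenerator
  def create_migration_file; end
end

# source_location returns the file and line where the method was defined.
file, _line = ParentGenerator.instance_method(:create_migration_file).source_location

# The templates directory is assumed to sit next to that file.
TEMPLATES_DIR = File.join(File.dirname(file), "templates")

puts TEMPLATES_DIR
```

This way the custom generator does not need to hardcode the path where the gem is installed.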

With all of this in place, we can now generate migrations for the Stats database:

$ rails g stats_migration create_clicks
      create  db_stats/migrate/20151201191642_create_clicks.rb

You’ll notice that the migration file gets created in the Stats database migrate directory, db_stats/migrate. You can edit this file and then run your migrations with the Rake task that we’ve set up in the previous steps, just as you normally would do with your primary database:

$ rake stats:db:migrate

Finalize connection and models

We’re almost done. Add a new initializer file, config/initializers/db_stats.rb, and paste the following:

# save stats database settings in global var
DB_STATS = YAML::load(ERB.new(File.read(Rails.root.join("config","database_stats.yml"))).result)[Rails.env]

Notice that we reference the Stats database configuration file that we created in the first step here above. By doing this, we initialise a global variable DB_STATS that holds the current environment’s configuration of the Stats database.
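
If you want to check what this one-liner produces without booting Rails, the same two steps (render the ERB, then parse the YAML) can be tried on an inline copy of the production entry. This is only a sketch: STATS_CONFIG_TEXT and stats_config_for are names made up here, and the url is a placeholder.

```ruby
require 'yaml'
require 'erb'

# Inline stand-in for the production entry of config/database_stats.yml.
STATS_CONFIG_TEXT = <<~'CONFIG'
  production:
    adapter: postgresql
    encoding: utf8
    url:  <%= ENV["DATABASE_STATS_URL"] %>
    pool: <%= ENV["DB_POOL"] || 5 %>
CONFIG

# Render the ERB tags first, then parse the YAML and pick one environment.
def stats_config_for(env, text = STATS_CONFIG_TEXT)
  YAML.load(ERB.new(text).result)[env]
end

# Placeholder values, only for this example.
ENV["DATABASE_STATS_URL"] = "postgres://user:secret@db.example.com:5432/stats"
ENV.delete("DB_POOL")

PRODUCTION_STATS = stats_config_for("production")
p PRODUCTION_STATS
```

The result is a plain Hash with string keys, which is the kind of value that gets passed to establish_connection in the next step.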

Finally, we can set our models’ connection to this configuration. For example, let’s say that we have a Click model that corresponds to the migration here above. All you have to do is add one extra line that specifies which connection to use:

class Click < ActiveRecord::Base
  establish_connection DB_STATS

end

It’s that easy. Your model will now use the database Stats.

If you have multiple models that need to connect to the Stats database, however, you will need to add an extra step. If you were to have another model establishing its own connection to the Stats database, it would have its own connection pool and you might risk running out of available connections to your Stats database. Therefore, if you have multiple models it is recommended to inherit from a single model, so that all the models connecting to the Stats database will share the same connection pool.

To do so, create the base model that connects to the Stats database:

class StatsBase < ActiveRecord::Base
  establish_connection DB_STATS
  self.abstract_class = true
end

You can now inherit in all your models:

class Click < StatsBase
end

class View < StatsBase
end
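
To make the pool argument concrete, here is a toy model in plain Ruby. It is not ActiveRecord: ToyModel, POOLS and connection_pool are hypothetical names, and the only thing the sketch mimics is the rule that a class calling establish_connection gets its own pool, while subclasses reuse the pool of the nearest ancestor that established one.

```ruby
# Toy model only: mimics how pools are looked up through the class hierarchy.
class ToyModel
  POOLS = {} # class that called establish_connection => its pool

  def self.establish_connection(_config)
    POOLS[self] = Object.new # stand-in for a real connection pool
  end

  # Walk up the ancestors until a class that established a connection is found.
  def self.connection_pool
    owner = ancestors.find { |klass| POOLS.key?(klass) }
    POOLS.fetch(owner)
  end
end

# Recommended layout: one base class, one shared pool.
class ToyStatsBase < ToyModel
  establish_connection({ database: "myapp_stats_development" })
end

class ToyClick < ToyStatsBase; end
class ToyView < ToyStatsBase; end

# Layout to avoid: every model establishes its own connection.
class ToyStandaloneClick < ToyModel
  establish_connection({ database: "myapp_stats_development" })
end

class ToyStandaloneView < ToyModel
  establish_connection({ database: "myapp_stats_development" })
end

puts ToyClick.connection_pool.equal?(ToyView.connection_pool)
puts ToyStandaloneClick.connection_pool.equal?(ToyStandaloneView.connection_pool)
```

In the first layout both models share one pool; in the second each model holds a separate pool, so the number of connections to the Stats database grows with the number of models.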

Heroku

As already anticipated, the last step that you need to make this work on Heroku is to set the environment variable DATABASE_STATS_URL to the database you want to use as Stats. For example, if you created a second database called HEROKU_POSTGRESQL_TEAL_URL, all you have to do is to set this database’s value using the Heroku toolbelt:

$ heroku config:set DATABASE_STATS_URL=postgres://gsdfjrthjsnaew:gry6OJF6drDjththjkSDngldsf@ec2-116-22-114-221.compute-1.amazonaws.com:5432/hmsrthj24dfgks

And you’re ready to go.

Bonus: DatabaseCleaner

If you’re using the DatabaseCleaner gem, you can set it to clean the models that use the Stats database too. For example, your spec/rails_helper.rb may look something like this:

ENV["RAILS_ENV"] ||= 'test'
require 'spec_helper'
require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'

Dir[Rails.root.join("spec/support/**/*.rb")].each { |f| require f }

ActiveRecord::Migration.maintain_test_schema!

RSpec.configure do |config|
  config.use_transactional_fixtures = false
  config.infer_spec_type_from_file_location!

  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)
    DatabaseCleaner[:active_record, { model: Click }].clean_with(:truncation)
  end

  config.before(:each) do |example|
    unit_test = ![:feature, :request].include?(example.metadata[:type])
    strategy = unit_test ? :transaction : :truncation

    DatabaseCleaner.strategy = strategy
    DatabaseCleaner[:active_record, { model: Click }].strategy = strategy

    DatabaseCleaner.start
    DatabaseCleaner[:active_record, { model: Click }].start
  end

  config.after(:each) do
    DatabaseCleaner.clean
    DatabaseCleaner[:active_record, { model: Click }].clean
  end
end

According to the DatabaseCleaner README, it should be possible to set a connection option instead of the model one. Unfortunately, my attempts at this have been unsuccessful. If anyone knows how to do this and avoid specifying a DatabaseCleaner strategy for every model, please let me know.

I hope you’ve enjoyed reading this, and that my ramblings can be helpful to someone going down this same path. As usual, any suggestions on how to improve any of this are warmly welcome.

Happy multiple db’ing! :)


An evaluation of Erlang global process registries: meet Syn

Roberto Ostinelli — Mon, 06 Jul 2015 09:44:43 +0000

Due to my personal interests and history, I often find myself building applications in the field of the Internet of Things. Most of the time I end up using Erlang: it is based on the Actor Model and is an ideological (and practical) perfect match to manage IoT interactions.

I recently built an application to which devices can connect in order to interact with each other. Every device is identified via a unique ID (its serial number) and based on this ID the devices can send and receive messages. Nothing new here: it’s a standard messaging platform, which supports a custom protocol.

Due to the large number of devices that I needed to support, this application runs on a cluster of Erlang nodes. Once a device connects to one of those nodes, the related TCP socket events are handled by a process running on that node. To send a message to a specific device, you send a message to the process that handles the device’s TCP socket.

While building this application, I was faced early on with a very common problem: I needed a global process registry that would allow me to globally register a process based on the serial number of its device, so that messages can be sent from anywhere in the cluster. This registry would need to have the following main characteristics:

Distributed.
Fast write speeds (>10,000 / sec).
Handle naming conflict resolution.
Allow for adding/removal of nodes.

Therefore I started to search for possible solutions (which included posting to the Erlang Questions mailing list), and these came out as my options:

Erlang’s global module.
Erlang’s pg2 module.
Gproc.
CloudI Process Groups.
Roll out a custom solution.

The Stress Test

I decided to evaluate every one of these solutions based on a variety of considerations. However, I also wanted to see how they would perform when subjected to some kind of stress test. Therefore, I defined and wrote a simple one that:

Launches a certain number of processes per node (for example, 25,000 processes per node).
Registers these processes (25,000 processes per node), each with a globally unique Key.
Waits for those Keys to be propagated to all the nodes.
Unregisters all of these processes.
Waits for those Keys to be removed from all the nodes.
Re-registers all of the processes, to check for unwanted effects of subsequent add/remove operations.
Again, waits for those Keys to be propagated to all the nodes.
Kills all the processes (this time, without previously unregistering them).
Waits for those Keys to be removed from all the nodes (to check for process monitoring).

The test measures how long each one of these steps takes.

The following is the code for this stress test. You can see that it defines a behaviour: each library is plugged in through a callback module that adapts the library’s own syntax to the common set of callbacks.

-module(process_registry_bench).

-export([start/3]).
-export([register/2, unregister/2]).
-export([register_on_node/2, unregister_on_node/2]).

-callback init() -> term().
-callback register(Key :: string(), pid()) -> term().
-callback unregister(Key :: string(), pid()) -> term().
-callback retrieve(Key :: string()) -> pid() | undefined.
-callback process_loop() -> any().

-define(MAX_RETRIEVE_WAITING_TIME, 60000).


start(CallbackModule, ProcessesCount, Nodes) ->
	%% connect
	connect_nodes(Nodes),

	%% callback init
	CallbackModule:init(),

	%% launch processes
	{UpperKey, PidInfos} = launch_processes(CallbackModule, ProcessesCount),

	%% benchmark: register
	{TimeReg, _} = timer:tc(?MODULE, register, [CallbackModule, PidInfos]),
	io:format("Registered processes in ~p sec, at a rate of ~p/sec~n", [
		TimeReg/1000000,
		ProcessesCount/TimeReg*1000000
	]),

	%% benchmark: registration propagation
	{RetrievedInMs1, RetrieveProcess1} = retrieve(pid, CallbackModule, UpperKey),
	io:format("Check that process with Key ~p was found: ~p in ~p ms~n", [
		UpperKey, RetrieveProcess1, RetrievedInMs1
	]),

	%% benchmark: unregister
	{TimeUnreg, _} = timer:tc(?MODULE, unregister, [CallbackModule, PidInfos]),
	io:format("Unregistered processes in ~p sec, at a rate of ~p/sec~n", [
		TimeUnreg/1000000,
		ProcessesCount/TimeUnreg*1000000
	]),

	%% benchmark: unregistration propagation
	{RetrievedInMs2, RetrieveProcess2} = retrieve(undefined, CallbackModule, UpperKey),
	io:format("Check that process with Key ~p was NOT found: ~p in ~p ms~n", [
		UpperKey, RetrieveProcess2, RetrievedInMs2
	]),

	%% benchmark: re-registering
	{TimeReg2, _} = timer:tc(?MODULE, register, [CallbackModule, PidInfos]),
	io:format("Re-registered processes in ~p sec, at a rate of ~p/sec~n", [
		TimeReg2/1000000,
		ProcessesCount/TimeReg2*1000000
	]),

	%% benchmark: re-registration propagation
	{RetrievedInMs3, RetrieveProcess3} = retrieve(pid, CallbackModule, UpperKey),
	io:format("Check that process with Key ~p was found: ~p in ~p ms~n", [
		UpperKey, RetrieveProcess3, RetrievedInMs3
	]),

	%% benchmark: monitoring
	io:format("Kill all processes~n", []),
	kill_processes(PidInfos),
	{RetrievedInMs4, RetrieveProcess4} = retrieve(undefined, CallbackModule, UpperKey),
	io:format("Check that process with Key ~p was NOT found: ~p in ~p ms~n", [
		UpperKey, RetrieveProcess4, RetrievedInMs4
	]).

connect_nodes(Nodes) ->
    [true = net_kernel:connect_node(Node) || Node <- Nodes].

launch_processes(CallbackModule, ProcessesCount) ->
	%% return the processes info in format [{Node, [{Key, Pid}]}, ...]
	Nodes = [node() | nodes()],
	ProcessesPerNode = round(ProcessesCount / length(Nodes)),
	UpperKey = integer_to_list(ProcessesPerNode * length(Nodes)),
	F = fun(Node, Acc) ->
		StartingKey = length(Acc) * ProcessesPerNode,
		Pids = launch_processes_on_node(CallbackModule, ProcessesPerNode, StartingKey, Node),
		[{Node, Pids} | Acc]
	end,
	{UpperKey, lists:foldl(F, [], Nodes)}.
launch_processes_on_node(CallbackModule, ProcessesPerNode, StartingKey, Node) ->
	%% return the key and process in a list of format [{Key, Pid}, ...]
	Seq = [
		integer_to_list(Key)
		|| Key <- lists:seq(StartingKey + 1, ProcessesPerNode + StartingKey)
	],
	[{Key, spawn(Node, CallbackModule, process_loop, [])} || Key <- Seq].

register(CallbackModule, PidInfos) ->
	%% register in parallel on all nodes
	F = fun({Node, NodePidInfos}, Acc) ->
		RpcKey = rpc:async_call(Node, ?MODULE, register_on_node, [
			CallbackModule, NodePidInfos
		]),
		[{Node, RpcKey} | Acc]
	end,
	RpcKeys = lists:foldl(F, [], PidInfos),
	%% wait for registration to complete on all nodes
	FResult = fun({Node, RpcKey}) ->
		Registered = rpc:yield(RpcKey),
		io:format("Registered ~p processes on node ~p~n", [Registered, Node])
	end,
	lists:foreach(FResult, RpcKeys).
register_on_node(CallbackModule, NodePidInfos) ->
	F = fun({Key, Pid}) ->
		CallbackModule:register(Key, Pid)
	end,
	lists:foreach(F, NodePidInfos),
	length(NodePidInfos).

retrieve(Expected, CallbackModule, Key) ->
	StartTime = epoch_time_ms(),
	retrieve(Expected, CallbackModule, Key, StartTime).
retrieve(pid, CallbackModule, Key, StartTime) ->
	%% wait for a pid to be returned
	case CallbackModule:retrieve(Key) of
		undefined ->
			timer:sleep(50),
			case epoch_time_ms() > StartTime + ?MAX_RETRIEVE_WAITING_TIME of
				true -> {error, timeout_during_retrieve};
				false -> retrieve(pid, CallbackModule, Key, StartTime)
			end;
		{error, Error} ->
			{error, Error};
		Pid ->
			RetrievedInMs = epoch_time_ms() - StartTime,
			{RetrievedInMs, Pid}
	end;
retrieve(undefined, CallbackModule, Key, StartTime) ->
	%% wait for undefined to be returned
	case CallbackModule:retrieve(Key) of
		undefined ->
			RetrievedInMs = epoch_time_ms() - StartTime,
			{RetrievedInMs, undefined};
		{error, Error} ->
			{error, Error};
		_Pid ->
			timer:sleep(50),
			case epoch_time_ms() > StartTime + ?MAX_RETRIEVE_WAITING_TIME of
				true -> {error, timeout_during_retrieve};
				false -> retrieve(undefined, CallbackModule, Key, StartTime)
			end
	end.

unregister(CallbackModule, PidInfos) ->
	%% unregister in parallel on all nodes
	F = fun({Node, NodePidInfos}, Acc) ->
		RpcKey = rpc:async_call(Node, ?MODULE, unregister_on_node, [
			CallbackModule, NodePidInfos
		]),
		[{Node, RpcKey} | Acc]
	end,
	RpcKeys = lists:foldl(F, [], PidInfos),
	%% wait for unregistration to complete on all nodes
	FResult = fun({Node, RpcKey}) ->
		Unregistered = rpc:yield(RpcKey),
		io:format("Unregistered ~p processes on node ~p~n", [Unregistered, Node])
	end,
	lists:foreach(FResult, RpcKeys).
unregister_on_node(CallbackModule, NodePidInfos) ->
	F = fun({Key, Pid}) ->
		CallbackModule:unregister(Key, Pid)
	end,
	lists:foreach(F, NodePidInfos),
	length(NodePidInfos).

kill_processes(PidInfos) ->
	F = fun({_Node, NodePidInfos}) ->
		[exit(Pid, kill) || {_Key, Pid} <- NodePidInfos]
	end,
	lists:foreach(F, PidInfos).

epoch_time_ms() ->
    {Mega, Sec, Micro} = os:timestamp(),
    (Mega * 1000000 + Sec) * 1000 + round(Micro / 1000).

To run this stress test:

process_registry_bench:start(CallbackModule, ProcessCount, Nodes).

For instance, to launch it with the callback module global_bench for 100,000 processes running on a cluster of 4 nodes ('1@127.0.0.1', '2@127.0.0.1', '3@127.0.0.1' and '4@127.0.0.1'):

process_registry_bench:start(global_bench, 100000, [
    '1@127.0.0.1',
    '2@127.0.0.1',
    '3@127.0.0.1',
    '4@127.0.0.1'
]).

Running this test returns an output similar to:

Registered 25000 processes on node '1@127.0.0.1'
Registered 25000 processes on node '2@127.0.0.1'
Registered 25000 processes on node '3@127.0.0.1'
Registered 25000 processes on node '4@127.0.0.1'
Registered processes in 6.385835 sec, at a rate of 15659.659230155492/sec
Check that process with Key "100000" was found: <6218.25065.0> in 0 ms
Unregistered 25000 processes on node '1@127.0.0.1'
Unregistered 25000 processes on node '2@127.0.0.1'
Unregistered 25000 processes on node '3@127.0.0.1'
Unregistered 25000 processes on node '4@127.0.0.1'
Unregistered processes in 4.481706 sec, at a rate of 22312.93172733776/sec
Check that process with Key "100000" was NOT found: undefined in 0 ms
Registered 25000 processes on node '1@127.0.0.1'
Registered 25000 processes on node '2@127.0.0.1'
Registered 25000 processes on node '3@127.0.0.1'
Registered 25000 processes on node '4@127.0.0.1'
Re-registered processes in 4.943493 sec, at a rate of 20228.611631492146/sec
Check that process with Key "100000" was found: <6218.25065.0> in 0 ms
Kill all processes
Check that process with Key "100000" was NOT found: undefined in 0 ms
ok
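
As a quick sanity check of the rates printed above: timer:tc reports the elapsed time in microseconds, so the benchmark divides the process count by that time and scales the result back to seconds. The same arithmetic, sketched in Ruby (rate_per_second is just a hypothetical helper name for this illustration):

```ruby
# The elapsed time is in microseconds, so the per-microsecond rate
# is multiplied by 1,000,000 to obtain a rate per second.
def rate_per_second(count, elapsed_microseconds)
  count.to_f / elapsed_microseconds * 1_000_000
end

# 100,000 processes registered in 6.385835 sec (first registration round above).
puts rate_per_second(100_000, 6_385_835).round(2) # 15659.66
```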

The Process Registry Libraries

The following are the considerations that I made for every solution.

1. Erlang’s native global module

Considerations

The Erlang global module has native functionality to support a global process registry. I was not particularly attracted to it, because:

I have always thought that this module should be used to identify an application’s long-running services.
I didn’t know if millions of entries could be supported. This module wasn’t built with my use case in mind: as per my previous point, it is generally used to register long-running processes.
It has a locking mechanism to ensure that the registration is atomic. I felt this could become a serious bottleneck to the registration of processes.

However, this is a native Erlang module, which also allows you to define a resolve function to be used for conflict resolution (i.e. in case of race conditions, or during net splits, when a Key gets registered simultaneously on two different nodes). It is able to satisfy the distributed requirements out of the box, with no need for additional libraries.

Stress Test

I ran it through my stress test, with the following callback module:

-module(global_bench).
-behaviour(process_registry_bench).

-export([init/0]).
-export([register/2, unregister/2]).
-export([retrieve/1]).
-export([process_loop/0]).

init() ->
	ok.

register(Key, Pid) ->
	yes = global:register_name(Key, Pid).

unregister(Key, _Pid) ->
	global:unregister_name(Key).

retrieve(Key) ->
	global:whereis_name(Key).

process_loop() ->
	receive
		_ -> ok
	end.

Note that process_loop (which is the loop running in the processes) does nothing, except keeping the process alive.

The results of the stress test are:

	1 Node	2 Nodes	3 Nodes	4 Nodes
Reg / second	27,233	2,673	1,997	1,579
Retrieve registered Key (ms)	0	0	0	0
Unreg / second	29,491	2,908	2,206	1,596
Retrieve unregistered Key (ms)	0	0	0	0
Re-Reg / second	27,149	2,993	2,131	2,542
Retrieve re-registered Key (ms)	0	0	0	0
Retrieve Key of killed Pid (ms)	0	timeout	timeout	timeout

Conclusions

The locking mechanism heavily influences the decrease in performance that can be seen when adding nodes. With a cluster of 2+ nodes we are already below the spec of 10,000 registrations / second.
The monitoring of processes is slow. After having killed all the processes, in a cluster of 2+ nodes it takes more than 60 seconds to have global:whereis_name/1 return undefined (this is what timeout means in the table above). I had to decrease the number of processes to around 80,000 to have the stress test pass in a cluster of 4 nodes, and it would take around 55 seconds for a killed process’ Key to be removed from the registry.

For these reasons, it didn’t look like I could use this module.

2. Erlang’s native pg2 module

Considerations

Erlang’s pg2 module has native functionality to support a global process registry. I was not particularly attracted to it, because:

This library handles Process Groups, which is very different from handling unique Registered Names. We can use it for our purpose though, by basically creating Groups with a single entry. These groups are named according to our Keys, and every Group has a single entry: the Pid that we are registering. This is kind of a trick, but it’s not a showstopper.
Having Process Groups basically means that conflict resolution isn’t covered. If two processes are registered on different nodes with the same Key (because of race conditions or during a net split) this will result in having a Process Group with two elements instead of one. Sometimes this is fine; however, I wanted to ensure that there would be a clearly identified single Pid per device in the whole system. Not a showstopper either, but a turn-off.
I didn’t know if millions of entries can be supported. This module wasn’t built with my use case in mind.
Here too, it has a locking mechanism to ensure that the registration is atomic which could become a bottleneck to the registration of processes.
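
The difference between Process Groups and unique Registered Names can be sketched with two toy registries in plain Ruby. These are hypothetical, single-process stand-ins (GroupRegistry and NameRegistry are not pg2 or global, and nothing here is distributed); they only show why a group happily ends up with two entries for the same Key, while a name registry refuses the second one.

```ruby
# Toy group registry: a Key maps to a list of members, so duplicates pile up.
class GroupRegistry
  def initialize
    @groups = Hash.new { |hash, key| hash[key] = [] }
  end

  def join(key, pid)
    @groups[key] << pid
    :ok
  end

  def members(key)
    @groups.fetch(key, [])
  end
end

# Toy name registry: a Key maps to exactly one process.
class NameRegistry
  def initialize
    @names = {}
  end

  def register(key, pid)
    return :no if @names.key?(key) # the Key is already taken
    @names[key] = pid
    :yes
  end

  def whereis(key)
    @names[key]
  end
end

groups = GroupRegistry.new
groups.join("serial-1", :pid_a)
groups.join("serial-1", :pid_b)
p groups.members("serial-1") # [:pid_a, :pid_b]

names = NameRegistry.new
p names.register("serial-1", :pid_a) # :yes
p names.register("serial-1", :pid_b) # :no
p names.whereis("serial-1")          # :pid_a
```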

Stress Test

Here’s the callback module:

-module(pg2_bench).
-behaviour(process_registry_bench).

-export([init/0]).
-export([register/2, unregister/2]).
-export([retrieve/1]).
-export([process_loop/0]).

init() ->
	ok.

register(Key, Pid) ->
	ok = pg2:create(Key), %% create group
	ok = pg2:join(Key, Pid). %% add pid

unregister(Key, _Pid) ->
	ok = pg2:delete(Key).

retrieve(Key) ->
	case pg2:get_members(Key) of
		{error, {no_such_group, Key}} -> undefined;
		[] -> undefined;
		[Pid] -> Pid
	end.

process_loop() ->
	receive
		_ -> ok
	end.

The results of the stress test are:

	1 Node	2 Nodes	3 Nodes	4 Nodes
Reg / second	25,062	3,823	2,914	1,862
Retrieve registered Key (ms)	0	0	0	0
Unreg / second	39,522	6,903	5,191	3,425
Retrieve unregistered Key (ms)	0	0	0	0
Re-Reg / second	25,701	3,794	2,783	1,817
Retrieve re-registered Key (ms)	0	0	0	0
Retrieve Key of killed Pid (ms)	timeout	timeout	timeout	timeout

Conclusions

The locking mechanism heavily influences the decrease in performance that can be seen when adding nodes. With a cluster of 2+ nodes we are already below the spec of 10,000 registrations / second.
The monitoring of processes is slow. After having killed all the processes, even on a single node it takes more than 60 seconds to have pg2:get_members/1 return that the group no longer exists. I had to decrease the number of processes to around 45,000 to have the stress test pass in a cluster of 4 nodes, and it would take a little less than 60 seconds for a killed process’ Key to be removed from the registry.

For these reasons, it didn’t look like I could use this module.

3. Gproc

Considerations

gproc is a well-known process registry which is normally used for the additional features that it provides on top of Erlang’s native process dictionary (for instance, it is able to provide pub/sub patterns). It is a solid and well-supported library, and you can often see Ulf Wiger (one of the library’s authors) generously providing support for it.

However, there were some concerns I had:

For the distributed part it relies on gen_leader, about which I’ve heard too many horror stories (maybe that’s not a thing anymore). Ulf pointed me to a gproc branch that uses locks_leader, where he is mainly concentrating his efforts for gproc’s support for distributed operations.
I felt that the main purpose of this library is not so much to provide a distributed process registry as to extend the existing Erlang registration mechanisms with some additional features. The README on gproc’s GitHub page clearly depicts it as being an “Extended process dictionary”; it just felt that the distributed part hasn’t been the primary focus in the development of this library.
I could not understand how conflict resolution is managed in a distributed environment.

Stress Test

Here’s the callback module:

-module(gproc_bench).
-behaviour(process_registry_bench).

-export([init/0]).
-export([register/2, unregister/2]).
-export([retrieve/1]).
-export([process_loop/0]).

init() ->
	%% start app on every node
	Nodes = [node() | nodes()],
	F = fun(Node) ->
		rpc:call(Node, application, ensure_all_started, [gproc]),
		rpc:call(Node, gproc_dist, start_link, [Nodes])
	end,
	lists:foreach(F, Nodes).

register(Key, Pid) ->
	Pid ! {self(), reg, Key},
	receive
		done -> ok
	end.

unregister(Key, Pid) ->
	Pid ! {self(), unreg, Key},
	receive
		done -> ok
	end.

retrieve(Key) ->
	case catch gproc:lookup_pid({n, g, Key}) of
		{'EXIT', _} -> undefined;
		Pid -> Pid
	end.

process_loop() ->
	receive
		{Sender, reg, Key} ->
			gproc:reg({n, g, Key}, ignored),
			Sender ! done,
			process_loop();
		{Sender, unreg, Key} ->
			gproc:unreg({n, g, Key}),
			Sender ! done,
			process_loop()
	end.

Note: in gproc, to ensure thread safety, a process can only set its own values. That’s why the register/2 and unregister/2 callbacks above send messages to the processes, which then register or unregister themselves (see process_loop). As you can see above, I’ve decided to make these functions blocking calls (by using a receive block), to emulate the blocking calls that I’ve used in the other libraries.

The results of the stress test are:

	1 Node	2 Nodes	3 Nodes	4 Nodes
Reg / second	67,011	19,111	22,048	15,659
Retrieve registered Key (ms)	0	0	0	0
Unreg / second	118,228	22,845	24,282	22,312
Retrieve unregistered Key (ms)	0	0	0	0
Re-Reg / second	127,200	22,115	25,884	20,228
Retrieve re-registered Key (ms)	0	0	0	0
Retrieve Key of killed Pid (ms)	178	1,890	7,584	10,600

Conclusions

These are overall very good results.
I didn’t need to reduce the process count to make all of the tests pass.
The monitoring of processes can be optimized. After having killed all the processes, on a cluster of 4 nodes it takes >10 seconds for gproc:lookup_pid/1 to stop finding the Pid once a process has exited.
Unfortunately, I had some inconsistent results running this test in a cluster of 2+ nodes. Often, the test could not retrieve the registered Key (after the first registration round) in less than 60 seconds, and timed out.

I was a little concerned, though, about the inconsistency that I saw in the test results, which might be related to the gen_leader issues that I’ve occasionally heard about. The author’s choice to move towards locks_leader might be a sign of this. Despite these thoughts, this looked like a good potential candidate.

4. CloudI Process Groups

Considerations

cpg is an actively maintained library, and its main author Michael Truog is often very available to discuss his choices and provide support. cpg deals with Process Groups and not unique Registered Names, therefore my concerns were similar to the ones I had with pg2:

Handling Process Groups is very different from handling unique Registered Names. We can use the same trick used with pg2, i.e. creating Process Groups named with Key, with a single entry (the Pid).
Here too, having Process Groups basically means that conflict resolution isn’t covered. This made me a little uncomfortable because I wanted to ensure that there would be a clearly identified single Pid per device in the whole system.

Stress Test

Here’s the callback module:

-module(cpg_bench).
-behaviour(process_registry_bench).

-export([init/0]).
-export([register/2, unregister/2]).
-export([retrieve/1]).
-export([process_loop/0]).

init() ->
	%% start app on every node
	Nodes = [node() | nodes()],
	[rpc:call(Node, reltool_util, application_start, [cpg]) ||  Node <- Nodes].

register(Key, Pid) ->
	ok = cpg:join(Key, Pid).

unregister(_Key, Pid) ->
	ok = cpg:leave(Pid).

retrieve(Key) ->
	case catch cpg:get_members(Key) of
		{ok, Key, [Pid]} -> Pid;
		{error, {no_such_group, Key}} -> undefined;
		Error -> {error, Error}
	end.

process_loop() ->
	receive
		_ -> ok
	end.

The results of the stress test are:

	1 Node	2 Nodes	3 Nodes	4 Nodes
Reg / second	110,198	42,680	20,703	8,488
Retrieve registered Key (ms)	0	0	0	0
Unreg / second	109,374	32,264	25,599	15,128
Retrieve unregistered Key (ms)	0	1	0	0
Re-Reg / second	126,791	30,862	32,138	20,791
Retrieve re-registered Key (ms)	0	0	0	0
Retrieve Key of killed Pid (ms)	error	error	error	error

Conclusions

These are overall very good results.
I was surprised by the major drop in a cluster of 4 nodes. I ran this test multiple times and it always returned similar results.
The monitoring of processes didn’t work appropriately. Even on a single node, the test experienced an internal timeout:

{'EXIT',
 {timeout,
  {gen_server,
   call,
   [cpg_default_scope,
    {get_members,
     "100000"}]}}}

I had to decrease the number of processes to around 25,000 to have the stress test pass in a cluster of 4 nodes. The monitoring issue didn’t make me feel particularly at ease; however, this library did look like a potential candidate.

5. Custom Solution: Syn

Considerations

Since it became clear that I could not use Erlang’s native global and pg2 modules, and that the two other libraries I looked into were candidates but each with its own little quirks, I decided to try a custom solution, which I called syn (short for synonym).

In any distributed system you are faced with a consistency challenge, which is often resolved by having one master arbiter performing all write operations (chosen with a mechanism of leader election), or through atomic transactions. As said above, I needed a global process registry for an application in the IoT field. In this context, Keys used to identify a process are often the physical object’s unique identifier (for instance, its serial number or MAC address), and are therefore already defined and unique before hitting the system. The consistency challenge is less of a problem in this case, since the likelihood of concurrent incoming requests that would register processes with the same Key is extremely low and, in most cases, acceptable.

Therefore, Availability has been chosen over Consistency and Syn is eventually consistent.

Under the hood, Syn performs dirty reads and writes into a distributed in-memory Mnesia table, replicated across all the nodes of the cluster. This made me feel comfortable that I wouldn’t need to reinvent the replication mechanisms of Erlang’s native DB; however, I needed a way to handle conflict resolution and net splits. For this reason, Syn can automatically manage conflict resolution by implementing a specialized and simplified version of the mechanisms used in Ulf Wiger’s unsplit framework.

You can read more about Syn in its GitHub repo.

Stress Test

Here’s the callback module:

-module(syn_bench).
-behaviour(process_registry_bench).

-export([init/0]).
-export([register/2, unregister/2]).
-export([retrieve/1]).
-export([process_loop/0]).

init() ->
	%% start app on every node
	Nodes = [node() | nodes()],
	F = fun(Node) ->
		rpc:call(Node, syn, start, []),
		rpc:call(Node, syn, init, [])
	end,
	lists:foreach(F, Nodes).

register(Key, Pid) ->
	ok = syn:register(Key, Pid).

unregister(Key, _Pid) ->
	ok = syn:unregister(Key).

retrieve(Key) ->
	syn:find_by_key(Key).

process_loop() ->
	receive
		_ -> ok
	end.

The results of the stress test are:

	1 Node	2 Nodes	3 Nodes	4 Nodes
Reg / second	106,324	52,792	60,958	40,929
Retrieve registered Key (ms)	0	0	0	56
Unreg / second	105,506	50,591	67,042	42,896
Retrieve unregistered Key (ms)	0	0	0	0
Re-Reg / second	106,424	51,322	77,258	47,125
Retrieve re-registered Key (ms)	0	0	0	0
Retrieve Key of killed Pid (ms)	719	995	1,577	1,825

Conclusions

These are overall very good results. I’m not sure why Syn is performing better with 3 nodes than with 2 (and I’ve repeated this test more than once).
I didn’t need to reduce the process count to make all of the tests pass.
The monitoring of processes worked appropriately.

Final notes

I want to stress how difficult these comparisons and tests are to perform. Every library behaves differently, and it is hard (if not impossible) to define some kind of common stress test to allow for a better understanding of their performance levels. I gave it a go, but looking at the above definition of my stress test for instance I ask myself: “Why did I set the process count to 100,000? I can see that most libraries behave fine with lower numbers”. Also, “What would happen if instead of registering processes sequentially in a single process per node, we had them register themselves simultaneously, therefore increasing the load on the registry?”. More importantly, “Does this test represent some kind of real life scenario?”.

This article is meant to share my thoughts and how I ended up writing Syn. Sure, Syn performs well in the defined use case and stress test, but this in no way means that the other libraries here won’t perform way better in other stress tests and scenarios. I’d actually be glad to know that someone else is willing to take the time to evaluate these, and other, global process registries. They are a kind of holy grail; and let’s remember that anything distributed is never easy, nor a given.

As a final note, I’d enjoy reading comments from the library authors or other Erlang enthusiasts. This is such a delicate matter that I’d love to have a healthy exchange of opinions, hopefully contributing to improving all of our experiences.


How to build a Rails API server: Optimizing the framework

Roberto Ostinelli — Tue, 23 Jun 2015 16:54:44 +0000

I have been developing Rails JSON API applications for quite some time now, and I’d like to share a few of my setups and discuss why I do things this way. Today I’m starting a series of articles that will cover pretty much all the steps I take every time I bootstrap a new Rails JSON API application.

One of the first things I do is to ensure I’m optimizing Rails for speed. I basically optimize the framework itself, prior to coding any specific application logic.

You may have heard before that “Premature optimization is the root of all evil”. However, “Premature optimization is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code”, which “can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing” (source: Wikipedia). This is not what we’re doing here: we’re just going to apply a few changes to Rails, and then basically forget about those and start coding in a framework that is optimized to serve our API.

Much of Rails’ functionality is simply not needed when building an API server, and by stripping down Rails to a bare minimum we can actually achieve pretty significant performance increases.

Greenfield Ruby On Rails

Let’s first see what an empty project can achieve. I’m currently using Ruby 2.2.2 and Rails 4.2.1. Let’s create a new Rails application:

rails new api_greenfield -T

Let’s add a production server. For the scope of this post, it’s not really important what we use, as long as it’s a server that we can use in production. We are going to benchmark the results we get after applying our changes to Rails, so the absolute values resulting from our benchmarks are not as important as the relative improvements that we see in speed.

We’re going to use Puma, as it is now the recommended Ruby webserver by Heroku (and as I host most of my applications there, using it has become my default choice). Add it to the project Gemfile:

source 'https://rubygems.org'
ruby '2.2.2'

gem 'rails', '4.2.1'
gem 'sqlite3'

gem 'puma'

Then run bundle install. Create a Puma configuration file config/puma.rb and set the following basic params:

workers 4
threads_count = 1
threads threads_count, threads_count

preload_app!

rackup DefaultRackup
port ENV['PORT'] || 3000
environment ENV['RAILS_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  ActiveRecord::Base.establish_connection
end

We now need to set up a simple response page that we will hit with our benchmarks. We’re going to create a controller and an action that responds with a JSON body to the entry point /benchmarks/simple. To do so, let’s create benchmarks_controller.rb:

class BenchmarksController < ApplicationController

  def simple
    # example from http://json.org/example
    json = {
      glossary: {
        title: "example glossary",
        gloss_div: {
          title: "S",
          gloss_list: {
            gloss_entry: {
              id: "SGML",
              sort_as: "SGML",
              gloss_term: "Standard Generalized Markup Language",
              acronym: "SGML",
              abbrev: "ISO 8879:1986",
              gloss_def: {
                para: "A meta-markup language, used to create markup languages such as DocBook.",
                gloss_see_also: ["GML", "XML"]
              },
              gloss_see: "markup"
            }
          }
        }
      }
    }

    render json: json
  end
end

Set the routes for this controller:

Rails.application.routes.draw do
  resources :benchmarks, only: :none do
    collection do
      get :simple
    end
  end
end

Start Puma in production:

RAILS_ENV=production bundle exec puma -C config/puma.rb

Verify that Rails responds with our JSON body at the chosen entry point:

$ curl -H "Content-type: application/json" http://127.0.0.1:3000/benchmarks/simple
{"glossary":{"title":"example glossary","gloss_div":{"title":"S","gloss_list":{"gloss_entry":{"id":"SGML","sort_as":"SGML","gloss_term":"Standard Generalized Markup Language","acronym":"SGML","abbrev":"ISO 8879:1986","gloss_def":{"para":"A meta-markup language, used to create markup languages such as DocBook.","gloss_see_also":["GML","XML"]},"gloss_see":"markup"}}}}}

The server is up and ready. We can now benchmark our greenfield Rails application running with Puma. We will use the basic Apache Benchmark tool to do so.

$ ab -c 5 -n 10000 -H "Content-type: application/json" http://127.0.0.1:3000/benchmarks/simple
This is ApacheBench, Version 2.3 <$Revision: 1604373 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /benchmarks/simple
Document Length:        369 bytes

Concurrency Level:      5
Time taken for tests:   4.676 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      6990000 bytes
HTML transferred:       3690000 bytes
Requests per second:    2138.53 [#/sec] (mean)
Time per request:       2.338 [ms] (mean)
Time per request:       0.468 [ms] (mean, across all concurrent requests)
Transfer rate:          1459.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    2   1.0      2      27
Waiting:        1    2   1.0      2      27
Total:          1    2   1.0      2      27

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      3
  80%      3
  90%      3
  95%      4
  98%      4
  99%      5
 100%     27 (longest request)

This is actually not bad at all! A greenfield Rails project is able to sustain 2,138 req/sec. Obviously, this is without any application logic or database calls, but it is still a good starting point.
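The headline figures in an ab report all derive from three inputs: the number of completed requests, the concurrency level, and the elapsed time. As a rough sanity check, the relationships can be sketched in Python. The function name is made up for this illustration, and since ab presumably computes from an unrounded elapsed time, values recomputed from the printed 4.676 seconds can differ slightly in the last digits.

```python
def ab_summary(complete_requests, concurrency, seconds):
    """Recompute ab's headline metrics from its raw inputs."""
    requests_per_second = complete_requests / seconds
    # Mean time per request as seen by a single client, in milliseconds.
    time_per_request_ms = concurrency * seconds * 1000 / complete_requests
    # Mean time per request across all concurrent clients, in milliseconds.
    time_per_request_all_ms = seconds * 1000 / complete_requests
    return requests_per_second, time_per_request_ms, time_per_request_all_ms


rps, per_request, per_request_all = ab_summary(10000, 5, 4.676)
print(f"{rps:.2f} req/sec, {per_request:.3f} ms, {per_request_all:.3f} ms")
```

This is why we mostly compare requests per second between runs: the two "time per request" figures are just the same measurement expressed differently.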

The Rails API gem

The Rails API gem is “a subset of a normal Rails application, created for applications that don’t require all functionality that a complete Rails application provides. It is a bit more lightweight, and consequently a bit faster than a normal Rails application. The main example for its usage is in API applications only, where you usually don’t need the entire Rails middleware stack nor template generation”. Note that Rails API will be part of Rails 5, but for now we still have to include the gem:

source 'https://rubygems.org'
ruby '2.2.2'

gem 'rails', '4.2.1'
gem 'rails-api'
gem 'sqlite3'

gem 'puma'

Don’t forget to bundle install. Then, change our benchmarks_controller.rb to inherit from the Rails::API Action Controller:

class BenchmarksController < ActionController::API

Also, comment out the following line in application_controller.rb:

# protect_from_forgery with: :exception

Let’s try a new benchmark (portions omitted):

$ ab -c 5 -n 10000 -H "Content-type: application/json" http://127.0.0.1:3000/benchmarks/simple

[...]

Concurrency Level:      5
Time taken for tests:   4.220 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      6990000 bytes
HTML transferred:       3690000 bytes
Requests per second:    2369.39 [#/sec] (mean)
Time per request:       2.110 [ms] (mean)
Time per request:       0.422 [ms] (mean, across all concurrent requests)
Transfer rate:          1617.39 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    2   0.9      2      27
Waiting:        1    2   0.9      2      27
Total:          1    2   0.9      2      27

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      3
  90%      3
  95%      3
  98%      4
  99%      4
 100%     27 (longest request)

We now see a response rate of 2,369 req/sec, which is an increase in performance of ~11% over greenfield Rails. This is a modest improvement, but an improvement nonetheless.

OJ

Rails’ default JSON serializer isn’t the fastest out there, so let’s swap it for Oj:

source 'https://rubygems.org'
ruby '2.2.2'

gem 'rails', '4.2.1'
gem 'rails-api'
gem 'sqlite3'

gem 'puma'

gem 'oj'
gem 'oj_mimic_json'

Let’s run the benchmark with Oj (portions omitted):

$ ab -c 5 -n 10000 -H "Content-type: application/json" http://127.0.0.1:3000/benchmarks/simple

[...]

Concurrency Level:      5
Time taken for tests:   4.040 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      6990000 bytes
HTML transferred:       3690000 bytes
Requests per second:    2475.34 [#/sec] (mean)
Time per request:       2.020 [ms] (mean)
Time per request:       0.404 [ms] (mean, across all concurrent requests)
Transfer rate:          1689.71 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0      27
Processing:     1    2   0.8      2      29
Waiting:        1    2   0.8      1      29
Total:          1    2   0.9      2      29
WARNING: The median and mean for the waiting time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      3
  90%      3
  95%      3
  98%      3
  99%      4
 100%     29 (longest request)

We can see a small improvement here, which is practically irrelevant (~4%) as we hit 2,475 req/sec. The switch to Oj is going to be more relevant the bigger the JSON objects to serialize are, but at this stage it doesn’t hurt to keep Oj in here.

ActionController::Metal

It is now time to give the final boost, by:

Removing unnecessary railties.
Using Rails’ ActionController::Metal instead of the base controllers that our BenchmarksController has inherited from until now.

First, remove unnecessary imports from application.rb (your mileage may vary; this is my standard setup and I’ve rarely needed anything else):

# require "active_model/railtie"
# require "active_job/railtie"
require "active_record/railtie"
# require "action_controller/railtie"
require "action_mailer/railtie"
# require "action_view/railtie"
# require "sprockets/railtie"

Second (and this is what is really going to make a difference), we’re going to create a new controller that all of our API controllers are going to inherit from. Let’s create our base api_controller.rb:

class ApiController < ActionController::Metal
  abstract!

  include AbstractController::Callbacks
  include ActionController::RackDelegation
  include ActionController::StrongParameters

  private

  def render(options={})
    self.status = options[:status] || 200
    self.content_type = 'application/json'
    body = Oj.dump(options[:json], mode: :compat)
    self.headers['Content-Length'] = body.bytesize.to_s
    self.response_body = body
  end

  ActiveSupport.run_load_hooks(:action_controller, self)
end

As you can see, in this controller we define our custom render method. By default, I’ve already included the three modules that I basically use everywhere:

AbstractController::Callbacks, which allows you to set callbacks such as before_action in your controllers.
ActionController::RackDelegation, which is needed to set the response_body (called in the render method).
ActionController::StrongParameters, which allows you to use Strong Params in your controllers.

Other modules that you might want to include here are, for instance:

ActionController::HttpAuthentication::Token::ControllerMethods, to use the authenticate_with_http_token helper method if you are going to use token authentication in your API.
ActionController::HttpAuthentication::Basic::ControllerMethods, to use the authenticate_with_http_basic helper method if you are going to use basic authentication in your API.
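One detail of the custom render method worth spelling out is that Content-Length is taken from body.bytesize rather than from the string length: the header counts bytes, and the two differ as soon as the JSON contains non-ASCII characters. The same idea can be sketched in Python. This is only an illustration of the principle, not a port of the Rails code; the render function and its return shape are assumptions made for the example.

```python
import json


def render(payload, status=200):
    """Build a minimal JSON response: status, headers, and encoded body."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Content-Length counts bytes, not characters.
        "Content-Length": str(len(body)),
    }
    return status, headers, body


status, headers, body = render({"title": "café"})
print(headers["Content-Length"], len(body.decode("utf-8")))
```

Here the body is 17 characters long but 18 bytes, because "é" takes two bytes in UTF-8.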

Now for our benchmarks, let’s ensure that benchmarks_controller.rb inherits from our newly created controller:

class BenchmarksController < ApiController

Here are the results of the benchmark that includes all of the above changes (portions omitted):

$ ab -c 5 -n 10000 -H "Content-type: application/json" http://127.0.0.1:3000/benchmarks/simple

[...]

Concurrency Level:      5
Time taken for tests:   2.377 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      7200000 bytes
HTML transferred:       3690000 bytes
Requests per second:    4206.19 [#/sec] (mean)
Time per request:       1.189 [ms] (mean)
Time per request:       0.238 [ms] (mean, across all concurrent requests)
Transfer rate:          2957.48 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    1   0.4      1       5
Waiting:        0    1   0.4      1       5
Total:          1    1   0.4      1       5

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      2
  99%      2
 100%      5 (longest request)

This time the impact is notable, as we hit 4,206 req/sec.

Final Touch

With our latest ApiController, we are not using the controller that the Rails API gem exposes to us. Therefore, let’s remove the gem:

source 'https://rubygems.org'
ruby '2.2.2'

gem 'rails', '4.2.1'
# gem 'rails-api'
gem 'sqlite3'

gem 'puma'

gem 'oj'
gem 'oj_mimic_json'

However, the Rails API gem did other interesting things under the hood, such as disabling some unnecessary Rails middleware. Since we removed it, we now need to do so ourselves. Add to application.rb:

module ApiGreenfield
  class Application < Rails::Application

    [...]

    # remove unnecessary middleware
    config.middleware.delete Rack::Sendfile
    config.middleware.delete Rack::MethodOverride
    config.middleware.delete ActionDispatch::Cookies
    config.middleware.delete ActionDispatch::Session::CookieStore
    config.middleware.delete ActionDispatch::Flash
  end
end

Running the benchmark returns the previous results, so we can safely say we don’t need the Rails API gem anymore.

Conclusions

We have started with a greenfield Rails project, and have gradually applied changes to improve the speed performance of a simple benchmarked application:

Version	Req/sec	Increase
Greenfield Rails	2,138	-
+ Rails API Gem	2,369	+11%
+ Rails API Gem + Oj	2,475	+16%
+ Oj + ActionController::Metal + Custom middleware	4,206	+97%

Overall, we experienced an increase from 2,138 to 4,206 req/sec, which nearly doubles the initial performance of a greenfield Rails application.
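For reference, each increase is computed relative to the greenfield baseline rather than to the previous step. A small Python sketch of that arithmetic, using the req/sec figures measured above (the helper name is my own):

```python
BASELINE = 2138  # greenfield Rails, req/sec

RESULTS = {
    "Rails API gem": 2369,
    "Rails API gem + Oj": 2475,
    "Oj + ActionController::Metal + custom middleware": 4206,
}


def increase_percent(value, base):
    """Relative increase of value over base, as a percentage."""
    return (value - base) / base * 100


for name, req_per_sec in RESULTS.items():
    print(f"{name}: {increase_percent(req_per_sec, BASELINE):+.1f}%")
```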

For additional boosts, you may consider caching techniques (such as partial JSON caching), which are application dependent and are therefore out of scope here.

Happy API’ing!

The post How to build a Rails API server: Optimizing the framework appeared first on ostinelli|net.

GIN: a JSON-API framework in Lua

Roberto Ostinelli — Thu, 18 Jun 2015 12:03:53 +0000

GIN is a fast, low-latency, low-memory footprint, web JSON-API framework with Test Driven Development helpers and patterns. It is a framework that I open sourced some time ago, as an experiment.

It all started when I heard so many good things about Lua that I wanted to see it in action and find a project where I could unleash its power. Being an API fan, it came naturally to me to build a JSON-API server framework. And GIN was born.

GIN is currently in its early stage, but it already enables fast development, TDD and ease of maintenance. It is helpful when you need an extra boost in performance and scalability, since it is entirely written in Lua and it runs embedded in a packaged version of nginx called OpenResty. For those not familiar with Lua, don’t let that scare you away: Lua is really easy to use, very fast and simple to get started with.

Controllers

The syntax of a controller is extremely simple. For instance, a simple controller that returns an application’s version information looks like:

local InfoController = {}

function InfoController:root()
    return 200, { id = 'my_api', description = 'An API server powered by GIN.' }
end

return InfoController

The return statement specifies:

The HTTP code 200
The body of the response, which gets encoded by GIN into JSON as:

{
  "id": "my_api",
  "description": "An API server powered by GIN."
}

Most of the standard JSON-API paradigms are already embedded in GIN.

Versioning

For instance, versioning is extremely simple. Major versioning is baked into GIN. Based on directory naming conventions, the controllers that correspond to the version specified in the header will be called by GIN automatically. GIN uses the HTTP header Accept to version your API. The header has the format:

application/vnd.{application-name}.v{version}+json

So for instance, if your application name (as defined in the file ./config/application.lua) is demo, a client trying to access the version 1 of your API will have to send the header:

Accept: application/vnd.demo.v1+json

Gin only accepts requests that provide JSON in the body, hence the +json portion of the Accept header.
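To make the header format concrete, here is a minimal Python sketch of how such an Accept header could be parsed into an application name and a major version. It is not GIN's actual implementation (GIN is written in Lua); the function name and the set of characters allowed in the application name are assumptions made for this example.

```python
import re

# Pattern for: application/vnd.{application-name}.v{version}+json
ACCEPT_PATTERN = re.compile(r"^application/vnd\.([A-Za-z0-9_\-]+)\.v(\d+)\+json$")


def parse_accept(header, application_name):
    """Return the requested major version, or None if the header does not match."""
    match = ACCEPT_PATTERN.match(header.strip())
    if match is None or match.group(1) != application_name:
        return None
    return int(match.group(2))


print(parse_accept("application/vnd.demo.v1+json", "demo"))
print(parse_accept("application/json", "demo"))
```

A framework can then use the returned version number to pick the directory of controllers to dispatch to.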

Routing

Routing is very easy, too. For instance, this is an example of a routing file that handles multiple versions:

local routes = require 'gin.core.routes'

-- define version 1
local v1 = routes.version(1)

-- define routes
v1:GET("/users", { controller = "users", action = "index" })
v1:POST("/users", { controller = "users", action = "create" })

-- define version 2
local v2 = routes.version(2)

-- define routes
v2:GET("/users", { controller = "users", action = "index" })
v2:POST("/users", { controller = "users", action = "create" })
v2:GET("/users/:id", { controller = "users", action = "show" })

return routes
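The idea behind a versioned route table can be sketched in a few lines of Python. This is a toy illustration of the concept, not how GIN implements routing internally; the class and method names are invented for the example.

```python
import re


class VersionedRoutes:
    """Toy route table keyed by API major version (not GIN's actual code)."""

    def __init__(self):
        self.table = {}  # version -> list of (method, compiled pattern, target)

    def add(self, version, method, path, controller, action):
        # Turn "/users/:id" into a regex with a named group for each ":param".
        pattern = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", path)
        compiled = re.compile("^" + pattern + "$")
        self.table.setdefault(version, []).append((method, compiled, (controller, action)))

    def match(self, version, method, path):
        """Return ((controller, action), params) or None when nothing matches."""
        for route_method, compiled, target in self.table.get(version, []):
            found = compiled.match(path)
            if route_method == method and found:
                return target, found.groupdict()
        return None


routes = VersionedRoutes()
routes.add(1, "GET", "/users", "users", "index")
routes.add(2, "GET", "/users", "users", "index")
routes.add(2, "GET", "/users/:id", "users", "show")

print(routes.match(2, "GET", "/users/42"))
print(routes.match(1, "GET", "/users/42"))
```

Because each version keeps its own list, a route added in version 2 simply does not exist for clients that request version 1.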

Errors

Moreover, errors are defined globally at the application level. This allows you to keep track of error numbering and descriptions in a single file. For example, an error file is defined as:

local Errors = {
    [1000] = { status = 400, message = "Invalid request.", headers = { ["X-Header"] = "header" } },
}
return Errors

The global variable Errors contains the entries for all possible errors of your application. Every item in this table defines an error, where:

The 1000 key is the error number.
status (required): is the HTTP status code.
message (required): is the error description.
headers (optional): are the additional headers to be returned in the response.

When this is defined, then raising these errors from a controller is as simple as calling:

self:raise_error(1000)

As per this example, this will return an HTTP status 400, with the specified headers and the JSON body:

{
  "code": 1000,
  "message": "Invalid request."
}
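The pattern of a single application-wide error table that gets turned into a response can be sketched in Python as follows. This only mirrors the behaviour described above; the function name and the shape of its return value are assumptions for the example, not GIN's API.

```python
import json

# Application-wide error table, mirroring the Lua example above.
ERRORS = {
    1000: {"status": 400, "message": "Invalid request.", "headers": {"X-Header": "header"}},
}


def error_response(code):
    """Build (status, headers, JSON body) for a registered error number."""
    error = ERRORS[code]  # raises KeyError for unregistered numbers
    body = json.dumps({"code": code, "message": error["message"]})
    return error["status"], error.get("headers", {}), body


print(error_response(1000))
```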

Testing

However, this wouldn’t be a complete framework if it didn’t provide an easy way to test your application. The main testing framework embedded in GIN is Busted, which has an RSpec-like syntax. If you ever used Jasmine you’ll feel right at home. In addition, Gin provides test helpers for you to use.

Controller tests in Gin are actually integration tests. An instance of OpenResty nginx will be started and a real request will be issued to a test server. The main Gin test helper you’ll need to test controllers is the hit method, which performs the integration test for you.

Here’s an example of a controller test (file located in ./spec/controllers/1/info_controller_spec.lua), which is a valid test for the InfoController shown above:

require 'spec.spec_helper'

describe("InfoController", function()
    describe("#root", function()
        it("responds with application info", function()
            local response = hit({
                method = 'GET',
                path = "/"
            })

            assert.are.same(200, response.status)
            assert.are.same({ id = 'my_api', description = 'An API server powered by GIN.' }, response.body)
        end)
    end)
end)

Others

GIN also has models and support for multiple databases, and includes a migration engine. Furthermore, it is packed with a lot of other features, such as:

An API Console (which allows you to try out your API from a web interface).
Support for multiple environments.
A client console.

Alright, I want to see it!

GIN’s main website is gin.io, where you’ll find full documentation, and even a full tutorial on how to Test Drive your Development with GIN. Full code is available on GitHub.

Would this be worth more than a simple experiment?


A comparison between Misultin, Mochiweb, Cowboy, NodeJS and Tornadoweb

Roberto Ostinelli — Mon, 09 May 2011 03:19:08 +0000

As some of you already know, I’m the author of Misultin, a lightweight Erlang HTTP server library. I’m interested in HTTP servers, I spend quite some time trying them out, and I am always interested in comparing them from different perspectives.

Today I wanted to try the same benchmark against various HTTP server libraries:

Misultin (Erlang)
Mochiweb (Erlang)
Cowboy (Erlang)
NodeJS (V8)
Tornadoweb (Python)

I’ve chosen these libraries because they are the ones that currently interest me the most. Misultin, obviously, since I wrote it; Mochiweb, since it’s a very solid library widely used in production (afaik it has been used, or is still used, to power Facebook Chat, amongst other things); Cowboy, a newly born lib whose programmer is very active in the Erlang community; NodeJS, since bringing JavaScript to the backend has opened up a whole new world of possibilities (code reusable in the frontend, ease of access for various programmers, …); and finally Tornadoweb, since Python remains one of my favourite languages out there, and Tornadoweb has been excelling in loads of benchmarks and in production, powering FriendFeed.

Two main ideas are behind this benchmark. First, I did not want to do a “Hello World” kind of test: we have static servers such as Nginx that perform wonderfully in such tasks. This benchmark needed to address dynamic servers. Second, I wanted sockets to get periodically closed down, since having all the load on a few sockets scarcely corresponds to real-life situations.

For the latter reason, I decided to use a patched version of HttPerf. It’s a widely known and used benchmark tool from HP, which basically tries to send a desired number of requests out to a server and reports how many of these actually got a reply, and how many errors were experienced in the process (together with a variety of other pieces of information). A great thing about HttPerf is that you can set a parameter, called --num-calls, which sets the number of calls per session (i.e. socket connection) before the socket gets closed by the client. The command issued in these tests was:

httperf --timeout=5 --client=0/1 --server= --port=8080 --uri=/?value=benchmarks --rate= --send-buffer=4096
        --recv-buffer=16384 --num-conns=5000 --num-calls=10

The value of rate was increased incrementally from 100 to 1,200. Since the number of requests/sec = rate * num-calls, the tests were conducted for a desired number of responses/sec increasing from 1,000 to 12,000. The total number of requests = num-conns * num-calls, which was therefore a fixed value of 50,000 for every test iteration.
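The load arithmetic is easy to check with a short Python sketch. The constants come from the httperf command above; the function names are just for illustration.

```python
NUM_CONNS = 5000   # --num-conns: connections opened per test run
NUM_CALLS = 10     # --num-calls: requests sent on each connection


def requested_load(rate):
    """Desired requests/sec for a given --rate (new connections per second)."""
    return rate * NUM_CALLS


def total_requests():
    """Requests issued in one run; this does not depend on the rate."""
    return NUM_CONNS * NUM_CALLS


for rate in (100, 600, 1200):
    print(rate, requested_load(rate), total_requests())
```

In other words, raising the rate only compresses the same 50,000 requests into a shorter time window.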

The test basically asks servers to:

check if a GET variable is set
if the variable is not set, reply with an XML stating the error
if the variable is set, echo it inside an XML

Therefore, what is being tested is:

headers parsing
querystring parsing
string concatenation
sockets implementation
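Stripped of any server library, the logic every implementation has to perform looks roughly like the following Python sketch, which uses only the standard library. The XML element names are placeholders chosen for this example, and the escaping step is an addition of mine that the benchmarked servers do not necessarily perform.

```python
from urllib.parse import urlsplit, parse_qs
from xml.sax.saxutils import escape


def handle(uri):
    """Return (content type, body) for a request URI such as /?value=benchmarks."""
    query = parse_qs(urlsplit(uri).query)
    values = query.get("value")
    if not values:
        # Element names here are placeholders chosen for this sketch.
        return "text/xml", "<error>no value specified</error>"
    return "text/xml", "<value>" + escape(values[0]) + "</value>"


print(handle("/?value=benchmarks"))
print(handle("/"))
```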

The server is a virtualized up-to-date Ubuntu 10.04 LTS with 2 CPU and 1.5GB of RAM. Its /etc/sysctl.conf file has been tuned with these parameters:

# Maximum TCP Receive Window
net.core.rmem_max = 33554432
# Maximum TCP Send Window
net.core.wmem_max = 33554432
# others
net.ipv4.tcp_rmem = 4096 16384 33554432
net.ipv4.tcp_wmem = 4096 16384 33554432
net.ipv4.tcp_syncookies = 1
# this gives the kernel more memory for tcp which you need with many (100k+) open socket connections
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 2500
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
net.core.somaxconn = 65535

The /etc/security/limits.conf file has been tuned so that ulimit -n is set to 65535 for both hard and soft limits.
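If you want to confirm that the limits are actually in effect for a given process, you can query them from inside it. Here is a small Python sketch (Unix only, since it relies on the standard resource module); the helper function is hypothetical and only there to make the check explicit.

```python
import resource

# RLIMIT_NOFILE is the per-process cap on open file descriptors (what ulimit -n reports).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")


def enough_descriptors(required, soft_limit):
    """True when the soft limit can accommodate the required number of descriptors."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


print(enough_descriptors(65535, soft))
```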

Here is the code for the different servers.

Misultin

-module(misultin_bench).
-export([start/1, stop/0, handle_http/1]).

start(Port) ->
    misultin:start_link([{port, Port}, {loop, fun(Req) -> handle_http(Req) end}]).

stop() ->
    misultin:stop().

handle_http(Req) ->
    % get value parameter
    Args = Req:parse_qs(),
    Value = misultin_utility:get_key_value("value", Args),
    case Value of
        undefined ->
            Req:ok([{"Content-Type", "text/xml"}], ["no value specified"]);
        _ ->
            Req:ok([{"Content-Type", "text/xml"}], ["", Value, ""])
    end.

Mochiweb

-module(mochi_bench).
-export([start/1, stop/0, handle_http/1]).

start(Port) ->
    mochiweb_http:start([{port, Port}, {loop, fun(Req) -> handle_http(Req) end}]).

stop() ->
    mochiweb_http:stop().

handle_http(Req) ->
    % get value parameter
    Args = Req:parse_qs(),
    Value = misultin_utility:get_key_value("value", Args),
    case Value of
        undefined ->
            Req:respond({200, [{"Content-Type", "text/xml"}], ["no value specified"]});
        _ ->
            Req:respond({200, [{"Content-Type", "text/xml"}], ["", Value, ""]})
    end.

Note: I’m using the misultin_utility:get_key_value/2 function inside this code since proplists:get_value/2 is much slower.

Cowboy

-module(cowboy_bench).
-export([start/1, stop/0]).

start(Port) ->
    application:start(cowboy),
    Dispatch = [
        %% {Host, list({Path, Handler, Opts})}
        {'_', [{'_', cowboy_bench_handler, []}]}
    ],
    %% Name, NbAcceptors, Transport, TransOpts, Protocol, ProtoOpts
    cowboy:start_listener(http, 100,
        cowboy_tcp_transport, [{port, Port}],
        cowboy_http_protocol, [{dispatch, Dispatch}]
    ).

stop() ->
    application:stop(cowboy).

-module(cowboy_bench_handler).
-behaviour(cowboy_http_handler).
-export([init/3, handle/2, terminate/2]).

init({tcp, http}, Req, _Opts) ->
    {ok, Req, undefined_state}.

handle(Req, State) ->
    {ok, Req2} = case cowboy_http_req:qs_val(<<"value">>, Req) of
        {undefined, _} ->
            cowboy_http_req:reply(200, [{<<"Content-Type">>, <<"text/xml">>}], <<"no value specified">>, Req);
        {Value, _} ->
            cowboy_http_req:reply(200, [{<<"Content-Type">>, <<"text/xml">>}], ["", Value, ""], Req)
    end,
    {ok, Req2, State}.

terminate(_Req, _State) ->
    ok.

NodeJS

var http = require('http'), url = require('url');
http.createServer(function(request, response) {
    response.writeHead(200, {"Content-Type":"text/xml"});
    var urlObj = url.parse(request.url, true);
    var value = urlObj.query["value"];
    if (!value) {
        response.end("no value specified");
    } else {
        response.end("" + value + "");
    }
}).listen(8080);

Tornadoweb

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        value = self.get_argument('value', '')
        self.set_header('Content-Type', 'text/xml')
        if value == '':
            self.write("no value specified")
        else:
            self.write("" + value + "")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8080)
    tornado.ioloop.IOLoop.instance().start()

I took this code and ran it against:

Misultin 0.7.1 (Erlang R14B02)
Mochiweb 1.5.2 (Erlang R14B02)
Cowboy master 420f5ba (Erlang R14B02)
NodeJS 0.4.7
Tornadoweb 1.2.1 (Python 2.6.5)

All the libraries have been run with the standard settings. Erlang was launched with Kernel Polling enabled, and with SMP disabled so that a single CPU was used by all the libraries.

Test results

The raw printout of HttPerf results that I got can be downloaded from here.

Note: the above graph has a logarithmic Y scale.

According to this, we see that Tornadoweb tops out at around 1,500 responses/second, NodeJS at 3,000, Mochiweb at 4,850, Cowboy at 8,600 and Misultin at 9,700. While Misultin and Cowboy experience very few or no errors at all, the other servers seem to funnel under the load. Please note that “Errors” are timeout errors (over 5 seconds without a reply). Total responses and response times speak for themselves.

I have to say that I’m surprised by these results, to the point that I’d like to have feedback on code and methodology, with alternate tests that can be performed. Any input is welcome, and I’m available to update this post and correct any errors I’ve made, as an ongoing discussion with whoever wants to contribute.

However, please do refrain from flame wars, which are not welcome here. I have published this post exactly because I was surprised by the results I got.

What is your opinion on all this?

—————————————————–

UPDATE (May 16th, 2011)

Due to the success of these benchmarks I want to stress an important point when you read any of these (including mine).

Benchmarks are often misleadingly interpreted as “the higher you are on a graph, the better *lib-of-the-moment-name-here* is at doing everything”. This is absolutely the wrong way to look at them. I cannot stress this point enough.

‘Fast’ is only 1 of the ‘n’ features you desire from a webserver library: you definitely want to consider stability, features, ease of maintenance, low standard deviation, code usability, community, development speed, and many other factors when choosing the best suited library for your own application. There is no such thing as a generic benchmark. These ones are related to a very specific situation: fast application computational times, loads of connections, and small data transfers.

Therefore, please take this with a grain of salt and do not jump to generic conclusions regarding any of the cited libraries, all of which, as I clearly stated at the beginning of my post, I find interesting and valuable. And I am still very open to being criticized for the described methodology or other things I might have missed.


Misultin: erlang and websockets

Roberto Ostinelli — Wed, 20 Jan 2010 15:33:12 +0000

Inspired by Joe Armstrong’s post, I’ve recently added websocket support to misultin v0.4, my Erlang library for building fast lightweight HTTP servers.

Basically, websockets allow two-way asynchronous communication between browsers and servers, filling the gap that technologies such as Ajax and Comet have tried to fill in recent years. If you want to try this out yourself, you will first need to grab a browser which implements websockets, such as Google Chrome.

The typical html page with javascript code to use websockets is as follows:

Here’s the code to use misultin to handle the requests of this script:

-module(misultin_websocket_example).
-export([start/1, stop/0]).

% start misultin http server
start(Port) ->
    misultin:start_link([{port, Port}, {loop, fun(Req) -> handle_http(Req, Port) end},
    {ws_loop, fun(Ws) -> handle_websocket(Ws) end}]).

% stop misultin
stop() ->
    misultin:stop().

% callback on request received
handle_http(Req, Port) ->    
    % output
    Req:ok([]).

% callback on received websockets data
handle_websocket(Ws) ->
    receive
        {browser, Data} ->
            Ws:send(["received '", Data, "'"]),
            handle_websocket(Ws);
        _Ignore ->
            handle_websocket(Ws)
    after 5000 ->
        Ws:send("pushing!"),
        handle_websocket(Ws)
    end.

handle_websocket/1 is spawned by misultin to handle the connected websockets. Data coming from a browser will be sent to this process and will have the message format {browser, Data}, where Data is a string(). If you need to send data to the browser, you may do so by using the parametrized function Ws:send(Data), Data being a string() or an iolist().
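For readers less familiar with Erlang’s receive ... after construct, one iteration of the loop above can be approximated in Python with a queue acting as the process mailbox. This is only an analogy to show the control flow; the function and parameter names are invented for the sketch and have nothing to do with misultin’s API.

```python
import queue


def handle_step(mailbox, send, timeout=5.0):
    """One iteration of the loop: reply to browser data, or push on timeout."""
    try:
        message = mailbox.get(timeout=timeout)
    except queue.Empty:
        # Nothing arrived within the timeout: push data to the browser.
        send("pushing!")
        return
    if isinstance(message, tuple) and len(message) == 2 and message[0] == "browser":
        send("received '" + message[1] + "'")
    # Any other message is ignored, like the _Ignore clause.


mailbox = queue.Queue()
sent = []
mailbox.put(("browser", "hello server"))
handle_step(mailbox, sent.append, timeout=0.01)  # handles the browser message
handle_step(mailbox, sent.append, timeout=0.01)  # mailbox empty, so it pushes
print(sent)
```

The Erlang version gets the looping for free through tail recursion; in Python you would call handle_step repeatedly from a worker thread.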

Compile and run the example here above with misultin_websocket_example:start(8080). Then, open up your Chrome (or other websocket compliant browser) and point it to an .html file containing the above code.

You should normally see this being gradually printed on your browser:

Wed Jan 20 2010 15:18:52 GMT+0100 (CET): websocket connected!
Wed Jan 20 2010 15:18:52 GMT+0100 (CET): sent message to server: 'hello server'!
Wed Jan 20 2010 15:18:52 GMT+0100 (CET): server sent the following: 'received 'hello server!''
Wed Jan 20 2010 15:18:57 GMT+0100 (CET): server sent the following: 'pushing!'
Wed Jan 20 2010 15:19:02 GMT+0100 (CET): server sent the following: 'pushing!'
Wed Jan 20 2010 15:19:07 GMT+0100 (CET): server sent the following: 'pushing!'

In normal environments you may consider serving the .html page from misultin directly. You may do so with the following complete misultin module:

-module(misultin_websocket_example).
-export([start/1, stop/0]).

% start misultin http server
start(Port) ->
   misultin:start_link([{port, Port}, {loop, fun(Req) -> handle_http(Req, Port) end}, {ws_loop, fun(Ws) -> handle_websocket(Ws) end}]).

% stop misultin
stop() ->
   misultin:stop().

% callback on request received
handle_http(Req, Port) ->
   % output
   Req:ok([{"Content-Type", "text/html"}],
   ["
   
      
         
      
      
         
      
   "]).

% callback on received websockets data
handle_websocket(Ws) ->
   receive
      {browser, Data} ->
         Ws:send(["received '", Data, "'"]),
         handle_websocket(Ws);
      _Ignore ->
         handle_websocket(Ws)
   after 5000 ->
      Ws:send("pushing!"),
      handle_websocket(Ws)
   end.

Please note that the WebSocket protocol is still a draft. Use with caution.


Misultin library

Roberto Ostinelli — Mon, 27 Jul 2009 12:11:47 +0000

Today I’ve released Misultin (pronounced mee-sul-teen), an Erlang library for building fast lightweight HTTP servers. The first benchmarks are quite satisfying, even though there still is work to do.

Here is the simple code for Misultin’s Hello World.

-module(misultin_hello_world).
-vsn('0.1').
-export([start/1, stop/0, handle_http/1]).

% start misultin http server
start(Port) ->
    misultin:start_link([{port, Port}, {loop, fun(Req) -> handle_http(Req) end}]).

% stop misultin
stop() ->
    misultin:stop().

% callback on request received
handle_http(Req) ->
    Req:ok("Hello World.").

Here’s the code to echo a GET variable in XML form.

-module(misultin_get_variable).
-vsn('0.1').
-export([start/1, stop/0, handle_http/1]).

% start misultin http server
start(Port) ->
    misultin:start_link([{port, Port}, {loop, fun(Req) -> handle_http(Req) end}]).

% stop misultin
stop() ->
    misultin:stop().

% callback on request received
handle_http(Req) ->
    % get params
    Args = Req:parse_qs(),
    Value = proplists:get_value("value", Args),
    case Value of
        undefined ->
            Req:ok([{"Content-Type", "text/xml"}], "no value specified");
        _ ->
            Req:ok([{"Content-Type", "text/xml"}], "~s", [Value])
    end.

Additional code examples and the full list of exports are also available.

You may find Misultin, released under the New BSD License, on its project page on Google Code.
