
Ruby’s EventMachine – Part 2 : Asynchronous != Faster

In this post I will look at synchronous vs asynchronous programming with Ruby’s EventMachine, to show that asynchronous does not always mean that your code will run faster.

In part 1 of this series on Ruby’s EventMachine I discussed the benefits of event-based programming in general. I am a big fan of it, as you will see in these posts, but I wanted to flip the coin over and look at one of its down-sides.

The Cost

Managing events does not come for free. There is overhead in wrapping code in callbacks, stashing context, queuing and deleting events, managing timer events and communicating with the operating system.
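
As a crude illustration of the first two of those costs, here is a micro-benchmark of my own (a sketch, not anything from EventMachine itself) comparing a direct method call with the same work wrapped in a Proc that must be allocated, queued and invoked later:

require 'benchmark'

N = 1_000_000

Benchmark.bm(10) do |bm|
  # Direct, synchronous calls: no per-call object allocation.
  bm.report('direct') { N.times { |n| n + 1 } }

  # Callback style: allocate a Proc per call, queue it, then drain
  # the queue later, roughly what a reactor does with its callbacks.
  bm.report('callback') do
    pending = []
    N.times { |n| pending << proc { n + 1 } }
    pending.each(&:call)
  end
end

The absolute numbers do not matter here; the point is that every callback costs an allocation and an extra dispatch before any real work happens.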

Exhibit A

With the example I am using for this post, we talk over a TCP network connection and perform a high number of transactions in a short period of time. If you read part 1, then you will know that this sounds like an ideal use-case for EventMachine to really shine.

The example I am going to use is Memcached. Memcached is fast. If you have a low-latency network connection to Memcached, then it is really fast.

Memcached, as its name implies, runs in memory, so the only thing that is going to slow it down is it being overwhelmed with network requests or some inefficiency in its algorithms (which are CPU-bound). Personally, I have never hit the upper-bound of either of these, as there is always something else in my architecture which croaks first.

The Test

I wrote a test to see if asynchronous memcached communication, using EventMachine, or synchronous memcached communication, using the memcached gem, would be faster.

Here is the code…

require 'eventmachine' # for async
require 'memcached'    # for sync
require 'benchmark'

DEBUG = false
TEST_SIZE = 100_000

def debug msg
  if DEBUG
    $stderr.puts msg
  end
end

def async
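  # Pipeline all SET/GET/DEL requests without waiting for each response;
  # the reactor is stopped by the callback of the final DEL.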
  EM.run do
    cache = EventMachine::Protocols::Memcache.connect 'localhost', 11211
    debug "sending SET requests..."
    (1..TEST_SIZE).each do |n|
      cache.set "key#{n}", "value#{n}" do
        debug "  SET key#{n} complete"
      end
    end
    debug "SET requests sent"
    debug "sending GET requests..."
    (1..TEST_SIZE).each do |n|
      cache.get "key#{n}" do |value|
        debug "  GET key#{n} = #{value} complete"
      end
    end
    debug "GET requests sent"
    debug "sending DEL requests..."
    (1..TEST_SIZE).each do |n|
      cache.del("key#{n}") do
        debug "  DEL key#{n} complete"
        if n == TEST_SIZE
          EM.stop
        end
      end
    end
    debug "DEL requests sent"
  end
end

def sync
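  # Each request blocks until its response arrives before the next is sent.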
  cache = Memcached.new("localhost:11211")
  debug "sending SET requests..."
  (1..TEST_SIZE).each do |n|
    cache.set "key#{n}", "value#{n}"
    debug "  SET key#{n} complete"
  end
  debug "SET requests sent"
  debug "sending GET requests..."
  (1..TEST_SIZE).each do |n|
    value = cache.get "key#{n}"
    debug "  GET key#{n} = #{value} complete"
  end
  debug "GET requests sent"
  debug "sending DEL requests..."
  (1..TEST_SIZE).each do |n|
    cache.delete("key#{n}")
    debug "  DEL key#{n} complete"
  end
  debug "DEL requests sent"
end

puts Benchmark.measure { puts "sync:";  sync  }
puts Benchmark.measure { puts "async:"; async }

The Results

We are using Ruby’s Benchmark module to measure the time taken.

$ ruby memcached.rb
sync:
  4.170000   4.360000   8.530000 ( 17.299242)
async:
 32.150000   0.990000  33.140000 ( 33.246160)

This report shows the user CPU time, system CPU time, the sum of the user and system CPU times, and the elapsed real time. The unit of time is seconds. (from Benchmark docs)

So we can see from the above that the EventMachine-based version took about twice as long to run and used nearly 8 times as much user CPU time; the system CPU numbers go the other way, with the synchronous version using over 4 times as much. That is quite significant.

Avoid EventMachine With Memcached?

This test needs to be put in context. The test was run on the same machine as Memcached, so the network latency was extremely low. EventMachine was not being used for anything else, and this script had no other tasks to perform in-between sending requests to Memcached and receiving the responses, so blocking was not an issue.

I could benchmark this and conclude that synchronous Memcached usage was the way to go. I would then roll it out to production, where Memcached is running on a different machine in a different data-center (please do not do this), and the latency would kill this synchronous script. Where you have latency and many requests, asynchronous event-based programming is usually going to win.
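
To put a rough number on that latency argument (my own back-of-envelope figures, not measurements): the test above makes 300,000 requests, and a synchronous client pays a full network round trip for every one of them.

# Illustrative arithmetic, assuming a 0.5 ms round trip to a
# memcached on another machine (an assumed figure, not measured).
requests = 300_000     # 100k SETs + 100k GETs + 100k DELs
rtt      = 0.0005      # seconds per synchronous round trip
puts requests * rtt    # => 150.0, a lower bound in seconds

An asynchronous client can keep many requests in flight on the same connection, so its floor is set by throughput and CPU cost rather than by the round-trip time multiplied by the request count.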

That said, if the context is such that this kind of synchronous model works better for you, performance is important, and you can be sure that things are not going to change, then maybe it is worth considering.

Suck It Up

Nearly every application I write now is event-based. I use EventMachine in Ruby and Tornado’s io_loop in Python. I write high-performance code and do everything non-blocking, because I do not want anything to halt my event-loop, ever.

I will gladly take a little overhead and fire up a new process or a new machine, if necessary, if it means that an external service like Memcached will not bring my event-loop to a halt when it has issues. It may be fast now, but one network glitch or Memcached crash may render my event-loop defunct. I might go from processing 10k requests per second to processing 1 per second, if I have a timeout of 1 second on one blocking network connection. So, yes, I will gladly suck up this asynchronous overhead in the short-term to protect from [expected] unexpected issues in the future.
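
To make that concrete, here is a sketch of the kind of protection I mean; the GuardedCache handler and its behaviour are my own illustration, but comm_inactivity_timeout is a standard EM::Connection setting:

require 'eventmachine'

# Hypothetical handler: if the remote service stalls, the reactor
# drops this one connection instead of halting the whole loop.
class GuardedCache < EM::Connection
  def post_init
    self.comm_inactivity_timeout = 1 # close after 1s of silence
  end

  def unbind
    $stderr.puts 'cache connection lost; the event-loop keeps running'
    # reconnect, fail over, or serve degraded responses here
  end
end

EM.run do
  EM.connect 'localhost', 11211, GuardedCache
end

With a blocking client, that same one-second stall happens inline, and every request queued behind it waits too.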

Can EventMachine Be Faster?

I am a believer that anything can be better, faster, stronger. EventMachine is already largely written in C++, which is itself a clear sign that its own work is CPU-bound.

There is a pure-Ruby implementation of EventMachine. You can play with this to compare its performance against the C++ implementation. In basic tests, you are unlikely to see a difference. The general benefit you get with an event-based system, when dealing with latency in disk and network I/O, far outweighs the overhead of the event system. It is only when you start to hit it extremely hard that you will see the differences.
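
If you want to experiment, in the EventMachine versions I have used you select the pure-Ruby reactor by requiring it before (or instead of) the compiled extension; older releases also honoured an EVENTMACHINE_LIBRARY environment variable:

# Load the pure-Ruby reactor rather than the C++ extension.
require 'em/pure_ruby'
require 'eventmachine'

EM.run do
  EM.add_timer(0) { puts 'running on the pure-Ruby reactor'; EM.stop }
end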

A faster EventMachine would be great, but it would make little difference to a comparison like this one. You can never escape the overhead that asynchronous code adds, so synchronous code will continue to look much faster in examples like the one above.

Event-based programming enables your application to utilize 100% of the CPU, because anything not CPU-bound can be passed off to the operating system. Therefore, if our code-base is truly event-based, we would only see the benefit of a faster EventMachine once we hit 100% CPU utilization.

Side note: I have hit some bugs when using the pure-Ruby implementation in my code, and the EventMachine test-suite was not passing for me when trying to use pure-Ruby. The test-suite is now passing with the latest HEAD of the git repository, so these might have been temporary issues, but it highlighted to me the order of priority between the C++ and pure-Ruby implementations.

Resources

Ruby’s EventMachine

Other posts in this series

Comments

  1. yuan

    great! how can I integrate eventmachine into my rails apps?

  2. Ludovic Henry

    What are the results if you use epoll instead of select? Thank you.

  3. Phil Pirozhkov

    It’s a very good example of how EM is still far from perfect.
    This is mostly due to the poor memcached protocol implementation, which adds all these callbacks to an array and pops them off one by one.

    There’s a problem in the test itself: in evented style you never know which operation finishes first, so you cannot be sure that a DEL is sent after its SET and GET have completed. This should be better:

    require 'em-synchrony'
    require 'em-synchrony/em-memcache'
    def async2
      EventMachine.synchrony do
        cache = EM::P::Memcache.connect
        TEST_SIZE.times do |n|
          cache.set "key#{n}", "value#{n}"
        end
        TEST_SIZE.times do |n|
          value = cache.get "key#{n}"
        end
        TEST_SIZE.times do |n|
          cache.delete("key#{n}")
        end
        EventMachine.stop
      end
    end
    puts Benchmark.measure { puts "async:"; async2 }
    

    However, the overhead is the same.

    1. Phil Pirozhkov

      Forgot to mention that on my machine sync is only 50%-100% faster than async given your benchmark. (2 cores, linux 3.5.4).

  4. Jarmo Pertman

    You could also try Benchmark.bmbm instead of .measure to see if adding a rehearsal phase has any effect.

  5. Nicolás

    Great post! Waiting for part 3!!!