Why Rails 4 Live Streaming is a big deal

TLDR: Rails Live Streaming allows Rails to compete with Node.js in the streaming arena. Streaming requires application servers to support either multi-threaded or evented I/O. Most Ruby application servers are not up to the job. Phusion Passenger Enterprise 4.0 (a Ruby app server) is to become a hybrid multi-processed, multi-threaded and evented server. This allows seamless support for streaming, provides excellent backwards compatibility, and allows future support for more use cases than streaming alone.

Several days ago Rails introduced Live Streaming: the ability to send partial responses to the client immediately. This is a big deal because it opens up a huge number of use cases that Rails simply wasn’t suitable for. Rails was and still is an excellent choice for “traditional” web apps where the user sends a request and expects a full response back. It was a bad choice for anything that works with response streams, for example:

  • Progress responses that continuously inform the user about the progress. Imagine a web application that performs heavy calculations that can take several minutes. Before Live Streaming, you had to split this system up into multiple pages that must respond immediately. The main page would offload the actual work into a background worker, and return a response informing the user that the work is now in progress. The user must poll a status page at a regular interval to look up the progress of the work. With Live Streaming, you can not only simplify the code by streaming progress information in a single request, but also push progress information to the user much more quickly and without polling:
    # Requires "include ActionController::Live" in the controller.
    def big_work
      work = WorkModel.new
      until work.done?
        work.do_some_calculations
        response.stream.write "Progress: #{work.progress}%\n"
      end
    ensure
      # Always close the stream, even if an error is raised halfway.
      response.stream.close
    end
    
  • Chat servers. Or, more generally, web apps that involve a large number of mostly idle but persistent connections. Until today this has largely been the domain of evented systems such as Node.js and Erlang. A sketch of what such an endpoint could look like with Live Streaming follows below.
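
For example, a chat endpoint could hold the connection open and push new messages as they arrive. The sketch below assumes Server-Sent Events and a hypothetical ChatRoom pub/sub helper; it is an illustration, not production code:

class ChatsController < ApplicationController
  include ActionController::Live

  def events
    response.headers["Content-Type"] = "text/event-stream"
    # ChatRoom is a hypothetical pub/sub helper that blocks and yields
    # each new chat message as it arrives.
    ChatRoom.subscribe do |message|
      response.stream.write "data: #{message}\n\n"
    end
  ensure
    response.stream.close
  end
end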

And as Aaron Patterson has already explained, this feature is different from Rails 3.2’s template streaming.

Just “possible” is not enough

The same functionality was actually already technically possible in Ruby. According to the Rack spec, a Rack app object's call method must return a three-element array:

[status_code, headers, body]

Here, body must respond to the each method. You can implement live streaming yourself, with raw Rack, by returning a body object that yields partial responses from its each method.

class StreamingBody
  def each
    work = WorkModel.new
    until work.done?
      work.do_some_calculations
      yield "Progress: #{work.progress}%\n"
    end
  end
end
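
To tie this together, here is a minimal sketch of a config.ru that serves the body above with raw Rack (WorkModel is the same hypothetical model as in the earlier examples):

# config.ru -- a minimal sketch of serving StreamingBody with raw Rack.
# No Content-Length header is set: the server sends each chunk as the
# body yields it (assuming the server does not buffer responses).
run lambda { |env|
  [200, { "Content-Type" => "text/plain" }, StreamingBody.new]
}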

Notice that the syntax is nearly identical to the Rails controller example code. With this, you can implement virtually any streaming behavior.

However, streaming in Ruby has never gained much traction compared to systems such as Node.js. The latter is much more popular for these kinds of use cases. I believe this inequality in popularity is caused by a few things:

  1. Awareness. Not everybody knew this was possible in Ruby. Indeed, it is not widely documented.
  2. Ease and support. Some realized this was possible, but chose not to use Ruby because many frameworks did not provide easy support for streaming. It was possible to stream responses in pre-4.0 Rails, but the framework code generally did not take streaming into account, so if you tried to do anything fancy you ran the risk of breaking things.

With Live Streaming, streaming is now easy to use as well as officially supported.

Can Rails compete with Node.js?

Node.js is gaining more and more momentum nowadays. As I see it, there are several reasons for this:

  1. Love for JavaScript. Some developers prefer JavaScript over Ruby, for whatever reason. Some like the idea of using the same language for both frontend and backend (although whether code can be easily shared between frontend and backend remains a controversial topic among developers). Others like the V8 engine for its speed. Indeed, V8 is a very well-optimized engine, much more so than Ruby 1.9’s YARV engine.
  2. Excellent support for high I/O concurrency use cases. Node.js is an evented I/O system, and evented systems can handle a massive amount of concurrent connections. All libraries in the Node.js ecosystem are designed for evented use cases, because there’s no other choice. In other languages you have to specifically look for evented libraries, so the signal-to-noise ratio is much lower.

I have to be careful here: the phrases “high I/O concurrency” and “massive amount of concurrent connections” deserve more explanation, because it’s easy to confuse them with “uber fast” or “massively scalable”. That is not what I meant. What I meant is that a single Node.js process is capable of handling a lot of client sockets, assuming that any work you perform does not saturate memory, CPU or bandwidth. In contrast, Ruby systems traditionally could only handle 1 concurrent request per process, even if you don’t do much work inside a request. We call this a multi-process I/O model because the amount of concurrent users (I/O) the system can handle scales only with the number of processes.

In traditional web apps that send back full responses, this is not a problem because the web server queues all requests, the processes respond as quickly as possible (usually saturating the CPU) and the web server buffers all responses and relieves the processes immediately. In streaming use cases, you have long-running requests so the aforementioned mechanism of letting the web server buffer responses is simply not going to work. You need more I/O concurrency: either you must have more processes, or processes must be able to handle more than 1 request simultaneously. Node.js processes can effectively handle an unlimited number of requests simultaneously, when not considering any constraints posed by CPU, memory or bandwidth.

Node.js is more than HTTP. It allows arbitrary networking with TCP and UDP. Rails is pretty much HTTP-only, and WebSocket support is dubious even in raw Rack. It cannot (and I believe, should not) compete with Node.js on everything, but still… Now suddenly, Rails can compete with Node.js for a large number of use cases.

Two sides of the coin

Reality is actually a bit more complicated than this. Although Rails can handle streaming responses now, not all Ruby application servers can. Ilya Grigorik described this problem in his article Rails Performance Needs an Overhaul and criticized Phusion Passenger, Mongrel and Unicorn for being purely multi-process, and thus not able to support high concurrency I/O use cases. (Side note: I believe the article’s title was poorly chosen; it criticizes I/O concurrency support, not performance.)

Mongrel’s current maintenance status appears to be in limbo. Unicorn is well-maintained, but its author Eric Wong has explicitly stated in his philosophy that Unicorn is to remain a purely multi-processed application server, with no intention to ever become multithreaded or evented. Unicorn is explicitly designed to handle fast responses only (so no streaming responses).

At the time Ilya Grigorik’s article was written, Thin was the only application server that was able to support high I/O concurrency use cases. Built on EventMachine, Thin is evented, just like Node.js. Since then, another evented application server called Goliath has appeared, also built on EventMachine. However, evented servers require evented application code, and Rails is clearly not evented.

There have been attempts to make serial-looking code evented through the use of Ruby 1.9 fibers, e.g. through the em-synchrony gem, but in my opinion fibers cause more problems than they solve. Ruby 1.8’s green threading model was essentially already like fibers: there was only one OS thread, and the Ruby green thread scheduler switched context upon encountering a blocking I/O operation. Fibers also operate within a single OS thread, but you can only context switch with explicit calls. In other words, you have to go through each and every blocking I/O operation you perform and insert fiber context switching logic, which Ruby 1.8 already did for you. Worse, fibers give the illusion of thread safety, while in reality you can run into the same concurrency problems as with threading. But this time, you cannot easily apply locks to prevent unwanted context switching. Unless the entire ecosystem is designed around fibers, I believe evented servers + fibers remain useful only for a small number of use cases where you have tight control over the application code environment.
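
To illustrate what “explicit calls” means here, consider this minimal Ruby 1.9 fiber example: nothing switches context until you say so.

fiber = Fiber.new do
  puts "step 1"
  Fiber.yield    # explicit context switch back to the caller
  puts "step 2"
end

fiber.resume   # prints "step 1", then pauses at Fiber.yield
fiber.resume   # prints "step 2"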

There is another way to support high I/O concurrency though: multi-threading, with 1 thread per connection. Multi-threaded systems generally do not support as much concurrent I/O as evented systems, but are still quite formidable. Multi-threaded systems are limited by things such as the thread stack size, the available virtual memory address space and the quality of the kernel scheduler. But with the right tweaking they can approach the scalability of evented systems.

And so this leaves multithreaded servers as the only serious option for handling streaming support in Rails apps. It’s very easy to make Rails and most other apps work on them. Puma has recently appeared as a server in this category. Like most other Ruby application servers, you have to start Puma at least once for every web app, and each Puma instance is to be attached to a frontend web server in a reverse proxy setup. Because Ruby 1.9 has a Global Interpreter Lock, you should start more than 1 Puma process if you want to take advantage of multiple cores. Or you can use Rubinius, which does not have a Global Interpreter Lock.
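
As an illustration, a threaded Puma setup might be configured like this (a sketch; check Puma’s documentation for the options your version supports):

# config/puma.rb -- a sketch of a threaded Puma setup
threads 8, 32   # min and max size of the per-process thread pool
port 3000       # the port your frontend web server proxies to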

And what about Phusion Passenger?

Towards a hybrid multi-processed, multi-threaded and evented application server

To recap, each I/O model – multi-process, multi-threaded, evented – has its own pros and cons:

  • Multi-process
    • Pros:
      • Excellent application compatibility.
      • No threading means no threading-related concurrency bugs (e.g. race conditions).
      • Simple and easy to understand.
      • If one process crashes, the others are not affected.
      • Can utilize multiple cores.
    • Cons:
      • Supports very low I/O concurrency.
      • Uses a lot of memory.
  • Multi-threaded
    • Considerations:
      • Not as compatible as multi-process, although still quite good. Many libraries and frameworks support threaded environments these days. In web apps, it’s generally not too hard to make your own code thread-safe because web apps tend to be inherently embarrassingly parallel.
      • Can normally utilize multiple cores in a single process, but not in MRI Ruby. You can get around this by using JRuby or Rubinius.
    • Pros:
      • Supports high I/O concurrency.
      • Threads use less memory than processes.
    • Cons:
      • If a thread crashes, the entire process goes down.
      • Good luck debugging concurrency bugs.
  • Evented
    • Pros:
      • Extremely high I/O concurrency.
      • Uses even less memory than threads.
    • Cons:
      • Bad application compatibility. Most libraries are not designed for evented systems at all. Your application itself has to be aware of events for this to work properly.
      • If your app/libraries are evented, then you can still run into concurrency bugs like race conditions. It’s easier to avoid them in an evented system than in a threaded system, but when they do occur they are very difficult to debug.
      • Cannot utilize multiple cores in a single process.

As mentioned before, Phusion Passenger is currently a purely multi-processed application server. If we want to change its I/O model, which one should we choose? We believe the best answer is: all of them. We can give users a choice, and let them choose – on a per-application basis – which I/O model they want.

Phusion Passenger Enterprise 4.x (which we introduced earlier) is to become a hybrid multi-processed, multi-threaded and evented application server. You can choose with a single configuration option whether you want to stay with the traditional multi-processed I/O model, whether you want multiple threads in a single process, or whether you want processes to be evented. In the latter two cases, you even control how many processes you want, in order to take advantage of multiple cores and for resistance against crashes. We believe a combination of processes and threads/events is best.

Being a hybrid server with configurable I/O model allows Phusion Passenger to support more than just streaming. Suddenly, the possibilities become endless. We could for example support arbitrary TCP protocols in the future with no limits on traffic workloads.

Code has just landed in the Phusion Passenger Enterprise 4.0 branch to support multithreading. Note that the current Phusion Passenger Enterprise release is of the 3.0.x series and does not support this yet. As you can see in our roadmap, Phusion Passenger Enterprise 4.0 beta will follow 3.0.x very soon.