Node JS and Binary Data

UPDATE: None of this should be necessary, as FileReadStream in the latest node uses buffers by default. However, it appears that either I'm doing something wrong or the docs are out of date, as it doesn't work that way on node HEAD.

Two areas where the otherwise excellent Node.js sadly falls short are the handling of binary data and of large strings. In this post I'd like to go over some techniques for dealing with binary data in node, most of which revolve around V8's garbage collector and the fact that strings in node are not made for binary data; they're made for UTF-8 and UTF-16 data.

There are three main gory details that make working with data in Node.js a pain:

  1. Large Strings (> ~64K) are not your friend.
  2. Binary (and ASCII) data in a node string is stored one byte per character, in the first byte of each two-byte UTF-16 character.
  3. Binary data can be most efficiently stored in Node.js as a Buffer.
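
To make points 2 and 3 concrete, here is a minimal sketch. It assumes the current Buffer API (`Buffer.from` and the `'latin1'` encoding name), which is newer than the node version this post was written against; the variable names are illustrative only.

```javascript
// Four arbitrary bytes, two of them above the 7-bit ASCII range.
const bytes = Buffer.from([0x00, 0x7f, 0x80, 0xff]);

// A "binary string": one character per byte, character code == byte value.
const binaryString = bytes.toString('latin1');

// Decoding with the same one-byte-per-character encoding round-trips exactly.
const roundTripped = Buffer.from(binaryString, 'latin1');

// Encoding the same string as UTF-8 changes the data: characters above 0x7f
// take two bytes each.
const asUtf8 = Buffer.from(binaryString, 'utf8');

console.log(binaryString.length);         // 4 characters
console.log(roundTripped.equals(bytes));  // true
console.log(asUtf8.length);               // 6 bytes
```

The string form only survives if every step agrees on the one-byte-per-character encoding, whereas the Buffer is just the bytes.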

Let's look at the first item: big strings aren't your friend. Node.js creator ry tackled this issue himself in a performance comparison he made with nginx. If you view the PDF (or look at the extracted chart below), you'll see that node does a decent job keeping pace with nginx up until the 64 byte mark hits; then performance just falls apart. The reason, in ry's words:

V8 has a generational garbage collector [which] moves objects around randomly. Node can’t get a pointer to raw string data to write to socket.

You can see this in the relevant graph in Ryan's slides, which I've conveniently extracted and posted below (I hope you don't mind, Ryan).

 

What wasn't immediately obvious to me after reading this was what it meant in cases where one uses node to pass around large bits of binary data that come in as strings. If you use node to, say, read from the file system, you get back a binary string, not a buffer. My question was: "If I have binary data already stuck in a lousy UTF-16 string, but then stick it in a buffer before sending it out, will that help with speed?" The answer: an increase in throughput from 100 MiB/sec to 160 MiB/sec.

 

Check out the graph below from my own performance tests, where I played with different readChunk sizes (how much data the FileReadStream reads at once) and buffer sizes (how much data we store in a buffer before flushing to a socket):

As you can see, performance using buffers (Buf) beats the pants off writes using strings (Str). The difference between the two pieces of code can be seen below. I initially didn't think that doing this conversion would help at all; I figured that once the data was already in a string (as data from a FileReadStream is), one may as well flush it to the socket and continue on. This makes me wonder if other apps would also be best off accumulating their output (perhaps even their UTF-8 output) in a Buffer where possible, then finally flushing the Buffer, instead of making repeated calls to res.write. Someone needs to test this. Additionally, this makes me wonder if my own test case could be further improved if the node FileReadStream object were modified to return a Buffer rather than a string.
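
A rough sketch of the Buf approach follows. This is not the benchmarked code: `createBufferedWriter`, `sink`, and `bufSize` are hypothetical names, and the modern `Buffer.alloc` / `buf.subarray` API is assumed. Binary-string chunks are copied into a fixed-size Buffer, and the Buffer is handed to the sink (a socket or response object in a real server) each time it fills.

```javascript
// Hypothetical helper: collect binary-string chunks into a fixed-size Buffer
// and pass the Buffer to sink.write() whenever it fills up.
function createBufferedWriter(sink, bufSize) {
  let buf = Buffer.alloc(bufSize);
  let used = 0;

  function flush() {
    if (used === 0) return;
    sink.write(buf.subarray(0, used));
    buf = Buffer.alloc(bufSize); // fresh Buffer: the sink may still hold the old one
    used = 0;
  }

  function write(binaryString) {
    let offset = 0;
    while (offset < binaryString.length) {
      const n = Math.min(bufSize - used, binaryString.length - offset);
      // 'latin1' maps each character code 0-255 to exactly one byte.
      buf.write(binaryString.slice(offset, offset + n), used, n, 'latin1');
      used += n;
      offset += n;
      if (used === bufSize) flush();
    }
  }

  return { write, flush };
}

// Usage with a stand-in sink that just records what it was given.
const written = [];
const writer = createBufferedWriter({ write: (b) => written.push(b) }, 4);
writer.write('abcdef');
writer.write('gh');
writer.flush(); // push out whatever is left at the end of the response
console.log(written.map((b) => b.toString('latin1'))); // [ 'abcd', 'efgh' ]
```

Whether this wins in practice depends on the node version and workload, so it is worth measuring rather than assuming.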

Additionally, you may be asking about using a larger bufSize than readChunk size. I did indeed test this, but found there was not much of a difference when using a larger buffer, so the optimal strategy really does seem to be reading a 64 KiB chunk into a 64 KiB buffer. You can see this data at the bottom of the post.

In the data I graphed above, I made a number of runs with `ab -c 100 -n 1000` against a 1 MiB file, changing the chunkSize and readSize. Relevant sample code can be seen below. The full sample code is in my fork of node-paperboy.

 

The full performance data is available below:

 


 

Node and process.nextTick

Coding in Node, or any evented setting, often requires different ways of thinking, the primary difference being that you have to keep track of when the code you're writing will run. I've really just started writing code in Node; below is a clarification of a couple of things I found initially confusing. As an example:
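
A minimal sketch of such an example (the `say` helper and the `output` array are illustrative names, added only so the order can be inspected):

```javascript
const output = [];

function say(word) {
  output.push(word);
  console.log(word);
}

// Queued: runs only after the currently executing code has finished.
process.nextTick(() => say('HI'));

// Runs immediately, so it is printed first.
say('BYE');
```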

This code will print out "BYE" before "HI". While initially counter-intuitive, you can actually leverage this to make more readable code. Below is an example from my fork of node-paperboy. As you can see, #deliver returns a delegate, which lets us set up our callbacks.

The interesting part about this is that after all our delegates are set up, there's no need to call a method to say we're done adding methods to the delegate and that it's free to run and deliver the file. Looking at the implementation of #deliver, we can get a little more information about how this works:

That's an incomplete portion of #deliver; a good chunk of it has been omitted for brevity. The most important part here is process.nextTick: everything within the anonymous function passed to nextTick gets deferred until the next tick of the clock, somewhat like (though more efficient than) `setTimeout(function() {}, 0);`. This allows us to return our delegate after it has been set up, so the user can set the callbacks via method calls on the delegate object. In this example, the next tick occurs only after the anonymous function passed to http.createServer is done executing.
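
To make the pattern concrete, here is a minimal sketch of a function shaped like #deliver. It is not the node-paperboy implementation: the `deliver` signature, the `shouldFail` flag, and the `done`/`error` delegate methods are hypothetical names chosen for illustration.

```javascript
// Hypothetical stand-in for #deliver: returns a delegate right away and
// defers the real work so the caller can attach callbacks first.
function deliver(shouldFail) {
  const callbacks = {};
  const delegate = {
    error(cb) { callbacks.error = cb; return delegate; },
    done(cb) { callbacks.done = cb; return delegate; },
  };

  process.nextTick(() => {
    // By now the caller's synchronous code has finished, so any callbacks
    // it wanted to register are in place.
    if (shouldFail) {
      if (callbacks.error) callbacks.error(new Error('not found'));
      return;
    }
    if (callbacks.done) callbacks.done('delivered');
  });

  return delegate;
}

const events = [];
deliver(false).done((msg) => events.push(msg));
deliver(true).error((err) => events.push(err.message));
events.push('callbacks registered');
// Once the ticks have run, events is:
// [ 'callbacks registered', 'delivered', 'not found' ]
```

Without the nextTick wrapper, the work would run before `.done()` or `.error()` had been called, and the callbacks would never fire.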

An important thing to remember is that you often don't need nextTick if you're performing operations that are already guaranteed to run on a later tick. Anything wrapped inside an async request like fs.stat or an http.Client request will not run until the current code has finished. The only reason that process.nextTick was explicitly required here was the synchronous check `if (fpErr) {...}`; the rest of the code runs wrapped inside of fs.stat, which is an async call.
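
A small sketch of that distinction, under the assumption of a made-up function `statLater` (not part of node-paperboy): the early error path is synchronous, so it needs process.nextTick, while the fs.stat path is asynchronous on its own.

```javascript
const fs = require('fs');

// Hypothetical function: both paths call back only after the caller's
// current code has finished.
function statLater(path, callback) {
  if (typeof path !== 'string') {
    // Synchronous check: defer explicitly so the callback is never
    // invoked before statLater() has returned.
    process.nextTick(() => callback(new TypeError('path must be a string')));
    return;
  }
  // fs.stat is already asynchronous; no nextTick needed.
  fs.stat(path, callback);
}

const order = [];
statLater(null, (err) => order.push('error callback: ' + err.message));
statLater(process.execPath, () => order.push('stat callback'));
order.push('both calls returned'); // always the first entry in `order`
```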

Node events are in some ways similar to delegates in how they're defined. If you're interested, I recommend taking a look at the implementation of streamFile as an example of how these are used.

Coding with node can be twisted (pun intended), but if you need the benefits an evented framework provides and you work with it, not against it, it isn't half bad.

Node File Read Performance

In my last post, Node.js Really Can Scale, I demonstrated the impressive scalability of Node.js and, more generally, of the evented model, whereby increasing concurrency barely affects total throughput. You may have noticed, however, that 1000 reqs took about 17 secs, giving us 58 reqs/second. That's pretty bad, but this is mostly reflective of node's slowness when transferring in binary mode. Apparently the same test today runs twice as fast, but more interestingly, performance was further dramatically improved by reducing the chunk size of reads from ~500 KiB (the full file size) to the default size of 4 KiB. The larger chunk size took 44% more time. I'm using fs.createReadStream to perform the reads, and on each read the data is written out to the client. The smaller chunk size means that for that 500 KiB, res.write() gets called 125 times instead of once, yet that doesn't even seem to matter. That means, in terms of total throughput, I was able to reach ~150 reqs/second, or ~10 MiB/second, a pretty decent improvement. While this is a big improvement over the previous numbers, node really is not a speed demon when it comes to serving files.
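
For reference, a sketch of how the read chunk size can be set and observed. It assumes the current streams API, where the `highWaterMark` option of fs.createReadStream sets the chunk size and chunks arrive as Buffers when no encoding is given; `countChunks` is an illustrative helper, not code from the benchmark.

```javascript
const fs = require('fs');

// Read `path` in chunks of at most `chunkSize` bytes and report how many
// 'data' events fired and how many bytes were seen in total.
function countChunks(path, chunkSize) {
  return new Promise((resolve, reject) => {
    let chunks = 0;
    let bytes = 0;
    fs.createReadStream(path, { highWaterMark: chunkSize })
      .on('data', (chunk) => {
        chunks += 1; // in a server, this is where res.write(chunk) would go
        bytes += chunk.length;
      })
      .on('end', () => resolve({ chunks, bytes }))
      .on('error', reject);
  });
}

// Usage: countChunks('/path/to/file', 4 * 1024).then(console.log);
```

For a 500 KiB file, a 4 KiB chunk size means at least 500 / 4 = 125 chunks, and therefore at least 125 writes to the client.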

 
I'm not exactly sure why this is faster, but I have a hunch that it has to do with the fact that binary data in Node.js (and V8) needs to be represented in UTF-16, and/or the fact that string concatenation is worse than O(n). I'm not sure what it is in V8, but I would guess that flushing data more frequently into the kernel buffer (which, from my horribly limited and outdated kernel networking knowledge, I'd guess is O(n)) is vastly more efficient. I've got no idea how efficient UTF-16 conversion is, but it doesn't sound fast at all.
 
Strangely, those same benchmarks I ran yesterday seem to be running twice as fast in terms of throughput, and I don't know why, since I changed nothing, and the server was unloaded. The 44% improvement is still present when I change the chunk size, but there must have been some unknown factor I missed.

Node.js Really Can Scale

EDIT: The concurrency #s are accurate, but in terms of speed it's actually faster by 150%; see this post for more info.

 

Node JS really does scale. Check out the following graph of performance for 1000 requests on an app I recently wrote for work (all times are in milliseconds, with a total of 1000 reqs).

Each request makes 1 memcached call and then a 500 KB file read (these happen in serial), which is then written to the socket. This is on a 2.33 GHz Xeon w/ 4 gigs of RAM, unloaded, running Ubuntu Karmic. The file is loaded in the OS cache since it gets hit so often, so HD performance doesn't affect this. I had to stop after 500 connections because node won't open 500 file descriptors at a time. The file sending was handled by my fork of node-paperboy.
 
This app always pegged a single core (node's evented design doesn't use SMP capabilities). I'd have to think that if you ran one node per core you'd get even better performance. Hopefully I'll have time later to set up haproxy as a load balancer and try this.