UPDATE: None of this should be necessary, as FileReadStream in the latest node uses buffers by default. However, it appears that either I'm doing something wrong or the docs are out of date, as it doesn't work that way on node HEAD.
Two areas where the otherwise excellent Node.js sadly falls short are the handling of binary data and large strings. In this post I'd like to go over some techniques for dealing with binary data in node, most of which revolve around dealing with V8's garbage collector and the fact that strings in node are not made for binary data; they're made for UTF-8 and UTF-16 data.
There are three main gory details that make working with data in Node.js a pain:
Let's look at the first item: big strings aren't your friend. Node.js creator ry tackled this issue himself in a performance comparison he made with nginx. If you view the PDF (or look at the extracted chart below), you'll see that node does a decent job keeping pace with nginx up until the 64 byte mark hits, then performance just falls apart. The reason, in ry's words:
V8 has a generational garbage collector [which] moves objects around randomly. Node can’t get a pointer to raw string data to write to socket.
You can see this in the relevant graph in Ryan's slides, which I've conveniently extracted and posted below (I hope you don't mind, Ryan).
What wasn't immediately obvious to me after reading this was what it meant in cases where one is using node to pass around large bits of binary data that come in as strings. If you use node to, say, read from the file system, you get back a binary string, not a buffer. My question was: "If I have binary data already stuck in a lousy UTF-16 string, but then stick it in a buffer before sending it out, will that help with speed?" The answer: an increase in throughput from 100 MiB/Sec to 160 MiB/Sec.
Check out the graph below from my own performance tests, where I played with different readChunk sizes (how much data the FileReadStream reads at once) and buffer sizes (how much data we store in a buffer before flushing to a socket):
As you can see, performance using buffers (Buf) beats the pants off writes using strings (Str). The difference between the two pieces of code can be seen below. I initially didn't think that doing this conversion would help at all; I figured that once the data was already in a string (as data from a FileReadStream is), one may as well flush it to the socket and continue on. This makes me wonder if other apps would also be best off accumulating their output (perhaps even their UTF-8 output) in a Buffer where possible, then finally flushing the Buffer, instead of making repeated calls to res.write. Someone needs to test this. It also makes me wonder if my own test case could be improved further if the node FileReadStream object was modified to return a Buffer rather than a string.
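A minimal sketch of the accumulate-then-flush idea, not the code from the benchmark: `BufferedWriter` and its method names are hypothetical, and it uses `Buffer.alloc` and `Buffer.from`, which assume a current Node rather than the version tested here. It copies binary-string data into a fixed-size Buffer and hands the sink a Buffer each time that fills up:

```javascript
// Hypothetical helper, for illustration only (not part of Node or paperboy).
class BufferedWriter {
  constructor(sink, bufSize) {
    this.sink = sink;                 // anything with write(Buffer), e.g. an HTTP response
    this.buf = Buffer.alloc(bufSize); // reusable staging buffer
    this.used = 0;                    // bytes currently staged
  }

  // Accepts a binary string (one byte per character, 'latin1' in current Node).
  writeBinaryString(str) {
    let offset = 0;
    while (offset < str.length) {
      const room = this.buf.length - this.used;
      const n = Math.min(room, str.length - offset);
      this.buf.write(str.slice(offset, offset + n), this.used, n, 'latin1');
      this.used += n;
      offset += n;
      if (this.used === this.buf.length) this.flush();
    }
  }

  // Send whatever is staged; call once more after the last write.
  flush() {
    if (this.used === 0) return;
    // Copy, so the sink never sees later overwrites of the staging buffer.
    this.sink.write(Buffer.from(this.buf.subarray(0, this.used)));
    this.used = 0;
  }
}
```

In a server this would be used as `new BufferedWriter(res, 64 * 1024)`, with `writeBinaryString` called on each chunk read and `flush()` called once at the end.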
Additionally, you may be asking about using a larger bufSize than readChunk size. I did indeed test this, but found there was not much of a difference when using a larger buffer, so the optimal strategy really does seem to be reading a 64KiB chunk into a 64KiB buffer. You can see this data at the bottom of the post.
In the data I graphed above, I made a number of runs with `ab -c 100 -n 1000` against a 1 MiB file, changing the chunkSize and readSize. Relevant sample code can be seen below. The full sample code would be my fork of node-paperboy.
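For reference, here is roughly what reading in 64KiB chunks looks like on a current Node, where the read size is set with the `highWaterMark` option and a stream with no encoding already yields Buffers. This is a sketch under those assumptions, not the paperboy code; `sendFile` and `CHUNK_SIZE` are names made up for the example:

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

const CHUNK_SIZE = 64 * 1024; // 64 KiB per read

// Hypothetical helper: stream the file at `path` into `dest` (an HTTP
// response, a socket, any Writable). With no encoding set, each chunk
// arrives as a Buffer, so no intermediate string is created.
function sendFile(path, dest, done) {
  const src = fs.createReadStream(path, { highWaterMark: CHUNK_SIZE });
  // pipeline() handles backpressure, ends `dest`, and reports errors once.
  pipeline(src, dest, done);
}
```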
The full performance data is available below:
Today I wrote a Node JS logging module, node-streamlogger.
It supports multiple log levels and multiple log output files, as well as reopening the logfiles allowing you to rotate them.
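To show what "reopening" buys you, here is a small sketch of the idea. It is not node-streamlogger's API: `TinyLogger`, `LEVELS` and every method name are made up for illustration, and it uses synchronous writes to keep the example short.

```javascript
const fs = require('fs');

// Hypothetical sketch; names and API are illustrative only.
const LEVELS = { debug: 0, info: 1, warn: 2, error: 3 };

class TinyLogger {
  constructor(paths, level = 'info') {
    this.paths = paths;          // one or more log files
    this.level = LEVELS[level];  // minimum level that gets written
    this.fds = [];
    this.reopen();
  }

  // Close and reopen every file by path. After an external tool renames
  // app.log to app.log.1, this makes new lines land in a fresh app.log.
  reopen() {
    this.close();
    this.fds = this.paths.map((p) => fs.openSync(p, 'a'));
  }

  log(level, msg) {
    if (LEVELS[level] < this.level) return; // below the threshold, drop it
    const line = `${new Date().toISOString()} [${level}] ${msg}\n`;
    this.fds.forEach((fd) => fs.writeSync(fd, line));
  }

  close() {
    this.fds.forEach((fd) => fs.closeSync(fd));
    this.fds = [];
  }
}
```

A rotation script would rename the file and then tell the process to call `reopen()`, for example from a signal handler.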
Coding in Node, or any evented setting, often requires different ways of thinking. The primary difference is that you have to keep track of when the code you're writing will run. I've really just started writing code in Node, so below is a clarification of a couple of things I found initially confusing. As an example:
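A minimal sketch of that kind of ordering question, using only Node's built-in `setTimeout` and `process.nextTick` (the labels and the `order` array are mine). The lines run in a different order than they appear in the file:

```javascript
const order = [];

// Queued: runs on a later turn of the event loop, even with a 0 ms delay.
setTimeout(() => order.push('timer callback'), 0);

// Queued: runs as soon as the current synchronous code has finished.
process.nextTick(() => order.push('nextTick callback'));

// Not queued: runs immediately, before either callback above.
order.push('synchronous code');

// Report once everything has had a chance to run.
setTimeout(() => console.log(order.join(' -> ')), 10);
// prints: synchronous code -> nextTick callback -> timer callback
```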
In my last post, Node.js Can Really Scale, I demonstrated the impressive scalability of Node.js and, more generally, of the evented model, whereby increasing concurrency barely affects total throughput. You may have noticed, however, that 1000 reqs took about 17 secs, giving us 58 reqs/second. That's pretty bad, but it mostly reflects node's slowness when transferring in binary mode. Apparently the same test today runs twice as fast, but more interestingly, performance improved dramatically further when I reduced the chunk size of reads from ~500 KiB (the full file size) to the default size of 4 KiB. The larger chunk size took 44% more time. I'm using fs.createReadStream to perform the reads, and on each read the data is written out to the client. The smaller chunk size means that for that 500 KiB, res.write() gets called 125 times instead of once, yet that doesn't even seem to matter. In terms of total throughput, I was able to reach ~150 reqs/second, or ~10 MiB/second, a pretty decent improvement. While this is a big improvement over the previous numbers, node really is not a speed demon when it comes to serving files.
EDIT: The concurrency #s are accurate, but in terms of speed it's actually faster by 150%; see this post for more info.
Node JS really does scale. Check out the following graph of performance for 1000 requests on an app I recently wrote for work (all times are in milliseconds, with a total of 1000 reqs).
I just started getting into Node.js for a project at work, and did a significant reworking of a node static file server called Paperboy. Check out my fork of paperboy over at GitHub.
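To make the chunk arithmetic concrete, here is a small sketch that counts how many reads a given chunk size produces. `countChunks` is a hypothetical helper, and the `highWaterMark` option is how current Node sets the read size (current versions default to 64 KiB for file streams, so 4 KiB is passed explicitly here):

```javascript
const fs = require('fs');

// Hypothetical helper: read `path` with the given chunk size and report how
// many 'data' events (and so how many res.write() calls) that would mean.
function countChunks(path, chunkSize, done) {
  let chunks = 0;
  let bytes = 0;
  fs.createReadStream(path, { highWaterMark: chunkSize })
    .on('data', (chunk) => { chunks += 1; bytes += chunk.length; })
    .on('error', done)
    .on('end', () => done(null, { chunks, bytes }));
}
```

For a 500 KiB file, `countChunks(path, 4 * 1024, cb)` should report on the order of 512000 / 4096 = 125 chunks, against a single chunk when the read size covers the whole file.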