8/22/2011

NodeJS – The what, why, how and when | Xebia Blog

What is NodeJS?

The NodeJS five-word sales pitch from their own website is “Evented I/O for V8 Javascript”. We’ll get to what that means exactly in the How. NodeJS, in a few more words, is a server-side application framework with a focus on high concurrent performance. Applications written for Node run in a single-threaded, event-based process.
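To make that description concrete, here is a minimal sketch of a Node web server that uses only the built-in `http` module. The `handler` name, the response text and the choice of a free port on 127.0.0.1 are assumptions made for illustration. The sketch requests its own page once and then shuts down, so it terminates by itself.

```javascript
const http = require('http');

// Called once per incoming request, always on the same single thread.
function handler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello from Node\n');
}

const server = http.createServer(handler);

server.on('error', function (err) {
  console.error('Could not start the server: ' + err.message);
});

// Port 0 asks the operating system for any free port.
server.listen(0, '127.0.0.1', function () {
  const port = server.address().port;

  // Fetch our own page once, print it, then stop the server.
  const request = http.get(
    { host: '127.0.0.1', port: port, path: '/', agent: false },
    function (res) {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        console.log(body.trim());
        server.close();
      });
    }
  );

  request.on('error', function (err) {
    console.error('Request failed: ' + err.message);
    server.close();
  });
});
```

A real application would of course keep the server running instead of closing it after the first response.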

Event-based programming approaches this in a different manner. When you call an operation that would otherwise block, you pass it an additional argument: a block of code (a callback) to execute when the operation has completed. Instead of waiting for a reply from the user or the database, the application continues its execution.
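A minimal sketch of that callback style follows. `fetchUser` is a hypothetical stand-in for a slow database call (simulated here with a 50 ms timer), and the `log` array only exists to show the order in which things happen.

```javascript
// Hypothetical stand-in for a slow operation such as a database query.
// The second argument is the callback to run once the result is available.
function fetchUser(id, callback) {
  setTimeout(function () {
    callback(null, { id: id, name: 'user' + id });
  }, 50);
}

const log = [];

fetchUser(42, function (err, user) {
  if (err) throw err;
  log.push('got ' + user.name);
  console.log(log.join(' -> ')); // prints: kept running -> got user42
});

// This line runs before the callback above: the program did not wait.
log.push('kept running');
```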


Why
Why evented I/O? The short answer: Latency.

What does this mean? Effectively, it means that classic, blocking processes twiddle their thumbs for 41,000,000 CPU cycles while waiting for something from disk to load. Those processes can’t do anything else while they twiddle their thumbs, so in the case of a webserver, other requests get paused until the application has received its data.

The classic solution for this problem: Add moar processes. When one process twiddles its thumbs waiting for data, the other process can take over and handle the next request. With 200 processes available for handling requests, consumers never have to wait long in the queue for their request to be handled by the next freed process.

But it has its limits. Hard limits, even, which have been the source of many outages in the past. What if there’s a big delay on the network for whatever reason? What if more new requests come in per second than the database can process? The 200 processes each block once they reach the point where they retrieve data from the network, and once they’re all occupied, the webserver simply stops serving requests, users time out, and Twitter trends with messages like “IS #SOMESITE.COM DOWN? #FML”.
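The hard limit can be put into numbers with a back-of-the-envelope sketch. It assumes each process handles one request at a time and spends practically all of that time blocked; the 50 ms and 2000 ms latencies are made-up figures, and `maxRequestsPerSecond` is a hypothetical helper.

```javascript
// Upper bound on throughput for a pool of blocking processes:
// each process finishes at most (1000 / blockedMsPerRequest) requests per second.
function maxRequestsPerSecond(processes, blockedMsPerRequest) {
  return (processes * 1000) / blockedMsPerRequest;
}

console.log(maxRequestsPerSecond(200, 50));   // prints: 4000 (healthy back-end)
console.log(maxRequestsPerSecond(200, 2000)); // prints: 100  (slow network or database)
```

With the same 200 processes, a back-end that becomes forty times slower cuts the ceiling by a factor of forty, and every request above that ceiling has to wait in the queue.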

How does Node solve this? Well, to put it simply, it doesn’t twiddle its thumbs. It goes ‘Alright underlying system, give me this information from the network. I’ll go do something useful, lemme know when you’re done aight? Aight.’ And then it proceeds to do something useful, like handle the next request.

It’s the ‘lemme know when you’re done’ that contains the core of NodeJS’s performance. Instead of waiting for something to be retrieved (blocking), it continues to run (non-blocking). That way, no precious CPU cycles get wasted on waiting.
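The difference is easy to see side by side with Node’s `fs` module, which offers both styles. This is only a sketch: the scratch file name in the temp directory is an assumption made to keep the example self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Scratch file so the sketch has something to read (hypothetical name).
const file = path.join(os.tmpdir(), 'node-blocking-demo.txt');
fs.writeFileSync(file, 'hello from disk');

// Blocking: nothing else can run until the data has been returned.
const blockingResult = fs.readFileSync(file, 'utf8');
console.log('blocking read returned: ' + blockingResult);

// Non-blocking: hand over a callback and carry on immediately.
const events = [];
fs.readFile(file, 'utf8', function (err, data) {
  if (err) throw err;
  events.push('read done: ' + data);
  console.log(events.join(' -> '));
  fs.unlink(file, function () {}); // clean up the scratch file
});
events.push('moved on to the next request');
```

The callback can only run after the current piece of code has finished, so ‘moved on to the next request’ is always recorded first.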

Because it doesn’t block, there’s also a greatly reduced need for running multiple processes. It just keeps going, regardless of how many requests it has to handle. When it receives a bazillion requests, it doesn’t run out of processes or some other arbitrary limit, it just handles each one, one by one. It handles each event as it’s triggered, one at a time, and it does so very rapidly.
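As a rough illustration, the sketch below lets a single process deal with 1000 simulated requests. `handleRequest` and `serveAll` are hypothetical names, and the 100 ms timer stands in for a back-end call. Because no request blocks the others, all of them complete in a little over 100 ms in total rather than 1000 times 100 ms.

```javascript
// Hypothetical request handler: the "work" is a 100 ms wait on a back-end.
function handleRequest(id, done) {
  setTimeout(function () {
    done('response ' + id);
  }, 100);
}

// Starts `count` requests (count must be at least 1) and reports once
// every one of them has been answered.
function serveAll(count, callback) {
  const started = Date.now();
  const responses = [];
  for (let i = 0; i < count; i++) {
    handleRequest(i, function (response) {
      responses.push(response);
      if (responses.length === count) {
        callback(responses, Date.now() - started);
      }
    });
  }
}

serveAll(1000, function (responses, elapsedMs) {
  console.log(responses.length + ' responses in about ' + elapsedMs + ' ms');
});
```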

It’s like ordering food. You step up to the counter of your local fast food store, and order a hamburger. Classically, you stand there and wait until the chef is done and the person serving gives you your order, while others queue up neatly behind you. In the multi-threaded approach, the store hires more serving personnel and chefs that can each handle one customer at a time.

In the non-blocking approach, the servant smiles, says “Alright sir, please pick a seat, your order will be brought to you when it’s done. Next!”. And so it continues. The servant passes the order to the kitchen, the kitchen prepares your burger, and a pretty serving girl brings you your hamburger when it’s done. The queue passes the counter quickly, because all the servant does is take people’s orders and pass them along.

If only fast food stores actually worked like this.

Why Javascript?

First, event handling, callbacks and asynchronous behavior are at its core. It’s been used in this fashion for years in its natural habitat, the browser. Opening a file or network resource and passing that method a callback isn’t that different from triggering an AJAX-call with a callback – in fact, it’s exactly the same. The browser is inherently asynchronous, as pretty much every piece of logic relies on user input, network I/O, or something as simple as a timed event – each of those would block the execution of the script if it were done with blocking operations, and that’s the last thing you want in a user-experience heavy application like a webbrowser.
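The browser habit of registering a handler and waiting for an event carries over directly. As a sketch, Node’s built-in `events` module offers `on` and `emit`, which play a role similar to `addEventListener` and a fired DOM event. The `button` object and the `click` event name are hypothetical stand-ins, not real DOM objects.

```javascript
const EventEmitter = require('events');

// Hypothetical stand-in for something that fires events, like a DOM element.
const button = new EventEmitter();
const clicks = [];

// Register a handler, just as you would in the browser.
button.on('click', function (where) {
  clicks.push(where);
  console.log('clicked at ' + where);
});

// Fire the event; every registered handler is called with the argument.
button.emit('click', 'top-left'); // prints: clicked at top-left
```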

Event handling is at the core of both Javascript as a language and as a way of thinking for Javascript developers, which is another important reason for using Javascript: It’s familiar to many developers. Javascript is a bit of a ninja amongst programming languages – most people don’t realize it’s one of the most-used programming languages out there. Its popularity has grown significantly over the last decade, with the rise of AJAX, highly interactive and responsive web applications, streaming updates, and big companies like Google pushing web applications as a primary platform for all kinds of applications.

Why single-threaded?
The advantage of a single-threaded application is that the processor doesn’t have the overhead of context switching (when one thread gets CPU time while the other’s paused). There are no heap allocations, forks or startup sequences that need to be done when creating a new thread. Besides that, there’s of course the obvious advantage of not needing to program with concurrency in the back of your head. Fewer headaches, fewer hard-to-track bugs, less specific knowledge and understanding needed, etc.

But what about modern-day hardware? The mainstream CPU builders don’t build single-core processors anymore. It’s all fancy dual-, quad-, hexa- and octacores these days, with even more cores and special-purpose components to be added in the near future.

Well, that’s not a problem. Whilst a Node program is single-threaded, there’s nothing stopping you from creating multiple Node processes and running them side-by-side, each using a single core. Put a loadbalancer or reverse proxy like nginx in front of them, either on the same server or externally, and you’re done. You could even use another Node process as a loadbalancer. Of course, you’d have to be using a back-end storage system that can handle concurrent requests (i.e. from multiple Node processes) in order to be able to do that.
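The selection logic of such a Node-based loadbalancer could be as simple as round-robin. The sketch below shows only that part and leaves out the actual forwarding of requests. `createRoundRobin` is a hypothetical helper, and the four back-ends on ports 8001 to 8004 are assumed to be four Node processes, one per core.

```javascript
// Returns a function that hands out the back-ends in turn, starting over
// at the first one after the last has been used.
function createRoundRobin(backends) {
  let next = 0;
  return function pick() {
    const backend = backends[next];
    next = (next + 1) % backends.length;
    return backend;
  };
}

// Assumption: one Node process per core, each listening on its own port.
const pickBackend = createRoundRobin([
  { host: '127.0.0.1', port: 8001 },
  { host: '127.0.0.1', port: 8002 },
  { host: '127.0.0.1', port: 8003 },
  { host: '127.0.0.1', port: 8004 }
]);

const chosen = [];
for (let i = 0; i < 5; i++) {
  chosen.push(pickBackend().port);
}
console.log(chosen.join(' ')); // prints: 8001 8002 8003 8004 8001
```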

But V8 can also run outside of the browser, and, being an open source project, has been extended to create NodeJS. Node adds a thread pool (using libeio), an event loop (libev), and fancy things like DNS resolution and cryptography.

On top of that are a bunch of Node bindings for I/O (sockets, HTTP, etc).

Finally, a standard library written in pure Javascript, which does pretty much everything you need, is built on top of that.
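The layering is visible from Javascript. As a sketch, in current Node versions the asynchronous `crypto.pbkdf2` call does its number crunching off the main thread and reports back through a callback, so the event loop stays free in the meantime. The password, salt and iteration count below are arbitrary example values.

```javascript
const crypto = require('crypto');

const timeline = [];

// The key derivation runs in the background; the callback is queued on the
// event loop once the result is ready.
crypto.pbkdf2('secret', 'salt', 100000, 32, 'sha256', function (err, key) {
  if (err) throw err;
  timeline.push('key derived (' + key.length + ' bytes)');
  console.log(timeline.join(' -> '));
  // prints: event loop still free -> key derived (32 bytes)
});

// Runs first: the call above returned immediately.
timeline.push('event loop still free');
```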


When
With that out of the way, here are a few use cases that Node suits well and can excel at:

Generic web framework. It’s got all the things you need – server-side logic, connectors for back-end systems (like databases), file serving, template parsing (with a wide variety of template languages), authentication, you name it.
Highly concurrent websites – high-volume webservices, varying loads, etc.
Highly concurrent connections – for example, websockets with many clients sending and receiving data.
Back-end systems dealing with files. Example: GitHub’s Nodeload, which prepares Git repositories for download by compressing them into tarballs. It calls git archive, waits, then streams the result back to the user using Node’s stream APIs (output I/O is also I/O and can benefit from evented I/O). See https://github.com/blog/900-nodeload2-downloads-reloaded
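The last use case, calling an external command and streaming its output onwards, can be sketched with `child_process.spawn` and `pipe`. This is not Nodeload’s actual code. `streamCommand` is a hypothetical helper, and a second Node process that prints one line stands in for `git archive` so the sketch runs without a repository.

```javascript
const { spawn } = require('child_process');

// Runs a command and pipes its output to `destination` chunk by chunk,
// without first collecting the whole result in memory.
function streamCommand(command, args, destination, done) {
  let finished = false;
  function finish(err) {
    if (!finished) {
      finished = true;
      done(err);
    }
  }

  const child = spawn(command, args);
  child.stdout.pipe(destination);
  child.on('error', finish); // e.g. the command could not be started
  child.on('close', function (code) {
    finish(code === 0 ? null : new Error('command exited with code ' + code));
  });
}

// Stand-in for something like `git archive`: a child process that writes
// some bytes to its standard output.
streamCommand(
  process.execPath,
  ['-e', 'console.log("pretend tarball bytes")'],
  process.stdout,
  function (err) {
    if (err) console.error('streaming failed: ' + err.message);
  }
);
```

In a web server the destination would be the HTTP response object instead of `process.stdout`, so the user starts receiving data while the command is still running.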
