Search

3/02/2011

Understanding the node.js event loop « Mixu's tech blog

Understanding the node.js event loop « Mixu's tech blog

So the largest waste with current programming technologies comes from waiting for I/O to complete. There are several ways in which one can deal with the performance impact (from Sam Rushing):

* synchronous: you handle one request at a time, each in turn. pros: simple cons: any one request can hold up all the other requests
* fork a new process: you start a new process to handle each request. pros: easy cons: does not scale well, hundreds of connections means hundreds of processes. fork() is the Unix programmer’s hammer. Because it’s available, every problem looks like a nail. It’s usually overkill
* threads: start a new thread to handle each request. pros: easy, and kinder to the kernel than using fork, since threads usually have much less overhead cons: your machine may not have threads, and threaded programming can get very complicated very fast, with worries about controlling access to shared resources.
The second basis thesis is that thread-per-connection is memory-expensive: [e.g. that graph everyone showns about Apache sucking up memory compared to Nginx]

Apache is multithreaded: it spawns a thread per request (or process, it depends on the conf). You can see how that overhead eats up memory as the number of concurrent connections increases and more threads are needed to serve multiple simulataneous clients. Nginx and Node.js are not multithreaded, because threads and processes carry a heavy memory cost. They are single-threaded, but event-based. This eliminates the overhead created by thousands of threads/processes by handling many connections in a single thread.

An event loop is “an entity that handles and processes external events and converts them into callback invocations”. So I/O calls are the points at which Node.js can switch from one request to another. At an I/O call, your code saves the callback and returns control to the node.js runtime environment. The callback will be called later when the data actually is available.

Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js.

Other than I/O calls, Node.js expects that all requests return quickly; e.g. CPU-intensive work should be split off to another process with which you can interact as with events, or by using an abstraction like WebWorkers. This (obviously) means that you can’t parallelize your code without another thread in the background with which you interact via events. Basically, all objects which emit events (e.g. are instances of EventEmitter) support asynchronous evented interaction and you can interact with blocking code in this manner e.g. using files, sockets or child processes all of which are EventEmitters in Node.js. Multicore can be done using this approach; see also: node-http-proxy.

Internal implementation

Internally, node.js relies on libev to provide the event loop, which is supplemented by libeio which uses pooled threads to provide asynchronous I/O. To learn even more, have a look at the libev documentation.


http://creationix.com/jsconf.pdf

沒有留言: