|
1 | 1 | # crypto-async
|
2 |
| -Native Cipher, Hash, and HMAC operations executed in Node's threadpool for multi-core throughput. |
| 2 | +Native Cipher, Hash, and HMAC operations executed in Node's threadpool for |
| 3 | +multi-core throughput. |
3 | 4 |
|
4 | 5 | ## Motivation
|
5 | 6 | #### Some issues with parts of the `crypto` module
|
6 |
| -* `crypto` cipher, hash and hmac streams are not really asynchronous. They execute in C++, but only in the main thread and so they still block the event loop. Encrypting 64 MB of data might block the event loop for +/- 70ms. Hashing 64 MB of data might block the event loop for +/- 190ms. |
7 |
| -* These `crypto` operations do not take advantage of multiple CPU cores. Your server may have 4 cores available but `crypto` will use only 1 of these 4 cores for all encrypting and hashing operations. |
8 |
| -* These `crypto` operations were not designed to use statically allocated buffers. They allocate a new output buffer when encrypting or hashing data, even if you already have an output buffer available. If you want to hash only a portion of a buffer you must first create a slice. Thousands of JS object allocations put unnecessary strain on the GC. This in turn leads to longer GC pauses which also block the event loop. |
9 |
| -* These `crypto` operations require multiple roundtrips between JS and C++ even if you are only encrypting or hashing a single buffer. |
10 |
| -* These `crypto` operations are not suitable for high-throughput network protocols or filesystems which need to checksum and encrypt/decrypt large amounts of data. Such a user-space network protocol or filesystem using `crypto` might actually saturate a single CPU core with crypto operations before saturating a fast local network or SSD disk. |
| 7 | +* `crypto` cipher, hash and hmac streams are not really asynchronous. They |
| 8 | +execute in C++, but only in the main thread and so they still block the event |
| 9 | +loop. Encrypting 64 MB of data might block the event loop for +/- 70ms. Hashing |
| 10 | +64 MB of data might block the event loop for +/- 190ms. |
| 11 | +* These `crypto` operations do not take advantage of multiple CPU cores. Your |
| 12 | +server may have 4 cores available but `crypto` will use only 1 of these 4 cores |
| 13 | +for all encrypting and hashing operations. |
| 14 | +* These `crypto` operations were not designed to use statically allocated |
| 15 | +buffers. They allocate a new output buffer when encrypting or hashing data, even |
| 16 | +if you already have an output buffer available. If you want to hash only a |
| 17 | +portion of a buffer you must first create a slice. Thousands of JS object |
| 18 | +allocations put unnecessary strain on the GC. This in turn leads to longer GC |
| 19 | +pauses which also block the event loop. |
| 20 | +* These `crypto` operations require multiple roundtrips between JS and C++ even |
| 21 | +if you are only encrypting or hashing a single buffer. |
| 22 | +* These `crypto` operations are not suitable for high-throughput network |
| 23 | +protocols or filesystems which need to checksum and encrypt/decrypt large |
| 24 | +amounts of data. Such a user-space network protocol or filesystem using `crypto` |
| 25 | +might actually saturate a single CPU core with crypto operations before |
| 26 | +saturating a fast local network or SSD disk. |
11 | 27 |
|
12 | 28 | #### Some new ideas with the `crypto-async` module
|
13 |
| -* Truly asynchronous. All calls execute asynchronously in the `node.js` threadpool. This keeps the main thread and event loop free without blocking. |
14 |
| -* Scalable across multiple CPU cores. While `crypto-async` is a fraction slower per call than `crypto` (possibly because of the overhead of interacting with the threadpool), for buffers larger than 1024 bytes it shines and provides N-cores more throughput. `crypto-async` achieves up to 3x more throughput compared to `crypto`. |
15 |
| -* Zero-copy. All keys, ivs, source and target arguments can be passed directly using offsets into existing buffers, without requiring any slices and without allocating any temporary output buffers. This enables predictable memory usage for programs with tight memory budgets. |
16 |
| -* Designed to support the common use-case of encrypting or hashing a single buffer, where memory is adequate and buffers are already in memory. This avoids multiple round-trips between JS and C++. |
17 |
| -* Separates the control plane and the data plane to enable high-throughput applications. |
| 29 | +* Truly asynchronous. All calls execute asynchronously in the `node.js` |
| 30 | +threadpool. This keeps the main thread and event loop free without blocking. |
| 31 | +* Scalable across multiple CPU cores. While `crypto-async` is a fraction slower |
| 32 | +per call than `crypto` (possibly because of the overhead of interacting with the |
| 33 | +threadpool), for buffers larger than 1024 bytes it shines and provides N-cores |
| 34 | +more throughput. `crypto-async` achieves up to 3x more throughput compared to |
| 35 | +`crypto`. |
| 36 | +* Zero-copy. All keys, ivs, source and target arguments can be passed directly |
| 37 | +using offsets into existing buffers, without requiring any slices and without |
| 38 | +allocating any temporary output buffers. This enables predictable memory usage |
| 39 | +for programs with tight memory budgets. |
| 40 | +* Designed to support the common use-case of encrypting or hashing a single |
| 41 | +buffer, where memory is adequate and buffers are already in memory. This avoids |
| 42 | +multiple round-trips between JS and C++. |
| 43 | +* Separates the control plane and the data plane to enable high-throughput |
| 44 | +applications. |
18 | 45 |
|
19 | 46 | ## Performance
|
20 | 47 | ```
|
@@ -100,15 +127,33 @@ npm install crypto-async
|
100 | 127 | ## Usage
|
101 | 128 |
|
102 | 129 | #### Adjust threadpool size and control concurrency
|
103 |
| -Node runs filesystem and DNS operations in the threadpool. The threadpool consists of 4 threads by default. This means that at most 4 operations can be running at any point in time. If any operation is slow to complete, it will cause head-of-line blocking. The size of the threadpool should therefore be increased at startup time (at the top of your script, before requiring any modules) by setting the `UV_THREADPOOL_SIZE` environment variable (the absolute maximum is 128 threads, which requires only ~1 MB memory in total according to the [libuv docs](http://docs.libuv.org/en/v1.x/threadpool.html)). |
104 |
| - |
105 |
| -Conventional wisdom would set the number of threads to the number of CPU cores, but most operations running in the threadpool are not run hot, they are not CPU-intensive and block mostly on IO. Issuing more IO operations than there are CPU cores will increase throughput and will decrease latency per operation by decreasing queueing time. On the other hand, `crypto-async` operations are CPU-intensive. Issuing more `crypto-async` operations than there are CPU cores will not increase throughput and will increase latency per operation by increasing queueing time. |
| 130 | +Node runs filesystem and DNS operations in the threadpool. The threadpool |
| 131 | +consists of 4 threads by default. This means that at most 4 operations can be |
| 132 | +running at any point in time. If any operation is slow to complete, it will |
| 133 | +cause head-of-line blocking. The size of the threadpool should therefore be |
| 134 | +increased at startup time (at the top of your script, before requiring any |
| 135 | +modules) by setting the `UV_THREADPOOL_SIZE` environment variable (the absolute |
| 136 | +maximum is 128 threads, which requires only ~1 MB memory in total according to |
| 137 | +the [libuv docs](http://docs.libuv.org/en/v1.x/threadpool.html)). |
| 138 | + |
| 139 | +Conventional wisdom would set the number of threads to the number of CPU cores, |
| 140 | +but most operations running in the threadpool are not run hot, they are not |
| 141 | +CPU-intensive and block mostly on IO. Issuing more IO operations than there are |
| 142 | +CPU cores will increase throughput and will decrease latency per operation by |
| 143 | +decreasing queueing time. On the other hand, `crypto-async` operations are |
| 144 | +CPU-intensive. Issuing more `crypto-async` operations than there are CPU cores |
| 145 | +will not increase throughput and will increase latency per operation by |
| 146 | +increasing queueing time. |
106 | 147 |
|
107 | 148 | You should therefore:
|
108 | 149 |
|
109 |
| -1. Set the threadpool size to `IO` + `N`, where `IO` is the number of filesystem and DNS operations you expect to be running concurrently, and where `N` is the number of CPU cores available. This will reduce head-of-line blocking. |
| 150 | +1. Set the threadpool size to `IO` + `N`, where `IO` is the number of filesystem |
| 151 | +and DNS operations you expect to be running concurrently, and where `N` is the |
| 152 | +number of CPU cores available. This will reduce head-of-line blocking. |
110 | 153 |
|
111 |
| -2. Allow or design for at most `N` `crypto-async` operations to be running concurrently, where `N` is the number of CPU cores available. This will keep latency within reasonable bounds. |
| 154 | +2. Allow or design for at most `N` `crypto-async` operations to be running |
| 155 | +concurrently, where `N` is the number of CPU cores available. This will keep |
| 156 | +latency within reasonable bounds. |
112 | 157 |
|
113 | 158 | ```javascript
|
114 | 159 | process.env['UV_THREADPOOL_SIZE'] = 128;
|
@@ -164,6 +209,11 @@ cryptoAsync.hmac(algorithm, key, source,
|
164 | 209 | );
|
165 | 210 | ```
|
166 | 211 |
|
| 212 | +### Zero-Copy Methods |
| 213 | + |
| 214 | +The following method alternatives require more arguments but support zero-copy |
| 215 | +crypto operations, for reduced memory overhead and GC pressure. |
| 216 | + |
167 | 217 | #### Cipher (Zero-Copy)
|
168 | 218 | ```javascript
|
169 | 219 | var cryptoAsync = require('crypto-async');
|
@@ -271,4 +321,5 @@ node benchmark.js
|
271 | 321 |
|
272 | 322 | ## AEAD Ciphers
|
273 | 323 |
|
274 |
| -AEAD ciphers such as GCM are currently not supported and may be added in future as an `aead` method. |
| 324 | +AEAD ciphers such as GCM are currently not supported and may be added in future |
| 325 | +as an `aead` method. |
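
Note on the threadpool guidance in the Usage hunk above: the two recommendations (size the pool to `IO` + `N`, and run at most `N` `crypto-async` operations at once) can be sketched roughly as follows. This is a minimal illustration and not part of `crypto-async`. `EXPECTED_IO`, `threadpoolSize` and `createLimiter` are hypothetical names, the value 4 is a placeholder estimate, and the cap of 128 is the maximum quoted in the README text. It assumes, as the README says, that the variable is set at the top of the script before any module uses the threadpool.

```javascript
var os = require('os');

// Hypothetical estimate of concurrent filesystem and DNS operations.
var EXPECTED_IO = 4;
// N in the README text: the number of CPU cores available.
var CPU_CORES = os.cpus().length;

// IO + N, capped at the 128-thread maximum quoted in the README.
function threadpoolSize(io, cores) {
  return Math.min(io + cores, 128);
}

// Set before requiring anything that uses the threadpool.
process.env.UV_THREADPOOL_SIZE = String(threadpoolSize(EXPECTED_IO, CPU_CORES));

// Returns a function that runs at most `limit` tasks at a time.
// Each task receives a `done` callback and must call it exactly once.
function createLimiter(limit) {
  var running = 0;
  var queue = [];
  function next() {
    while (running < limit && queue.length > 0) {
      var job = queue.shift();
      running++;
      startJob(job);
    }
  }
  function startJob(job) {
    job.task(function() {
      running--;
      // Forward whatever the task reported (err, result, ...).
      job.callback.apply(null, arguments);
      next();
    });
  }
  return function(task, callback) {
    queue.push({ task: task, callback: callback });
    next();
  };
}

// At most N CPU-bound operations in flight.
var run = createLimiter(CPU_CORES);
```

A task passed to `run` would wrap one `crypto-async` call and hand `done` to it as that call's callback. Further tasks wait in the queue until a running one finishes, which keeps queueing inside JS instead of inside the threadpool.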