Data Science

GPU Accelerating Node.js JavaScript for Visualization and Beyond

NVIDIA GTC21 had numerous great and engaging contents, especially around RAPIDS, so it would be easy to miss our debut presentation “Using RAPIDS to Accelerate Node.js JavaScript for Visualization and Beyond.” Yep – we are bringing the power of GPU accelerated data science to the JavaScript Node.js community with the Node-RAPIDS project.

Node-RAPIDS is an open-source, technical preview of modular RAPIDS’ library bindings in Node.js, as well as, complementary methods for enabling high-performance, browser-based visualizations.

Venn diagram showing intersections of Performance, Reach, and Usability, highlighting the intersection of all three.

What’s the problem with web viz?

Around a decade ago, the mini-renaissance around web-based data visualization showed the benefits of highly interactive, easy to share, and use tools such as D3. While not as performant as C/C++ or Python frameworks, their popularity took off because of JavaScript’s accessibility. No surprise that it often ranks as the most popular developer language, preceding Python or Java, and there is now a full catalog of visualization and data tools.

Yet, this large JavaScript community of developers is impeded by the lack of first-class and accelerated data tools in their preferred language. The analysis is most effective when it is paired as close as possible to its data source, science, and visualizations. To fully access GPU hardware with JavaScript, (beyond webGL limitations and hacks) requires being a polyglot to set up complicated middleware plumbing or use non-js frameworks like Plotly Dash. As a result, data engineers, data scientists, visualization specialists, and front-end developers are often siloed, even within organizations. This is detrimental because data visualization is the ideal medium of communication between these groups.

As for the RAPIDS Viz team, ever since our first proof of concept, we’ve wanted to build tools that can more seamlessly interact with hundreds of millions of data points in real-time through our browsers – and we finally have a way.

Why Node.js

If you are not familiar with Node.js, it is an open-source, cross-platform runtime environment based on C/C++ that executes JavaScript code outside of a web browser. Over 1 Million Node.js downloads occur per day. Node Package Manager (NPM) is the default JavaScript package manager and Microsoft owns it. Node.js is used in the backend of online marketplaces like eBay, AliExpress, and is used by high-traffic websites, such as Netflix, PayPal, and Groupon. Clearly, it is a powerful framework.

 XKCD "Universal Connector" commit from https://xkcd.com/1406/.
Figure 1: XKCD – Node.js is a Universal Connector.

Node.js is the connector that gives us JavaScript with direct access to hardware, which results in a streamlined API and the ability to use NVIDIA CUDA ⚡. By creating node-rapids bindings, we enable a massive developer community with the ability to use GPU acceleration without the need to learn a new language or work in a new environment. We also give the same community access to a high-performance data science platform: RAPIDS!

Here is a snippet of node-RAPIDS in action based on our basic notebook, which shows a 6x speedup for a small regex example: 

// Using https://github.com/rapidsai/node-rapids/
const cudf = require('@rapidsai/cudf');
const regexps = [
/Cloud|Overcast/,
/Rain|T-Storm|Thunderstorm|Squalls|Drizzle/,
/Snow/,
/Fog/,
/Ice|Hail|Freezing|Sleet/,
/Dust|Smoke|Sand/,
];
console.log('');
const weather_condition_gpu = cudf.DataFrame.readCSV({
header: 0,
sourceType: 'files',
sources: [`${__dirname}/US_Accidents_Dec20.csv`],
dataTypes: {
id: 'str', source: 'str', tmc: 'float64', severity: 'int32', start_time: 'str', end_time: 'str',
start_lat: 'float64', start_lng: 'float64', end_lat: 'float64', end_lng: 'float64',
distance: 'float64', description: 'str', number: 'int32', street: 'str', side: 'str',
city: 'str', county: 'str', state: 'str', zipcode: 'str', country: 'str', timezone: 'str', airport_code: 'str',
weather_timestamp: 'str', temperature: 'float64', wind_chill: 'float64', humidity: 'float64', pressure: 'float64',
visibility: 'float64', wind_direction: 'str', wind_speed: 'float64', precipitation: 'float64', weather_condition: 'str',
amenity: 'bool', bump: 'bool', crossing: 'bool', give_way: 'bool', junction: 'bool', no_exit: 'bool', railway: 'bool',
roundabout: 'bool', station: 'bool', stop: 'bool', traffic_calming: 'bool', traffic_signal: 'bool', turning_loop: 'bool',
sunrise_sunset: 'str', civil_twilight: 'str', nautical_twighlight: 'str', astronomical_twighlight: 'str'
},
}).get('weather_condition');
console.time(`GPU time`);
regexps.forEach((regexp) => {
console.time(`${regexp.source} time`);
const matches = weather_condition_gpu.containsRe(regexp.source).sum();
console.timeEnd(`${regexp.source} time`);
console.log(`${regexp.source} matches: ${matches.toLocaleString()}`);
});
console.timeEnd(`GPU time`);
console.log('');
const weather_condition_cpu = (() => {
const categorical = weather_condition_gpu.cast(new cudf.Categorical(new cudf.Utf8String));
const categories = [...categorical.categories];
const codes = [...categorical.codes];
return codes.map((i) => categories[i]);
})();
console.time(`CPU time`);
regexps.forEach((regexp) => {
console.time(`${regexp.source} time`);
const matches = weather_condition_cpu.reduce((matches, weather_condition) => {
return matches + (regexp.exec(weather_condition) || []).length;
}, 0);
console.timeEnd(`${regexp.source} time`);
console.log(`${regexp.source} matches: ${matches.toLocaleString()}`);
});
console.timeEnd(`CPU time`);
console.log('');
/* OUTPUT:
---------------------------
// 1.6GB .CSV
// GPU: Titan RTX
Cloud|Overcast time: 26.819ms
Cloud|Overcast matches: 1,896,354
Rain|T-Storm|Thunderstorm|Squalls|Drizzle time: 63.813ms
Rain|T-Storm|Thunderstorm|Squalls|Drizzle matches: 326,441
Snow time: 6.396ms
Snow matches: 68,101
Fog time: 6.997ms
Fog matches: 52,063
Ice|Hail|Freezing|Sleet time: 44.031ms
Ice|Hail|Freezing|Sleet matches: 4,698
Dust|Smoke|Sand time: 29.932ms
Dust|Smoke|Sand matches: 8,846
GPU time: 190.457ms
// CPU: AMD Ryzen Threadripper 1900X 8-Core (3.8GHZ)
Cloud|Overcast time: 244.493ms
Cloud|Overcast matches: 1,896,354
Rain|T-Storm|Thunderstorm|Squalls|Drizzle time: 192.591ms
Rain|T-Storm|Thunderstorm|Squalls|Drizzle matches: 326,441
Snow time: 206.071ms
Snow matches: 68,101
Fog time: 204.61ms
Fog matches: 52,063
Ice|Hail|Freezing|Sleet time: 214.325ms
Ice|Hail|Freezing|Sleet matches: 4,698
Dust|Smoke|Sand time: 164.633ms
Dust|Smoke|Sand matches: 8,846
CPU time: 1.230s
---------------------------
// GPU is 6.45x faster than CPU
*/

Node-RAPIDS: designed as building blocks

 Notional diagram of node-rapids module structure, showing components for memory management (cuda, rmm), data science  (cudf, cuml, cugraph, cuspatial), graphics, streaming, and front end.
Figure 2: Node-RAPIDS module overview.

Similar to node projects, Node-RAPIDS is designed to be modular. Our aim is not to build turnkey web applications, but to create an inventory of functionality that enables or accelerates a wide variety of use cases and pipelines. The preceding is an overview of the current and planned Node-RAPIDS modules grouped in general categories. A Node-RAPIDS application can use as many or as few of the modules as needed.

To make starting out less daunting, we are also building a catalog of demos that can serve as templates for generalized applications. As we develop more bindings, we will create more demos to showcase their capabilities.

Notional architecture diagram showing a GPU-accelerated node.js application for crossfiltering, made possible using node-rapids.
Figure 3: Example of a Cross Filter App.

The preceding is an idealized stack of a geospatial cross filter dashboard application using RAPIDS cuDF and RAPIDS cuSpatial libraries. We have a simple demo using Deck.gl that you can preview with our video and explore demo code on Github. 


Figure 4: Example of Streaming ETL Process.

The last example preceding is a server-side only ETL pipeline without any visualization. We have an example of a straightforward ETL process using cuDF bindings and the nteract notebook desktop application, which you can preview with our video and nteract with (get it) on our Notebook.

What is next?

While we have been thinking about this project for a while, we are just getting started in development. RAPIDS is an incredible framework, and we want to bring it to more people and more applications – RAPIDS everywhere as we say.

Near-term next steps:

  • Some short-term next steps are to continue building core RAPIDS binding features, which you can check out on our current binding coverage table.
  • If the idea of GPU accelerated SQL queries straight from your web app sounds interesting (it does to us), we hope to get started on some blazingSQL bindings soon too.
  • And most noteworthy, we plan to start creating and publishing modular docker containers, which will dramatically simplify the current from-source tech preview installation process.

As always, we need community engagement to help guide us. If you have feature requests, questions, or use cases, you can reach out to us!

This project has numerous potentials. It can accelerate a wide variety of Node.js applications, as well as bring first-class, high-performance data science and visualization tools to a huge community. We hope you join us at the beginning of this exciting project.

Resources:

Discuss (1)

Tags