
NVIDIA GTC: Top Data Science Sessions

NVIDIA GTC is the must-attend AI conference for developers. It’s a place where practitioners, leaders, and innovators share their ideas about the latest trends in data science. Here are six top data science GTC sessions worth attending.

Vehicle Routing at Domino’s: Exploring a GPU-Enabled Approach (A31074) presented by Frank Cardone from Domino’s

Thursday, Nov 11, 5:00 AM – 5:25 AM PST

Domino’s Pizza delivers thousands of pizzas a day and requires real-time planning and logistics capabilities. Working with the RAPIDS team, Domino’s implemented a real-time planning system that fits their strict requirements and delivers sub-second runtimes for their use case.

Accelerating Data Science: What’s New in RAPIDS (A31490) presented by John Zedlewski from NVIDIA

Tuesday, Nov 9, 10:00 AM – 10:50 AM PST

RAPIDS, the open-source framework for accelerated data science, combines the compute power of modern GPUs with the ease-of-use of the PyData ecosystem. This talk will start with an overview of the RAPIDS ecosystem, which now includes more than a half-dozen libraries and a wide range of integrations, and then dive into some of the recent frontiers of RAPIDS and upcoming features, including improvements to inference, Pandas compatibility, SQL support, and graph analytics.

PyTorch Ecosystem: The State of the State 2021 (A31212) presented by Dwarak Rajagopal from Facebook

Thursday, Nov 11, 6:00 AM – 6:25 AM PST

PyTorch continues to be a foundational component for AI research, and increasingly for production at scale, at companies like Facebook, Microsoft, and many others. As PyTorch continues to grow, so does the ecosystem around it. Joe Spisak, product lead for PyTorch at Facebook AI, will provide an update on the state of the AI community and ecosystem around PyTorch.

Accelerated Data Science for Molecule Design (A31104) presented by Damien Coupry from GSK

Thursday, Nov 11, 3:00 AM – 3:50 AM PST

Drug discovery is a complex process that takes years of costly work and has a very high attrition rate. State-of-the-art machine learning techniques can be applied at many levels of the discovery process, resulting in significant gains in speed. The performance of these algorithms is highly correlated with the volume and consistency of data available during training. In contrast, pharmacologically relevant data tends to be fragmented by assay differences, have low variation within chemical series, and have a poor distribution of values. Leveraging scalable featurization and domain-aware embeddings, as well as data augmentation techniques, is therefore a high-value approach for data science in health care.

Accelerating Apache Spark (A31439) presented by Sameer Raheja from NVIDIA

Tuesday, Nov 9, 1:00 PM – 1:25 PM PST

The RAPIDS Spark plugin has gained functionality and performance over the past eight months. Get an overview of what’s new, including windowing support; expanded operations for decimal 128, list, struct, and map data types; expanded support for Parquet and ORC; and more. We’ll describe how the plugin handles datasets that do not fit in GPU memory and give an update on performance improvements using the NDS benchmark. Finally, we’ll give an update on the environments in which the plugin runs, highlighting partnerships with Cloudera, AWS, Azure, and GCP.

Supporting End-to-end Accelerated Data Science Workflows in any Data Center with NVIDIA EGX (A31353) presented by Will Benton from NVIDIA

Thursday, Nov 11, 6:30 AM – 6:55 AM PST

You probably know that NVIDIA GPUs support accelerated training and inference for machine learning and deep learning models. If you’re only using your GPUs for models, though, you’re missing out on opportunities to streamline data workflows and practitioner experience across the entire data life cycle. Learn how NVIDIA hardware, NVIDIA application frameworks, and partner technologies combine to support an end-to-end data science life cycle. We’ll build up an application to model and detect payments fraud, from exploratory analysis to production deployment; show the benefits of GPUs for developers, data engineers, and data scientists; and explain how you can support these cross-functional teams with common infrastructure.

There are over 150 data science sessions at GTC. Visit the GTC website to register and learn more.
