
SC20 Demo: Maximizing Performance for Distributed Machine Learning and Deep Learning with SHARP

Nov 16, 2020

By Nefi Alarcon

Modern machine learning data centers require complex computations and fast, efficient data delivery. The NVIDIA Mellanox Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) uses the in-network computing capabilities of the NVIDIA Mellanox Quantum switch to dramatically improve the performance of distributed machine learning workloads.

SHARP improves the performance of MPI and machine learning collective operations by offloading them from the CPU to the network, which eliminates the need to send data multiple times between endpoints.

This approach decreases the amount of data traversing the network as aggregation nodes are reached, and dramatically reduces the time spent in collective operations.
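To make the idea of hierarchical aggregation concrete, here is a minimal toy model in Python. It is not the SHARP API and does not model real switch hardware; the function names, the `radix` parameter, and the 16-host layout are all assumptions chosen for illustration. Each level of a hypothetical aggregation tree sums the vectors it receives and forwards a single result upward, so the number of vectors in flight shrinks at every level.

```python
from typing import List, Tuple

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Sum equal-length vectors position by position."""
    return [sum(values) for values in zip(*vectors)]


def tree_aggregate(host_vectors: List[Vector], radix: int) -> Tuple[Vector, List[int]]:
    """Reduce vectors level by level through a hypothetical aggregation tree.

    Each aggregation node combines up to `radix` inputs into one vector.
    Returns the final sum and the number of vectors sent upward at each level.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    if not host_vectors:
        raise ValueError("need at least one host vector")

    level = list(host_vectors)
    sent_per_level = []
    while len(level) > 1:
        # Every node at this level sends exactly one vector to its parent.
        sent_per_level.append(len(level))
        level = [
            elementwise_sum(level[i:i + radix])
            for i in range(0, len(level), radix)
        ]
    return level[0], sent_per_level


# 16 hypothetical hosts, each contributing a 2-element vector.
hosts = [[float(h), 1.0] for h in range(16)]
result, traffic = tree_aggregate(hosts, radix=4)
print("reduced vector:", result)
print("vectors sent upward per level:", traffic)
```

In this sketch the 16 host vectors become 4 partial sums after the first level and a single result after the second, which mirrors the description above: the data volume drops as aggregation nodes are reached, and each host sends its data only once.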



About the Authors

About Nefi Alarcon
Nefi Alarcon is a senior executive communications manager on NVIDIA's leadership team. He has years of media relations and communication experience, and has previously worked at Google, Mozilla, and CNN. He received his bachelor's degree in Journalism from George Washington University.
