PLASTER: Bringing Deep Learning Inferencing to Millions of Servers

May 07, 2018

At the GPU Technology Conference in Silicon Valley earlier this year, NVIDIA CEO Jensen Huang introduced PLASTER, an acronym for seven major challenges in delivering AI-based services: Programmability, Latency, Accuracy, Size, Throughput, Energy efficiency, and Rate of learning.
Meeting these challenges will require more than just sticking an ASIC or an FPGA in a data center, Huang said. “Hyperscale data centers are the most complicated computers ever made — how could it be simple?”
A new whitepaper published today explores each of these challenges in the context of NVIDIA’s deep learning solutions. PLASTER as a whole is greater than the sum of its parts: anyone developing and deploying AI-based services should weigh all seven elements to arrive at a complete view of deep learning performance. Addressing the challenges PLASTER describes matters for any deep learning solution, and especially for building and delivering the inference engines that underpin AI-based services. Each section of the paper briefly describes how to measure that component of the framework and gives an example of a customer using NVIDIA solutions to tackle critical machine learning problems.
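Two of PLASTER's elements, latency and throughput, can be measured directly for any inference engine. As a minimal illustration (not taken from the whitepaper; the `measure_inference` helper and the dummy model are hypothetical stand-ins), the Python sketch below times repeated calls to an inference function and reports latency percentiles alongside throughput:

```python
import time
import statistics

def measure_inference(infer_fn, batch, n_warmup=10, n_runs=100):
    """Time repeated calls to an inference function and report
    latency percentiles and throughput (the L and T in PLASTER)."""
    # Warm-up runs so one-time costs (JIT compilation, cache fills)
    # don't skew the measured latencies.
    for _ in range(n_warmup):
        infer_fn(batch)

    latencies = []
    for _ in range(n_runs):
        # Note: for GPU inference you would also synchronize the device
        # (e.g., torch.cuda.synchronize()) before reading the clock,
        # since kernel launches are asynchronous.
        start = time.perf_counter()
        infer_fn(batch)
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    return {
        "p50_ms": 1000 * latencies[len(latencies) // 2],
        "p99_ms": 1000 * latencies[int(len(latencies) * 0.99) - 1],
        "mean_ms": 1000 * statistics.mean(latencies),
        # Total items processed divided by total wall time.
        "throughput_items_per_s": len(batch) * n_runs / sum(latencies),
    }

if __name__ == "__main__":
    # Stand-in "model": any callable that takes a batch works here.
    dummy_model = lambda batch: [x * 2 for x in batch]
    print(measure_inference(dummy_model, list(range(32))))
```

Reporting tail latency (p99) alongside the mean matters for AI services: a deployment can look fast on average while still missing its latency budget for a meaningful fraction of requests.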
Read the whitepaper >