
Accelerated Inference for Large Transformer Models Using FasterTransformer and Triton Inference Server

This is the first part of a two-part series discussing the NVIDIA FasterTransformer library, one of the fastest libraries for distributed inference of...

NVIDIA AI Platform Delivers Big Gains for Large Language Models

As the size and complexity of large language models (LLMs) continue to grow, NVIDIA is today announcing updates to the NeMo Megatron framework that provide...