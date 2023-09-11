TensorRT: What’s New

NVIDIA® TensorRT-LLM accelerates and optimizes inference for large language models (LLMs). Building on TensorRT™, FasterTransformer, and other components, TensorRT-LLM applies targeted optimizations such as Flash Attention, in-flight batching, and FP8 precision through an open-source Python API, helping developers get the best inference performance from their GPUs.

NVIDIA TensorRT 8.6 improves cross-compatibility between GPUs and software stacks, making TensorRT more versatile across hardware deployments and upgrades.

TensorRT 8.6 GA is a free download for members of the NVIDIA Developer Program.