Tag: TensorRT

AI / Deep Learning

NVIDIA Announces TensorRT 8 Slashing BERT-Large Inference Down to 1 Millisecond

NVIDIA announced TensorRT 8.0 which brings BERT-Large inference latency down to 1.2 ms with new optimizations. 3 MIN READ
AI / Deep Learning

Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT

○ TensorRT is an SDK for high-performance deep learning inference and with TensorRT 8.0, you can import models trained using Quantization Aware Training (QAT)… 17 MIN READ
AI / Deep Learning

Speeding Up Deep Learning Inference Using TensorFlow, ONNX, and NVIDIA TensorRT

This post was updated July 20, 2021 to reflect NVIDIA TensorRT 8.0 updates. In this post, you learn how to deploy TensorFlow trained deep learning models using… 15 MIN READ
AI / Deep Learning

Getting the Most Out of NVIDIA T4 on AWS G4 Instances

Learn how to get the best natural language inference performance from AWS G4dn instance powered by NVIDIA T4 GPUs, and how to deploy BERT networks easily using… 14 MIN READ
AI / Deep Learning

Extending NVIDIA Performance Leadership with MLPerf Inference 1.0 Results

In this post, we step through some of these optimizations, including the use of Triton Inference Server and the A100 Multi-Instance GPU (MIG) feature. 7 MIN READ
AI / Deep Learning

ICYMI: New AI Tools and Technologies Announced at GTC 2021 Keynote

At GTC 2021, NVIDIA announced new software tools to help developers build optimized conversational AI, recommender, and video solutions. 7 MIN READ