Neta Zmora

Neta Zmora is a senior deep learning software architect working on DL acceleration. Before joining NVIDIA in 2020, Neta was a research engineer at Intel’s AI Lab developing methods for deep neural network compression.

Posts by Neta Zmora

Technical Walkthrough

Achieving FP32 Accuracy for INT8 Inference Using Quantization Aware Training with NVIDIA TensorRT

TensorRT is an SDK for high-performance deep learning inference. With TensorRT 8.0, you can import models trained using Quantization Aware Training (QAT) and run inference in INT8 precision without losing FP32 accuracy. QAT significantly reduces the compute and storage overhead of inference. 17 MIN READ