DLA performance is enabled by both hardware acceleration and software. For example, DLA software performs fusions to reduce the number of passes to and from system memory. TensorRT also provides a higher-level abstraction over the DLA software stack.
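To make the benefit of fusion concrete, here is a toy Python model of why fusing operations cuts memory traffic. It is only an illustrative sketch: the function names and the element-counting scheme are assumptions made for this example, and it does not describe how the DLA compiler actually implements fusion.

```python
def unfused(x, scale, bias):
    """Run scale, bias-add, and ReLU as three separate passes.

    Each pass reads every element from memory and writes every
    element back, so it costs 2 * len(x) element transfers.
    """
    traffic = 0
    y = [v * scale for v in x]
    traffic += 2 * len(x)
    y = [v + bias for v in y]
    traffic += 2 * len(x)
    y = [max(v, 0.0) for v in y]
    traffic += 2 * len(x)
    return y, traffic


def fused(x, scale, bias):
    """Run the same three operations in a single pass over the data."""
    y = [max(v * scale + bias, 0.0) for v in x]
    traffic = 2 * len(x)  # one read and one write per element
    return y, traffic


if __name__ == "__main__":
    data = [-1.0, 0.5, 2.0]
    print(unfused(data, 2.0, 0.5))
    print(fused(data, 2.0, 0.5))
```

In this toy model both versions produce the same result, but the fused version moves one third as many elements because the intermediate tensors never leave the compute unit.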

TensorRT delivers a unified platform and common interface for AI inference on the GPU, the DLA, or both. The TensorRT builder provides the compile-time and build-time interface that invokes the DLA compiler. Once the plan file is generated, the TensorRT runtime calls into the DLA runtime stack to execute the workload on the DLA cores. TensorRT also makes it easy to port a workload from GPU to DLA: only a few additional flags need to be specified.
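As a rough illustration of how few flags the GPU-to-DLA port needs, the sketch below assembles a `trtexec` command line for either target. The helper `trtexec_args` and the file names are hypothetical, introduced only for this example. The `--useDLACore`, `--allowGPUFallback`, `--fp16`, and `--int8` options are standard `trtexec` flags, but check them against the TensorRT version you have installed.

```python
import shlex


def trtexec_args(onnx_path, engine_path, dla_core=None, precision=None):
    """Build a trtexec argument list (hypothetical helper).

    With dla_core=None the engine targets the GPU. Passing a core index
    adds the DLA-specific flags on top of the same base command.
    """
    args = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if dla_core is not None:
        # DLA runs in reduced precision, so default to FP16 when unset.
        precision = precision or "fp16"
        if precision not in ("fp16", "int8"):
            raise ValueError("DLA requires fp16 or int8 precision")
        args.append(f"--useDLACore={dla_core}")
        # Let layers the DLA cannot run fall back to the GPU.
        args.append("--allowGPUFallback")
    if precision is not None:
        args.append(f"--{precision}")
    return args


if __name__ == "__main__":
    gpu = trtexec_args("model.onnx", "model_gpu.plan")
    dla = trtexec_args("model.onnx", "model_dla.plan", dla_core=0)
    print(shlex.join(gpu))
    print(shlex.join(dla))
```

The two commands differ only in the output file name and the appended DLA flags. The rest of the build invocation is unchanged.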