DriveWorks SDK Reference
5.8.83 Release
For Test and Development only

DNN Workflow

This code snippet demonstrates how the DNN module is typically used. Note that error handling is omitted for clarity.
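In production code, each DriveWorks call returns a dwStatus that should be checked. A minimal sketch of such a check, assuming the DriveWorks status types are in scope (the CHECK_DW helper is illustrative, not part of the SDK):

#include <cstdlib>
#include <iostream>
// Illustrative helper: abort with a readable message when a call fails.
#define CHECK_DW(call)                                                \
    do {                                                              \
        dwStatus status_ = (call);                                    \
        if (status_ != DW_SUCCESS) {                                  \
            std::cerr << #call << " failed: "                         \
                      << dwGetStatusName(status_) << std::endl;       \
            std::abort();                                             \
        }                                                             \
    } while (0)
// Example: CHECK_DW(dwDNN_getInputBlobCount(&numInputs, dnn));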

Initialize network from file.

If the model has been generated for DLA (using the --useDLA option of the tensorrt_optimization tool), the processor type should be either DW_PROCESSOR_TYPE_DLA_0 or DW_PROCESSOR_TYPE_DLA_1, depending on which DLA engine the inference should run on. Otherwise, the processor type should always be DW_PROCESSOR_TYPE_GPU.

contextHandle is assumed to be a previously initialized dwContextHandle_t.
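A minimal sketch of creating such a context with default parameters (logger setup and error handling omitted):

// Create a DriveWorks context with default parameters.
dwContextHandle_t contextHandle = DW_NULL_HANDLE;
dwContextParameters sdkParams{};
dwInitialize(&contextHandle, DW_VERSION, &sdkParams);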

// Load the DNN from a file. Note that the DNN model has to be generated with the tensorrt_optimization tool.
dwDNNHandle_t dnn = nullptr;
dwDNN_initializeTensorRTFromFile(&dnn, "network.fp32", nullptr, DW_PROCESSOR_TYPE_GPU, contextHandle);
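If the model was instead generated for DLA with the --useDLA option, the same call can target a specific DLA engine; a minimal sketch (the file name "network.dla" is illustrative):

// Load a DLA-generated DNN and bind it to the first DLA engine.
dwDNNHandle_t dnnDLA = nullptr;
dwDNN_initializeTensorRTFromFile(&dnnDLA, "network.dla", nullptr, DW_PROCESSOR_TYPE_DLA_0, contextHandle);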

Check that the loaded network has the expected number of inputs and outputs.

// Find out the number of input and output blobs in the network.
uint32_t numInputs = 0;
uint32_t numOutputs = 0;
dwDNN_getInputBlobCount(&numInputs, dnn);
dwDNN_getOutputBlobCount(&numOutputs, dnn);
if (numInputs != 1) {
std::cerr << "Expected a DNN with one input blob." << std::endl;
return -1;
}
if (numOutputs != 2) {
std::cerr << "Expected a DNN with two output blobs." << std::endl;
return -1;
}

Query the indices of the input and output blobs by name. The network is assumed to contain the input blob "data_in" and the output blobs "data_out1" and "data_out2".

uint32_t inputIndex = 0;
uint32_t output1Index = 0;
uint32_t output2Index = 0;
// Find indices of blobs by their name.
dwDNN_getInputIndex(&inputIndex, "data_in", dnn);
dwDNN_getOutputIndex(&output1Index, "data_out1", dnn);
dwDNN_getOutputIndex(&output2Index, "data_out2", dnn);

Initialize host and device memory to hold the inputs and outputs of the network.

std::vector<float32_t*> dnnInputs(numInputs, nullptr);
std::vector<float32_t*> dnnOutputs(numOutputs, nullptr);
std::vector<float32_t> dnnInputHost;
std::vector<std::vector<float32_t>> dnnOutputHost(numOutputs);
// Allocate device memory for DNN input.
dwBlobSize sizeInput;
dwDNN_getInputSize(&sizeInput, inputIndex, dnn);
size_t numInputElements = sizeInput.batchsize * sizeInput.channels * sizeInput.height * sizeInput.width;
cudaMalloc(&dnnInputs[inputIndex], sizeof(float32_t) * numInputElements);
dnnInputHost.resize(numInputElements);
// Allocate device and host memory for DNN outputs
dwBlobSize size1, size2;
dwDNN_getOutputSize(&size1, output1Index, dnn);
dwDNN_getOutputSize(&size2, output2Index, dnn);
size_t numElements1 = size1.batchsize * size1.channels * size1.height * size1.width;
size_t numElements2 = size2.batchsize * size2.channels * size2.height * size2.width;
cudaMalloc(&dnnOutputs[output1Index], sizeof(float32_t) * numElements1);
cudaMalloc(&dnnOutputs[output2Index], sizeof(float32_t) * numElements2);
dnnOutputHost[output1Index].resize(numElements1);
dnnOutputHost[output2Index].resize(numElements2);
// Fill dnnInputHost with application data.
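Note that cudaMemcpyAsync only overlaps with host execution when the host buffers are page-locked; the std::vector storage used above is pageable. A sketch of allocating a pinned host buffer for the input instead (same element count as above):

// Optional: page-locked host buffer so the asynchronous copies below can
// overlap with host work; release it later with cudaFreeHost().
float32_t* dnnInputHostPinned = nullptr;
cudaMallocHost(&dnnInputHostPinned, sizeof(float32_t) * numInputElements);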

Copy the DNN input from the host buffer to the device, run inference, and copy the results back to host memory. All of these operations are enqueued asynchronously with respect to the host code.

// Enqueue asynchronous copy of network input data from host to device memory.
cudaMemcpyAsync(dnnInputs[inputIndex], dnnInputHost.data(), sizeof(float32_t) * numInputElements, cudaMemcpyHostToDevice);
// Run inference on the raw device buffers in the DNN's currently selected CUDA stream.
// (dwDNN_inferRaw() operates on raw device pointers; dwDNN_infer() expects dwDNNTensorHandle_t objects instead.)
dwDNN_inferRaw(dnnOutputs.data(), dnnInputs.data(), sizeInput.batchsize, dnn);
// Enqueue asynchronous copy of the inference results to host memory
cudaMemcpyAsync(dnnOutputHost[output1Index].data(), dnnOutputs[output1Index], sizeof(float32_t) * numElements1, cudaMemcpyDeviceToHost);
cudaMemcpyAsync(dnnOutputHost[output2Index].data(), dnnOutputs[output2Index], sizeof(float32_t) * numElements2, cudaMemcpyDeviceToHost);
// Do something while inference results are being calculated.
otherUsefulWork();
// Wait until all pending operations on the CUDA device have finished.
cudaDeviceSynchronize();
// Inference and memory copies are done. Read results from dnnOutputHost[output1Index] and dnnOutputHost[output2Index].
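The copies and the inference above are enqueued on the default CUDA stream. A dedicated stream can be used instead, so that only this work needs to be synchronized; a sketch using dwDNN_setCUDAStream():

// Optional: run the copies and the inference on a dedicated CUDA stream.
cudaStream_t stream = nullptr;
cudaStreamCreate(&stream);
dwDNN_setCUDAStream(stream, dnn);
cudaMemcpyAsync(dnnInputs[inputIndex], dnnInputHost.data(), sizeof(float32_t) * numInputElements, cudaMemcpyHostToDevice, stream);
dwDNN_inferRaw(dnnOutputs.data(), dnnInputs.data(), sizeInput.batchsize, dnn);
cudaMemcpyAsync(dnnOutputHost[output1Index].data(), dnnOutputs[output1Index], sizeof(float32_t) * numElements1, cudaMemcpyDeviceToHost, stream);
// Wait only for this stream rather than the whole device.
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);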

Finally, free the previously allocated device memory and release the DNN handle.

// Free resources.
cudaFree(dnnInputs[inputIndex]);
cudaFree(dnnOutputs[output1Index]);
cudaFree(dnnOutputs[output2Index]);
// Release the DNN handle.
dwDNN_release(dnn);

For more information see: