
Automatic license plate recognition (ALPR) on stationary to fast-moving vehicles is a common intelligent video analytics application for smart cities. Common use cases include parking assistance systems, automated toll booths, vehicle registration and identification for delivery and logistics at ports, and warehouses that transport medical supplies. Doing this in real time is key to serving these markets to their full potential. Traditional techniques rely on specialized cameras and processing hardware, which are both expensive to deploy and difficult to maintain.

Figure 1. Cars with bounding boxes.

The pipeline for ALPR involves detecting vehicles in the frame using an object detection deep learning model, localizing the license plate using a license plate detection model, and then finally recognizing the characters on the license plate. Optical character recognition (OCR) using deep neural networks is a popular technique to recognize characters in any language.

In this post, we show you how to use production-quality AI models such as License Plate Detection (LPD) and License Plate Recognition (LPR) models in conjunction with the NVIDIA Transfer Learning Toolkit (TLT). Ready-to-use models help you get your ALPR project off the ground quickly. The resulting TLT-optimized models can be readily deployed using the DeepStream SDK.

Video. Real-time license plate recognition.

To get started with creating and deploying highly accurate, pretrained models from the TLT, you need the following resources:

All the pretrained models are free and readily available on NVIDIA NGC. TLT provides two LPD models and two LPR models: one set trained on US license plates and another trained on license plates in China. For more information, see the LPD and LPR model cards.

The workflow uses three cascaded models: vehicle detection, followed by license plate detection, followed by license plate recognition.
Figure 2. ALPR application pipeline for a real-time car license plate recognition application.

You use TLT through the tlt-launcher interface for training. To run the TLT launcher, map the ~/tlt-experiments directory on the local machine to the Docker container using the ~/.tlt_mounts.json file. For more information, see TLT Launcher.

Install the TLT launcher:

pip3 install nvidia-pyindex
pip3 install nvidia-tlt

Create the ~/.tlt_mounts.json file and add the following content inside:

{
    "Mounts": [
        {
            "source": "/home/<username>/tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        },
        {
            "source": "/home/<username>/openalpr",
            "destination": "/workspace/openalpr"
        }

    ]
}

The first entry mounts the path /home/<username>/tlt-experiments on the host machine to the path /workspace/tlt-experiments inside the container. The second entry mounts the path /home/<username>/openalpr on the host machine to the path /workspace/openalpr inside the container.
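If you prefer to generate the mounts file rather than write it by hand, the following Python sketch builds the same structure. The helper name build_mounts is a hypothetical name used for illustration and is not part of TLT. The sketch prints the JSON instead of writing to ~/.tlt_mounts.json, so you can check the paths first.

```python
import json
from pathlib import Path


def build_mounts(home):
    """Return the mounts mapping for a given home directory.

    build_mounts is a hypothetical helper, not a TLT API.
    """
    home = Path(home)
    return {
        "Mounts": [
            {
                "source": str(home / "tlt-experiments"),
                "destination": "/workspace/tlt-experiments",
            },
            {
                "source": str(home / "openalpr"),
                "destination": "/workspace/openalpr",
            },
        ]
    }


if __name__ == "__main__":
    # Print the JSON; save it as ~/.tlt_mounts.json once the paths look right.
    print(json.dumps(build_mounts(Path.home()), indent=4))
```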

License plate detection

In this section, we walk you through how to take the pretrained US-based LPD model from NGC and fine-tune the model using the OpenALPR dataset.

Dataset

Use the OpenALPR benchmark as your experimental dataset. You take the LPD pretrained model from NGC and fine-tune it on the OpenALPR dataset.

Algorithm introduction

The LPD model is based on the DetectNet_v2 network from TLT. The training algorithm optimizes the network to minimize the localization and confidence loss for the objects.

The training is carried out in two phases. In the first phase, the network is trained with regularization to facilitate pruning. After the first phase, you prune the network, removing channels whose kernel norms are below the pruning threshold. In the second phase, the pruned network is retrained without regularization.
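To make the pruning step concrete, here is a minimal pure-Python sketch of norm-based channel pruning. It is illustrative only: the function names, the choice of the L2 norm, and the normalization by the largest norm in the layer are assumptions, not the TLT implementation.

```python
import math


def kernel_norms(kernels):
    """L2 norm of each output channel's kernel (a flat list of weights)."""
    return [math.sqrt(sum(w * w for w in kernel)) for kernel in kernels]


def channels_to_keep(kernels, threshold):
    """Indices of channels whose normalized norm is at or above the threshold.

    Norms are divided by the largest norm in the layer so that the threshold
    is a relative value in [0, 1]. This normalization is an assumption.
    """
    norms = kernel_norms(kernels)
    largest = max(norms)
    if largest == 0.0:
        return []
    return [i for i, n in enumerate(norms) if n / largest >= threshold]


# Three output channels; the second has almost no weight left after
# regularized training, so it falls below the threshold and is removed.
layer = [[0.5, -0.4, 0.3], [0.01, 0.0, -0.02], [0.2, 0.2, -0.1]]
kept = channels_to_keep(layer, threshold=0.1)
print(kept)
```

Regularization in the first phase pushes unimportant kernels toward zero, which is what makes a simple norm threshold a usable pruning criterion.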

Training the LPD model

Set up your NVIDIA NGC account and install the TLT launcher. To fine-tune the LPD model, download the LPD notebook from NGC. Then, download the NGC LPD pretrained model (usa_unpruned.tlt).

Prepare the dataset

First, sync the OpenALPR benchmark:

$ git clone https://github.com/openalpr/benchmarks benchmarks

Next, download lpd_prepare_data.py and run the following command to prepare the dataset and resize the images and labels:

$ python lpd_prepare_data.py --input_dir benchmarks/endtoend/us --output_dir lpd --target_width 640 --target_height 480
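Resizing an image means that its bounding-box labels must be scaled by the same factors. The following sketch shows that arithmetic. The function name and the (x1, y1, x2, y2) pixel box layout are assumptions for illustration and may differ from what lpd_prepare_data.py does internally.

```python
def scale_box(box, src_size, dst_size=(640, 480)):
    """Scale an (x1, y1, x2, y2) pixel box from src_size to dst_size.

    Sizes are (width, height). Each axis is scaled independently, so the
    aspect ratio changes if the source image is not 4:3.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx, sy = dst_w / src_w, dst_h / src_h
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


# A plate at (400, 600)-(560, 660) in a 1280x960 image lands at half those
# coordinates in a 640x480 image.
print(scale_box((400, 600, 560, 660), src_size=(1280, 960)))
```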

Split the data into two parts: 80% for the training set and 20% for the validation set. Run the following command to split the dataset randomly and generate tfrecords. This command uses a spec file called SPECS_tfrecord.txt.

$ tlt detectnet_v2 dataset_convert -d /workspace/openalpr/SPECS_tfrecord.txt -o /workspace/openalpr/lpd_tfrecord/lpd

Configure the spec file

After you prepare the dataset, download the training spec and configure the training parameters. Specify the NGC pretrained model for LPD using the pretrained_model_file parameter in the spec file. Set the batch size to 4 and train for 120 epochs. The model is evaluated on the validation set every 10 epochs.

Training

Run the following command to start fine-tuning on the OpenALPR data:

$ tlt detectnet_v2 train -e /workspace/openalpr/SPECS_train.txt -r /workspace/openalpr/exp_unpruned -k nvidia_tlt

After the training completes, you see the following log that shows the average precision (AP) on the validation set:

class name      average precision (in %)
------------  --------------------------
lpd                     82.2808

After training, you can also prune your trained model to reduce the size of the model. Pruning is not shown in this post. For more information, see Pruning the model or Training with Custom Pretrained Models Using the NVIDIA Transfer Learning Toolkit.

Exporting the model

After training, export the model for deployment. The format for deployment is .etlt or encrypted TLT. You encrypt the exported model with a key and use the key to decrypt the model during deployment.

To run inference using INT8 precision, you can also generate an INT8 calibration table in the model export step. The encrypted TLT can be directly consumed in the DeepStream SDK.

To export the LPD model in INT8, use the following command. This command first calibrates the model for INT8 using the calibration images specified by the --cal_image_dir option. The encryption key for this model is specified by the -k option and can be any string. The exported .etlt file and the calibration cache are specified by the -o and --cal_cache_file options, respectively. To learn more about all the options for model export, see the TLT DetectNet_v2 documentation.

$ tlt detectnet_v2 export -m /workspace/openalpr/exp_unpruned/weights/model.tlt -o /workspace/openalpr/export/unpruned_model.etlt --cal_cache_file /workspace/openalpr/export/calibration.bin -e /workspace/openalpr/SPECS_train.txt -k nvidia_tlt --cal_image_dir /workspace/openalpr/lpd/data/image --data_type int8 --batch_size 4 --batches 10 --engine_file /workspace/openalpr/export/unpruned_int8.trt

Accuracy of the trained LPD model

The pretrained model provides a great starting point for training and fine-tuning on your own dataset. For comparison, we trained two models: one fine-tuned from the LPD pretrained model and one trained from scratch. The following table shows the mean average precision (mAP) comparison of the two models. By using the pretrained model, you can reach your target accuracy much faster with a smaller dataset. If you were to train from scratch, you would need a much larger dataset and would need to train for longer to achieve similar accuracy.

You could use the following command in TLT Docker to run an evaluation on the validation dataset specified in the experiments config file:

$ tlt detectnet_v2 evaluate -m /workspace/openalpr/exp_unpruned/weights/model.tlt -k nvidia_tlt -e /workspace/openalpr/SPECS_train.txt

Model                                 Epochs  Batch size  mAP
------------------------------------  ------  ----------  ------
LPD: Training from scratch            120     4           53.11%
LPD: Fine-tuning a pretrained model   120     4           82.28%
Table 1. Accuracy of using the pretrained model vs. training from scratch.

License plate recognition

In this section, we go into the details of the LPR model training. NVIDIA provides LPRNet models trained on US license plates and Chinese license plates. You can find the details of these models in the model card. You use LPRNet trained on US license plates as the starting point for fine-tuning in the following section.

Dataset

You train and evaluate the LPRNet on the OpenALPR US images dataset as well. Split it to 80% (177 images) for training and 20% (44 images) for validation.
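The 177/44 split follows from taking 80% of the 221 images and rounding. The following sketch reproduces that arithmetic with a seeded shuffle. The file names, the seed, and the rounding rule are assumptions for illustration; the preprocessing script described later performs the actual split.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 221 placeholder file names stand in for the cropped plate images.
images = [f"plate_{i:03d}.jpg" for i in range(221)]
train, val = split_dataset(images)
print(len(train), len(val))
```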

Algorithm introduction

For the license plate recognition task, you predict the sequence of characters in a license plate image. As with other computer vision tasks, you first extract image features. LPRNet uses a widely used DNN architecture, such as ResNet10 or ResNet18, as its backbone. The original stride of the ResNet network is 32, but to suit the small spatial size of a license plate image, the stride is reduced from 32 to 4. The image features are then fed into a classifier. Unlike a normal image classification task, in which the model produces a single class ID per image, the LPRNet model produces a sequence of class IDs. The image feature is divided into slices along the horizontal dimension, and each slice is assigned a character ID in the prediction.

Finally, use the connectionist temporal classification (CTC) loss to train this sequence classifier. The training algorithm optimizes the network to minimize the CTC loss between the ground-truth character sequence of a license plate and the predicted character sequence.
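A quick calculation shows why the stride matters. The sketch assumes a 96-pixel-wide input (the 3x48x96 input size used for the LPR model later in this post) and assumes that the number of horizontal slices is simply the width divided by the total stride. The function name is hypothetical.

```python
def num_slices(input_width, stride):
    """Number of horizontal feature slices for a given total stride."""
    return input_width // stride


for stride in (32, 4):
    print(f"stride {stride}: {num_slices(96, stride)} slices")
```

Under these assumptions, a stride of 32 leaves only 3 slices, fewer than the number of characters on a typical plate, while a stride of 4 leaves 24 slices, which gives the sequence classifier room for every character plus the blanks between them.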

In general, LPRNet is a sequence classification model with a tuned ResNet backbone. It takes the image as network input and produces sequence output. Then, the license plate is decoded from the sequence output using a CTC decoder based on a greedy decoding method.
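Greedy CTC decoding can be sketched in a few lines: take the most likely class ID for each slice, collapse consecutive repeats, and drop the blank symbol. The character list, the blank ID, and the function name below are hypothetical and chosen for illustration; the real dictionary comes from the characters list used in training.

```python
def ctc_greedy_decode(class_ids, characters, blank_id):
    """Collapse repeated IDs, drop blanks, and map the rest to characters.

    class_ids holds the per-slice argmax of the classifier output.
    """
    decoded = []
    previous = None
    for class_id in class_ids:
        if class_id != previous and class_id != blank_id:
            decoded.append(characters[class_id])
        previous = class_id
    return "".join(decoded)


# Hypothetical five-character alphabet; ID 5 is the CTC blank.
characters = ["A", "B", "C", "1", "2"]
blank_id = len(characters)

# The blank between the two runs of ID 1 keeps the doubled "B".
slice_ids = [0, 0, 5, 1, 1, 5, 1, 3, 3, 5, 4]
print(ctc_greedy_decode(slice_ids, characters, blank_id))
```

The blank symbol is what lets the decoder distinguish a genuinely repeated character from one character that spans several slices.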

The image feature is treated as a sequence along its horizontal axis and sent to a sequence classifier. Finally, the license plate number is decoded from the output of the sequence classifier using a CTC decoder based on greedy decoding.
Figure 3. LPR model architecture.

Training the LPR model

Training LPRNet using TLT requires no code development from your side. You prepare a dataset, set the experiment config, and then run the command.

Prepare the data

You process data in the /home/<username>/tlt-experiments/ path of the local machine and use the mapped path in Docker for tlt-launcher. First, clone the OpenALPR benchmark from openalpr/benchmarks:

$ git clone https://github.com/openalpr/benchmarks benchmarks

Next, preprocess the downloaded dataset and split it into train/val using the preprocess_openalpr_benchmark.py script.

$ python preprocess_openalpr_benchmark.py --input_dir=./benchmarks/endtoend/us --output_dir=./data/openalpr

After preprocessing, the OpenALPR dataset is in the format that TLT requires. Each cropped license plate image has a corresponding label text file that contains the ground truth of the license plate image. You also create a characters_list.txt file that is a dictionary of all the characters found in the US license plates.

Experiments config

The experiments config file defines the hyperparameters for the LPRNet model’s architecture, training, and evaluation. Download a sample LPR training config file and place it in the /home/<username>/tlt-experiments/lprnet path. Use this config for fine-tuning on US LPRNet.

In this config, you define an LPRNet model with a tuned ResNet18 backbone, which is your baseline. Train the model for 24 epochs with batch size 32, L2 regularization of 0.0005, and a soft_start_annealing_schedule to apply a variable learning rate during training. For more information about the parameters in the experiment config file, see the Transfer Learning Toolkit User Guide.

We also provide a spec file to train from scratch. Compared with the fine-tuning config, it uses more epochs and a higher learning rate. Although training from scratch is not recommended, we provide the spec for comparison.
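To give a feel for what a soft-start annealing schedule does, the following sketch ramps the learning rate up, holds it, and then decays it. The parameter names, the log-space interpolation, and the values in the example are assumptions for illustration; they are not taken from the sample spec file or from the TLT source.

```python
import math


def soft_start_annealing_lr(progress, min_lr, max_lr, soft_start, annealing):
    """Learning rate at a training progress value in [0, 1].

    Ramps from min_lr to max_lr until `soft_start`, holds max_lr until
    `annealing`, then decays back to min_lr. Requires annealing < 1.
    Interpolation in log space is an assumption of this sketch.
    """
    def interpolate(start, end, t):
        return math.exp(math.log(start) + t * (math.log(end) - math.log(start)))

    if progress < soft_start:
        return interpolate(min_lr, max_lr, progress / soft_start)
    if progress < annealing:
        return max_lr
    return interpolate(max_lr, min_lr, (progress - annealing) / (1.0 - annealing))


# Illustrative values only: 24 epochs, ramp for the first 10%, decay after 70%.
epochs = 24
for epoch in range(0, epochs + 1, 4):
    lr = soft_start_annealing_lr(epoch / epochs, 1e-6, 1e-5, 0.1, 0.7)
    print(epoch, f"{lr:.2e}")
```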

Training

When the dataset and experiment spec are ready, start your training in TLT. Use the following command to train an LPRNet with a single GPU and the US LPRNet model as pretrained weights:

$ tlt lprnet train -e /workspace/tlt-experiments/lprnet/tutorial_spec.txt -r /workspace/tlt-experiments/lprnet/ -k nvidia_tlt -m /workspace/tlt-experiments/lprnet/us_lprnet_baseline18_trainable.tlt

TLT also supports multi-GPU training (data parallelism) and automatic mixed precision (AMP). To boost the training speed, you could run multi-GPU with option --gpus <num_gpus> and mixed precision training with option --use_amp. The training log, which includes accuracy on validation dataset, training loss, and learning rate, is saved in .csv format in the <results_dir> directory. The following code example shows the training log with pretrained weights:

epoch,accuracy,loss,lr
0,nan,1.085993747589952,1e-05
1,nan,0.9726232198503731,1e-05
2,nan,0.9452087508756563,1e-05
3,nan,0.7897920507495686,1e-05
4,0.8409090909090909,0.5753771635772145,1e-05
...

Exporting the model

To deploy the LPR model in DeepStream or other applications, export it to the .etlt format. Currently, LPR only supports FP32 and FP16 precision. Compared with LPD’s model export command, LPR is much simpler:

$ tlt lprnet export -m /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-24.tlt -k nvidia_tlt -e /workspace/tlt-experiments/lprnet/tutorial_spec.txt

The output .etlt model is saved in the same directory as the trained .tlt model.

Accuracy of the trained LPR model

The evaluation metric of LPR is the accuracy of license plate recognition. A recognition is regarded as accurate if all the characters and the sequence in the license plate are correct. You can use the following command in TLT Docker to run an evaluation on the validation dataset specified in the experiments config file:

$ tlt lprnet evaluate -m /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-24.tlt -k nvidia_tlt -e /workspace/tlt-experiments/lprnet/tutorial_spec.txt
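The exact-match metric is strict: a single wrong character makes the whole plate count as incorrect. The following sketch shows the idea with hypothetical plate strings and a hypothetical function name; it is not the TLT evaluation code.

```python
def plate_accuracy(predictions, ground_truths):
    """Fraction of plates whose predicted string matches exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths differ in length")
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


# Hypothetical plates: the second prediction has one wrong character,
# so it contributes nothing to the accuracy.
truth = ["6ABC123", "7XYZ987", "4DEF456", "2GHJ321"]
preds = ["6ABC123", "7XYZ981", "4DEF456", "2GHJ321"]
print(plate_accuracy(preds, truth))  # 0.75
```

With only 44 validation images, each plate is worth about 2.3 percentage points. The 0.8409 accuracy in the training log is consistent with 37 of 44 plates read correctly.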

The following table shows the accuracy comparison of the model trained from scratch and the model trained with the LPRNet pretrained model.

Model                                 Epochs  Train accuracy  Val accuracy
------------------------------------  ------  --------------  ------------
baseline18_unpruned_from_scratch      100     0%              0%
baseline18_unpruned_from_pretrained   24      98.87%          90.90%
Table 2. Accuracy of using the pretrained model vs. training from scratch.

With the pretrained model, you can reach high accuracy with a small number of epochs. Conversely, when training from scratch, your model hasn’t even begun to converge with a 4x increase in the number of epochs. This means that you must increase the dataset significantly, which increases the training time and cost.

Deploying LPD and LPR using the DeepStream SDK

In this section, we walk you through the steps to deploy the LPD and LPR models in DeepStream. We have provided a sample DeepStream application. The LPD/LPR sample application builds a pipeline that takes multiple video streams as input and runs inference on the batched frames with cascaded models to detect cars and their license plates and to recognize the characters.

This figure shows multiple input streams going into the DeepStream application. The streams are decoded, muxed using nvstreammux, and then sent to a primary detector that detects vehicles. The cropped image is then sent to an LPD model to detect the license plate, and finally to an LPR model to recognize the characters on the plate before the output is displayed on the screen using OSD.
Figure 4. DeepStream pipeline of LPD/LPR sample application.

The source code for the sample application is constructed in two parts:

  • lpr-test-sample: The main application of the LPD/LPR sample, which constructs the GStreamer pipeline with DeepStream plugins.
  • nvinfer_custom_lpr_parser: The customized DeepStream nvinfer classifier parser library for the LPR model. The default DeepStream nvinfer classifier supports only confidence parsing and gets labels from the label file configured by the labelfile-path parameter in the nvinfer configuration file. Because the LPR model outputs the argmax and the confidence in two separate layers, a customized output parsing function is needed to parse the LPR output layers and generate correct labels for the car plate strings. The customized parser function looks up the designated dictionary to find the characters that correspond to the argmax and confidence values, and then combines the characters into the car plate string that is used as the label.

Download and prepare the models

For this application, you need three models from TLT:

  • TrafficCamNet detects vehicles.
  • LPD detects license plates.
  • LPR recognizes the characters.

All models can be downloaded from NVIDIA NGC. Alternatively, if you followed the training steps in the earlier two sections, you can use your own trained LPD and LPR models instead.

Download the TrafficCamNet model:

mkdir -p /opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_pretrained_models/trafficcamnet
cd /opt/nvidia/deepstream/deepstream-5.0/samples/models/tlt_pretrained_models/trafficcamnet
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_trafficcamnet/versions/pruned_v1.0/files/trafficnet_int8.txt
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_trafficcamnet/versions/pruned_v1.0/files/resnet18_trafficcamnet_pruned.etlt

Download the LPD model:

mkdir -p /opt/nvidia/deepstream/deepstream-5.0/samples/models/LP/LPD
cd /opt/nvidia/deepstream/deepstream-5.0/samples/models/LP/LPD
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lpdnet/versions/pruned_v1.0/files/usa_pruned.etlt
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lpdnet/versions/pruned_v1.0/files/usa_lpd_cal.bin
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lpdnet/versions/pruned_v1.0/files/usa_lpd_label.txt

Download the LPR model:

mkdir -p /opt/nvidia/deepstream/deepstream-5.0/samples/models/LP/LPR
cd /opt/nvidia/deepstream/deepstream-5.0/samples/models/LP/LPR
wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_lprnet/versions/deployable_v1.0/files/us_lprnet_baseline18_deployable.etlt
#create an empty label file
echo > labels_us.txt

With DeepStream SDK 5.x, the gst-nvinfer plugin cannot automatically generate a TensorRT engine from the ONNX format that TLT exports. This limitation affects the LPR model, which is exported from TLT in encrypted ONNX format. The LPD model is in the legacy encrypted UFF format and works with DeepStream automatically. Generate the engine file for the LPR model using the tlt-converter tool. Download the latest tlt-converter for your hardware and CUDA or cuDNN version from the TLT getting started page.

Platform    Compute
----------  -------------------------------------
x86 + GPU   CUDA 10.2 / cuDNN 8.0 / TensorRT 7.1
x86 + GPU   CUDA 10.2 / cuDNN 8.0 / TensorRT 7.2
x86 + GPU   CUDA 11.0 / cuDNN 8.0 / TensorRT 7.1
x86 + GPU   CUDA 11.0 / cuDNN 8.0 / TensorRT 7.2
Jetson      JetPack 4.4
Jetson      JetPack 4.5
Table 1. Download locations by platform and hardware.

Convert the encrypted LPR ONNX model to a TensorRT engine:

tlt-converter -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 ./us_lprnet_baseline18_deployable.etlt -t fp16 -e /opt/nvidia/deepstream/deepstream-5.0/samples/models/LP/LPR/lpr_us_onnx_b16.engine
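The -p argument names the input tensor and then gives the minimum, optimal, and maximum input shapes for the engine, each as batch x channels x height x width. The following sketch unpacks the value used in the command to make that structure visible. The parse_profile helper is hypothetical and is not part of tlt-converter.

```python
def parse_profile(spec):
    """Split '<input>,<min>,<opt>,<max>' into a name and three shapes."""
    name, *shapes = spec.split(",")
    if len(shapes) != 3:
        raise ValueError("expected min, opt, and max shapes")
    dims = [tuple(int(d) for d in shape.split("x")) for shape in shapes]
    return name, dict(zip(("min", "opt", "max"), dims))


name, profile = parse_profile("image_input,1x3x48x96,4x3x48x96,16x3x48x96")
print(name)
for kind, shape in profile.items():
    print(kind, shape)
```

Here the engine accepts batches of 1 to 16 plate crops of size 3x48x96 and is tuned for a batch of 4, which matches the b16 suffix in the engine file name.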

Build and run the sample application

Download the sample code from the NVIDIA-AI-IOT/deepstream_lpr_app GitHub repo and build the application.

Copy the nvinfer_custom_lpr_parser folder to your device and build the code:

cd nvinfer_custom_lpr_parser
make

Copy the generated libnvdsinfer_custom_impl_lpr.so file to the /opt/nvidia/deepstream/deepstream-5.0/lib/ directory.

Copy the lpr-test-sample folder to your device and build the code. This generates the sample application lpr-test-app.

cd lpr-test-sample
make

Modify the nvinfer configuration files for TrafficCamNet, LPD and LPR with the actual model path and names. The config file for TrafficCamNet is provided in DeepStream SDK under the following path:

/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models/deepstream_app_source1_trafficcamnet.txt

Sample lpd_config.txt and lpr_config_sgie_us.txt files are provided with the sample application. Note the parse-classifier-func-name and custom-lib-path parameters, which point to the custom nvinfer LPR parser library that you built earlier.

Prepare the dictionary file for the OCR according to the trained TLT LPR model. The dictionary file name should be dict.txt. Create dict.txt by using the US version.

$ cp dict_us.txt dict.txt

Run the sample application.

lpr-test-app [language mode:1-us 2-chinese]
  [sink mode:1-output as 264 stream file 2-no output 3-display on screen]
  [ROI enable:0-disable ROI 1-enable ROI]
  [input mp4 file path and name] [input mp4 file path and name] ... [input mp4 file path and name]
  [output 264 file path and name]

For example:

$ lpr-test-app 1 3 0 file1.mp4 file2.mp4 output.264

Performance

The following table shows the inference throughput in frames per second (FPS) of the US LPD pruned model, which is trained on a proprietary dataset with over 45,000 US car images. Performance varies with input size, pruning ratio, device, and other factors. These numbers are for the pruned version of the model that is available on NGC, not for the model trained in the earlier sections.

Device         Input size (CHW)  Precision  Batch size  FPS
-------------  ----------------  ---------  ----------  ----
Jetson Nano    3x480x640         FP16       1           66
Jetson NX      3x480x640         INT8       1           461
Jetson Xavier  3x480x640         INT8       1           913
T4             3x480x640         INT8       1           2748
Table 3. Inference performance of license plate detection.

LPR standalone performance

The following table shows the inference performance of the LPR trained on US license plates on different devices. We profiled the model inference with the trtexec command of TensorRT.

Device         Input size  Precision  Batch size  FPS
-------------  ----------  ---------  ----------  ----
Jetson Nano    3x48x96     FP16       32          16
Jetson NX      3x48x96     FP16       32          600
Jetson Xavier  3x48x96     FP16       64          1021
T4             3x48x96     FP16       128         3821
Table 4. Inference performance of license plate recognition.

Sample application performance

The full pipeline of this sample application runs three different DNN models. You use pretrained TrafficCamNet in TLT for car detection. LPD and LPR are pretrained with the NVIDIA training dataset of US license plates.

The following test was done with 1080p (1920×1080) resolution videos using the sample LPR application. The following table shows the end-to-end performance of the entire video analytics pipeline with three DNN models, from ingesting the video data to rendering the metadata on the frames. The data was collected on different devices.

Device         Number of streams  Batch size  Total FPS
-------------  -----------------  ----------  ---------
Jetson Nano    1                  1           9.2
Jetson NX      3                  3           80.31
Jetson Xavier  5                  5           146.43
T4             14                 14          447.15
Table 5. End-to-end inference of ALPR application using DeepStream.
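Dividing the total FPS by the number of streams gives the per-stream frame rate, which is the number to compare against the frame rate of your cameras. The following sketch does that arithmetic on the figures from Table 5. The function name is hypothetical.

```python
# (number of streams, total FPS) per device, taken from Table 5.
results = {
    "Jetson Nano": (1, 9.2),
    "Jetson NX": (3, 80.31),
    "Jetson Xavier": (5, 146.43),
    "T4": (14, 447.15),
}


def per_stream_fps(streams, total_fps):
    """Average frame rate that each input stream receives."""
    return total_fps / streams


for device, (streams, total_fps) in results.items():
    print(f"{device}: {per_stream_fps(streams, total_fps):.1f} FPS per stream")
```

Jetson NX, Jetson Xavier, and T4 each sustain roughly 27 to 32 FPS per stream at the listed stream counts, while Jetson Nano handles a single stream at about 9 FPS.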

Summary

In this post, we introduced an end-to-end AI solution for automatic license plate recognition. This solution covers all aspects of developing an intelligent video analytics pipeline, from training deep neural network models with Transfer Learning Toolkit to deploying the trained models in DeepStream SDK.

For training, you don’t need the expertise to build your own DNN and optimize the model. TLT offers a simplified way to train your model: all you must do is prepare the dataset and set the config files. You can also take advantage of the highly accurate pretrained models in TLT instead of starting from random initialization.

For deployment, DeepStream optimizes the system resources for video decoding, image preprocessing, and inference, providing you with the highest channel density for real-time video analytics. You can quickly deploy your trained models into a multi-stream video analytics pipeline using DeepStream.

Start your next AI project with NVIDIA pretrained models and train using Transfer Learning Toolkit.

For more information, see the following resources: