
Building Image Segmentation Faster Using Jupyter Notebooks from NGC

The NVIDIA NGC team is hosting a webinar with live Q&A to dive into this Jupyter notebook available from the NGC catalog. Learn how to use these resources to kickstart your AI journey. Register now: NVIDIA NGC Jupyter Notebook Day: Image Segmentation.

Image segmentation is the process of partitioning a digital image into multiple segments by changing the representation of an image into something that is more meaningful and easier to analyze. Image segmentation can be used in a variety of domains such as manufacturing to identify defective parts, in medical imaging to detect early onset of diseases, in autonomous driving to detect pedestrians, and more.

However, building, training, and optimizing these models can be complex and quite time consuming. To achieve a state-of-the-art model, you need to set up the right environment, train with the correct hyperparameters, and optimize it to achieve the desired accuracy. Data scientists and developers usually end up spending a considerable amount of time looking for the right tools and setting up the environments for their models, which is why we built the NGC Catalog.

The NGC Catalog is a hub for cloud-native, GPU-optimized AI and HPC applications and tools. It provides faster access to performance-optimized containers, shortens time-to-solution with pretrained models, and provides industry-specific software development kits to build end-to-end AI solutions. The catalog hosts a diverse set of assets that can be used for a variety of applications and use cases ranging from computer vision and speech recognition to recommendation systems.

Image shows the NGC homepage, on the Collections tab.
Figure 1. The NGC Catalog, a hub for GPU-optimized AI software.

The AI containers and models on the NGC Catalog are tuned, tested, and optimized to extract maximum performance from your existing GPU infrastructure. The containers and models use automatic mixed precision (AMP), enabling you to use this feature with either no code changes or only minimal changes. AMP uses the Tensor Cores on NVIDIA GPUs and can speed up model training considerably. Multi-GPU training is a standard feature implemented on all NGC models that leverage Horovod and NCCL libraries for distributed training and efficient communication.
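
As a rough illustration of the "no code changes" path, NVIDIA's TensorFlow 1.x containers document an environment variable, TF_ENABLE_AUTO_MIXED_PRECISION, that requests AMP before the training graph is built. The sketch below assumes such a container. Whether a particular NGC model reads this variable or exposes its own flag is something to confirm in that model's documentation.

```python
import os

# Assumption: an NGC TensorFlow 1.x container that honors this variable.
# It has to be set before TensorFlow builds the training graph.
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

amp_requested = os.environ.get("TF_ENABLE_AUTO_MIXED_PRECISION") == "1"
print("AMP requested:", amp_requested)
```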

Aside from providing just the building blocks, NGC is now adding sample Jupyter notebooks complete with instructions on how to train and deploy a model using these artifacts from the NGC Catalog. In this post, I show you how to use a sample image segmentation notebook to identify defective parts in a manufacturing assembly line.

Image segmentation

This U-Net model is adapted from the original U-Net, a convolutional auto-encoder for 2D image segmentation. The modified version, called TinyUNet, performs efficiently and with high accuracy on the industrial anomaly dataset DAGM2007. TinyUNet is composed of two parts, an encoding subnetwork and a decoding subnetwork. The encoder compresses the input image into a compact, lower-resolution feature representation. The decoder takes the encoder output and expands it back into a segmentation mask at the input resolution.

In the encoding subnetwork, the model applies three downsampling blocks, each composed of two 2D convolutions followed by a 2D max pooling layer. In the decoding subnetwork, three upsampling blocks are each composed of an upsample2D layer followed by a 2D convolution, a concatenation operation with the residual connection, and two 2D convolutions.
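
To make the shape of the network concrete, here is a minimal sketch that traces the spatial size of the feature maps through three downsampling and three upsampling blocks. It assumes a 512x512 input, that each max pooling layer halves the width and height, that each upsample2D layer doubles them, and that the convolutions preserve spatial size. The function name is hypothetical and is not part of the NGC resource.

```python
def tiny_unet_feature_sizes(input_size=512, num_blocks=3):
    """Return the spatial sizes along the encoder and decoder paths."""
    encoder = [input_size]
    for _ in range(num_blocks):
        # Assumed: 2x2 max pooling with stride 2 halves each spatial side
        encoder.append(encoder[-1] // 2)

    decoder = [encoder[-1]]
    for _ in range(num_blocks):
        # Assumed: each upsample2D layer doubles each spatial side
        decoder.append(decoder[-1] * 2)
    return encoder, decoder


encoder_sizes, decoder_sizes = tiny_unet_feature_sizes()
print(encoder_sizes)  # [512, 256, 128, 64]
print(decoder_sizes)  # [64, 128, 256, 512]
```

Because the two paths mirror each other, each decoder block can concatenate its upsampled features with the encoder features of the same spatial size.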

This is the default configuration of the model:

  1. Epochs: 2500
  2. Global Batch Size: 16
  3. Optimizer: RMSProp
    1. decay: 0.9
    2. momentum: 0.8
    3. centered: True
  4. Learning Rate Schedule: Exponential Step Decay
    1. decay: 0.8
    2. steps: 500
    3. initial learning rate: 1e-4
  5. Weight Initialization: He Uniform Distribution (introduced by Kaiming He et al. in 2015 to address issues related to ReLU activations in deep neural networks)
  6. Loss Function:
    1. When DICE Loss < 0.3, Loss = Binary Cross Entropy
    2. Else, Loss = DICE Loss
  7. Data Augmentation:
    1. Random Horizontal Flip (50% chance)
    2. Random Rotation 90°
  8. Activation Functions:
    1. ReLU is used for all layers
    2. Sigmoid is used at the output to ensure that the outputs are in the range [0, 1]
  9. Weight decay:
    1. 1e-5
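
The learning rate schedule and the adaptive loss in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the code shipped with the model: the decay is assumed to be a staircase (the rate is multiplied by 0.8 once every 500 steps), the Dice loss is assumed to be one minus the standard Dice coefficient, and all function names are hypothetical.

```python
import math

INITIAL_LR = 1e-4
DECAY_RATE = 0.8
DECAY_STEPS = 500


def learning_rate(step):
    """Staircase exponential decay (assumed): multiply by 0.8 every 500 steps."""
    return INITIAL_LR * DECAY_RATE ** (step // DECAY_STEPS)


def dice_loss(pred, target, eps=1e-6):
    """1 - Dice coefficient over flat lists of probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross entropy, with predictions clipped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def adaptive_loss(pred, target, threshold=0.3):
    """Switch to cross entropy once the Dice loss drops below the threshold."""
    dice = dice_loss(pred, target)
    if dice < threshold:
        return binary_cross_entropy(pred, target)
    return dice


print(learning_rate(0), learning_rate(500), learning_rate(1000))
good = adaptive_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0])  # Dice loss < 0.3, so BCE
poor = adaptive_loss([0.1, 0.9, 0.2, 0.8], [1, 0, 1, 0])  # Dice loss >= 0.3, so Dice
print(good, poor)
```

The switch means that the Dice term drives training while the predicted mask overlaps poorly with the label, and cross entropy refines the mask once the overlap is already good.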

This NGC resource contains a Dockerfile that extends the TensorFlow container in the NGC Catalog and encapsulates the necessary dependencies. Aside from these dependencies, you also need the following components:

To train your model using mixed precision with Tensor Cores or using FP32, perform the following steps using the default parameters of the U-Net model on the DAGM2007 dataset. This enables you to build the U-Net TensorFlow NGC container, train and evaluate your model, and generate predictions on the test data.

Build the container

Find the U-Net model under the resources section in the NGC Catalog. You can download the resource manually from the top right menu or by using the wget resource command.

The page for the U-Net industrial model for TensorFlow, with all the pull instructions, documentation for setting up the model, and necessary performance metrics associated with this model.
Figure 2. Resource landing page for the U-Net resource in the NGC Catalog.

To build the container, follow these steps:

  1. Make a folder using mkdir.
  2. Go to that folder using cd.
  3. Use wget to download resources as a zip file inside the folder.
  4. Unzip the zip file.
  5. Build the container using the Dockerfile inside this directory.
mkdir unet

cd unet

wget --content-disposition -O

docker build . --rm -t unet_industrial:latest

When you have built your container, follow the steps listed below to start an interactive session inside the NGC container:

  1. Make a directory for the dataset, for example ./dataset.
  2. Make a directory for results, for example ./results.
  3. Start the container with nvidia-docker.
mkdir ./dataset

mkdir ./results

docker run -it --rm \
    --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 \
    -v <absolute path to dataset directory that we made>:/data/ \
    -v <absolute path to results directory that we made>:/results \
    -p 8888:8888 unet_industrial:latest
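
Because the volume mounts need absolute host paths, it can help to assemble the command programmatically. The following sketch builds the same argument list as the command above from relative directory names. The helper name build_docker_run_command is hypothetical, and the sketch only constructs and prints the command without running Docker.

```python
from pathlib import Path


def build_docker_run_command(dataset_dir, results_dir, image="unet_industrial:latest"):
    """Assemble the docker run arguments with absolute host paths."""
    dataset = Path(dataset_dir).resolve()
    results = Path(results_dir).resolve()
    return [
        "docker", "run", "-it", "--rm",
        "--shm-size=2g",
        "--ulimit", "memlock=-1",
        "--ulimit", "stack=67108864",
        "-v", f"{dataset}:/data/",
        "-v", f"{results}:/results",
        "-p", "8888:8888",
        image,
    ]


command = build_docker_run_command("./dataset", "./results")
print(" ".join(command))
```

To launch the container from Python, the resulting list could be passed to subprocess.run(command).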

Download and preprocess the data set

You use the DAGM2007 dataset to train this model. This dataset is artificially generated but is similar to real-world problems. It consists of multiple datasets, each containing 1000 images that show the background texture without defects and 150 images with one labeled defect each on the background texture. The images in a single dataset are similar, but each dataset is generated by a different texture model and defect model.

Figure 3 shows a sample image from this data set. The left image is without defect and the right image contains a scratch-shaped defect that appears as a thin dark line, and a diffuse darker area. The defects are labeled by a surrounding ellipse, shown in red.

Graphic shows two similar images, one with a blurry area circled in red.
Figure 3. Sample images from the data set, without and with a defect.

Some files in the dataset require an account to download. If you run into a permissions problem while running any of the download scripts, use chmod to make the script executable, and then rebuild the container.

chmod +x scriptname

For example:

chmod +x

To download the dataset, run the following command:

./ /data

Download the rest of the data set

To download the complete dataset required for this tutorial, request access to the private data. The provided link allows you to download a group of 10 datasets to your local machine. Because training and testing operate on one class at a time, you can download only the classes that you plan to use instead of all 10.

When you have downloaded the dataset, open another terminal on your local machine. To copy the downloaded files to your running container, run the following command in the terminal:

docker cp /home/<your file path> <container ID>:/data/raw_images/private

An example command to copy class1 folder:

docker cp /home/skouchak/ 1877b7cc7625:/data/raw_images/private

Unzip the folder using the following command:

unzip /folder/path/.zip

Start training

To train using the default configuration (for example 1/4/8 GPUs, FP32/TF-AMP), launch one of the scripts in the ./scripts directory: ./scripts/UNet{AMP}{1, 4, 8}

Each of the scripts requires three parameters:

  • Path to the results directory of the model as the first argument
  • Path to the dataset as the second argument
  • DAGM class ID to use (1 to 10) as the third argument

For example, for class 1:

cd scripts/
./ /results /data 1

Run the evaluation

Launch model evaluation on a checkpoint by running one of the scripts in the ./scripts directory: ./scripts/UNet{_AMP}

Each of the scripts requires three parameters:

  • Path to the results directory of the model as the first argument
  • Path to the dataset as the second argument
  • DAGM class ID to use (1 to 10) as the third argument
cd scripts/
./ /results /data 1


In this section, you see how to benchmark the training and inference performance of the model using the related scripts.

Training performance benchmark

To benchmark the training performance, run one of the scripts in the ./scripts/benchmarking/ directory: ./scripts/benchmarking/UNet_trainbench{AMP}{1, 4, 8}

Each of the scripts requires two parameters:

  • Path to the dataset as the first argument
  • DAGM class ID to use (1 to 10) as the second argument
cd scripts/benchmarking/
./ /data 1

Inference performance benchmark

To benchmark the inference performance, run one of the scripts in the ./scripts/benchmarking/ directory: ./scripts/benchmarking/UNet_evalbench{_AMP}.sh

Each of the scripts requires two parameters:

  • Path to the dataset as the first argument
  • DAGM class ID to use (1 to 10) as the second argument
cd scripts/benchmarking/
./ /data 1

You can access JupyterLab in the container by running the following command inside the container bash shell:

jupyter notebook --ip --port 8888 --allow-root

Running the command prints a link to the JupyterLab workspace in the container. Download the unet-industrial-demo Jupyter notebook, and then go to the workspace and upload it.

By following the unet-industrial-demo notebook, you can download 10 pretrained and fine-tuned models like the model that you just trained. You can test these models using images available in the public folders. To use the model that you trained instead of pretrained models, change the checkpoints folder path to the checkpoint of the model that you just trained and test your trained model using images in the public folder.

Advanced steps

The following sections provide greater detail on the dataset, on running training and inference, and on the results. To see the full list of available options and their descriptions, use the -h or --help command-line option. For example:

python --help
General arguments                   Description
--exec_mode=train_and_evaluate      Shows the execution mode.
--iter_unit=batch                   Shows whether iterations are counted in batches or epochs.
--num_iter=2500                     Shows the number of iterations.
--batch_size=16                     Shows the size of each minibatch per GPU.
--results_dir=/path/to/results      Shows the directory in which to write training logs, summaries, and checkpoints.
--data_dir=/path/to/dataset         Shows the directory that contains the DAGM2007 dataset.
--dataset_name=DAGM2007             Shows the name of the dataset used in this run. Only DAGM2007 is currently supported.
--dataset_classID=1                 Shows the ClassID to train or evaluate the network. This is used for DAGM.
Table 1. List of options.
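
To show how options like these fit together, here is a hypothetical argparse parser that mirrors Table 1. It is only a sketch: the defaults are taken from the table, while the types, the required flags, and the range check on the class ID are assumptions and may differ from the script in the NGC resource.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the options in Table 1."""
    parser = argparse.ArgumentParser(description="U-Net industrial options (sketch)")
    parser.add_argument("--exec_mode", default="train_and_evaluate")
    parser.add_argument("--iter_unit", default="batch")
    parser.add_argument("--num_iter", type=int, default=2500)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--results_dir", required=True)
    parser.add_argument("--data_dir", required=True)
    parser.add_argument("--dataset_name", default="DAGM2007", choices=["DAGM2007"])
    # Assumed validation: DAGM class IDs run from 1 to 10
    parser.add_argument("--dataset_classID", type=int, choices=range(1, 11), default=1)
    return parser


args = build_parser().parse_args(
    ["--results_dir=/results", "--data_dir=/data", "--dataset_classID=1"]
)
print(args.exec_mode, args.num_iter, args.batch_size, args.dataset_classID)
```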

unet-industrial-demo notebook

After you have successfully trained the model, the next logical step is to deploy the model in production for inference. The UNet_Industrial TensorFlow checkpoint is trained with AMP. This model can be used for many different applications:

  • Running inference/predictions using the model directly
  • Building more efficient inference engines
  • Resuming training from the downloaded checkpoint
  • Transfer learning
  • Training a student network

Download resources

Check your current working directory (the notebook equivalent of the pwd command) and adjust the paths in the commands that follow so that they point to the notebook folder of the resource that you just downloaded.

import os

print(os.getcwd())

Download some data for testing from the Weakly Supervised Learning for Industrial Optical Inspection (DAGM 2007) competition dataset. To download the 10 public datasets for preprocessing, run the following command:

! ./ ./data

The next command downloads the model checkpoints as a zip file and unzips it:

wget --content-disposition -O

Upon completion of the download, the following model directories contain the pretrained models corresponding to the 10 classes of the DAGM 2007 competition data set:

!ls nvidia_unetindtf_fp16_20190522
Image shows a list of classes 1-10.
Figure 4. List of DAGM 2007 competition data set classes.

Inference with native TensorFlow

In this section, you launch an interactive session to verify the correctness of the pretrained models, where you can load new test images.

Import some necessary libraries:

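import os

# Install Horovod only if it is not already available in the environment
try:
    import horovod.tensorflow
except ImportError:
    os.system("pip install horovod")

import horovod.tensorflow
import sys

from model.unet import UNet_v1
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg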

Select any image from the public image folders downloaded previously to test the model. You can choose any other image from these folders by changing the last two parts of the image address:

img = mpimg.imread('./data/raw_images/public/Class1_def/1.png')

plt.figure(figsize = (10,10));
plt.imshow(img, cmap='gray');
The defective image here is loaded to verify the model. The image shows the defect on the top left portion of the image.
Figure 5. One of the images in the sample data set with a defect.

As you can see in Figure 5, there is a defective area in the top left corner. In the next step, you normalize the image:

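# Image preprocessing
img = np.expand_dims(img, axis=2)  # add a channel axis: (H, W) -> (H, W, 1)
img = np.expand_dims(img, axis=0)  # add a batch axis: (H, W, 1) -> (1, H, W, 1)
img = (img - 0.5) / 0.5            # rescale pixel values from [0, 1] to [-1, 1]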

Testing the model

Restore the model using one of the checkpoints available in the /nvidia_unetindtf_fp16_20190522 folder (downloaded earlier). You can change the last three parts of the address to have access to other checkpoints.

import tensorflow as tf
import horovod.tensorflow as hvd

# Grow GPU memory on demand and let TensorFlow fall back to a supported device
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.allow_soft_placement = True

graph = tf.Graph()
with graph.as_default():
    with tf.Session(config=config) as sess:
        # Pass the constructor arguments used in the unet-industrial-demo notebook.
        network = UNet_v1(...)
        tf_input = tf.placeholder(tf.float32, [None, 512, 512, 1], name='input')
        outputs, logits = network.build_model(tf_input)
        saver = tf.train.Saver()

        # Restore variables from disk.
        saver.restore(sess, "nvidia_unetindtf_fp16_20190522/Class 1/checkpoints/model.ckpt-2500")

        # Run the network on the preprocessed image.
        output = sess.run([outputs, logits], feed_dict={tf_input: img})

In the final step, plot the output of the model:

# Print out model predicted mask
plt.figure(figsize = (10,10))
plt.imshow(np.squeeze(output[0]), cmap='gray')
Shows the final segmentation mask of the image with the defect shown in the top left white over a black background.
Figure 6. Resulting image generated from the trained model.

As expected, the model has captured the defective area in the image. Feel free to try out other defective images for Class 1 within ./data/raw_images/public/Class1_def/ or load the model and test data for other classes from 1 to 10.
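
If you need a yes-or-no decision per pixel rather than a grayscale mask, one option is to threshold the sigmoid output. The sketch below uses a small hand-written mask in place of the real 512x512 model output. The 0.5 threshold and the helper names are assumptions for illustration and are not part of the notebook.

```python
def binarize(mask, threshold=0.5):
    """Mark a pixel as defective when its predicted probability meets the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in mask]


def defect_fraction(binary_mask):
    """Share of pixels flagged as defective."""
    total = sum(len(row) for row in binary_mask)
    return sum(sum(row) for row in binary_mask) / total


# Toy stand-in for the model output: high probabilities in the top left corner
toy_mask = [
    [0.92, 0.81, 0.10, 0.03],
    [0.77, 0.64, 0.05, 0.02],
    [0.08, 0.04, 0.01, 0.01],
    [0.02, 0.01, 0.00, 0.00],
]

binary = binarize(toy_mask)
print(defect_fraction(binary))  # 4 of 16 pixels are flagged: 0.25
```

With the real output, the same idea could be applied to the array returned by np.squeeze(output[0]).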


In this post, I showed you how to use a sample image segmentation notebook to identify defective parts in a manufacturing assembly line using a Jupyter notebook from NGC.

Get started today by visiting the NGC Catalog, downloading the Jupyter notebooks for an image segmentation model and applying it to your own use cases.
