An artificial neural network is a biologically inspired computational model patterned after the network of neurons in the human brain. Artificial neural networks can also be thought of as learning algorithms that model the relationship between inputs and outputs. Applications include pattern recognition and forecasting in fields such as medicine, business, the pure sciences, data mining, telecommunications, and operations management.

An artificial neural network transforms input data by applying a nonlinear function to a weighted sum of the inputs. The transformation is known as a neural layer and the function is referred to as a neural unit. The intermediate outputs of one layer, called features, are used as the input to the next layer. Through repeated transformations, the neural network learns multiple layers of nonlinear features (such as edges and shapes), which it then combines in a final layer to create a prediction (of more complex objects). The neural network learns by varying its weights, or parameters, so as to minimize the difference between its predictions and the desired values. The phase in which the artificial neural network learns from the data is called training.
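The layer-by-layer transformation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the logistic sigmoid is just one possible nonlinear function, and the bias terms, the helper name `dense_layer`, and all weight values are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, one common choice of nonlinear function."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """Apply one neural layer: each unit returns sigmoid(weighted sum + bias)."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs

# Hypothetical network: 2 inputs, a hidden layer of 3 units, 1 output unit.
x = [0.5, -1.0]
hidden_w = [[0.1, 0.4], [-0.3, 0.2], [0.7, -0.5]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.6, -0.2, 0.3]]
output_b = [0.05]

features = dense_layer(x, hidden_w, hidden_b)           # intermediate outputs
prediction = dense_layer(features, output_w, output_b)  # final layer combines them
print(features, prediction)
```

The weights here are fixed by hand. Training would adjust them so that `prediction` moves closer to the desired value.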

Figure 1: Schematic representation of a neural network

Neural networks in which information is fed only forward from one layer to the next are called feedforward neural networks. Networks that have memory or feedback loops, on the other hand, are called recurrent neural networks.
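The difference between the two classes can be illustrated with a single scalar unit. The sketch below assumes a tanh nonlinearity and hypothetical weights `w_in` and `w_rec`. The only structural difference is that the recurrent step also receives its own previous output.

```python
import math

def feedforward_step(x, w_in):
    """Output depends only on the current input."""
    return math.tanh(w_in * x)

def recurrent_step(x, h_prev, w_in, w_rec):
    """Output also depends on the previous state (the feedback loop)."""
    return math.tanh(w_in * x + w_rec * h_prev)

def run_recurrent(sequence, w_in=0.8, w_rec=0.5):
    """Feed a sequence through the recurrent unit, carrying the state along."""
    h = 0.0
    states = []
    for x in sequence:
        h = recurrent_step(x, h, w_in, w_rec)
        states.append(h)
    return states

sequence = [1.0, 0.0, 0.0]
ff = [feedforward_step(x, 0.8) for x in sequence]
rec = run_recurrent(sequence)
print(ff)   # zero input always gives zero output
print(rec)  # the first input still influences later outputs
```

In the recurrent case the effect of the first input persists after the input has returned to zero, which is the "memory" referred to above.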

Once the artificial neural network has been trained, it can predict outputs when presented with new inputs, a process referred to as neural network inference. To perform inference, the trained neural network can be deployed on platforms ranging from the cloud to enterprise datacenters to resource-constrained edge devices. The deployment platform and type of application impose unique latency, throughput, and application size requirements on the runtime. For example, a neural network performing lane detection in a car needs low latency and a small runtime application. A datacenter application identifying objects in video streams, on the other hand, needs to process thousands of video streams simultaneously, which demands high throughput and efficiency.

A unit often refers to a nonlinear activation function (such as the logistic sigmoid function) in a neural network layer that transforms the input data. The units in the input, hidden, and output layers are referred to as input, hidden, and output units respectively. A unit typically has multiple incoming and outgoing connections. Units can also be more complex: long short-term memory (LSTM) units have multiple activation functions with a distinct layout of connections to the nonlinear activation functions, and maxout units compute the final output as the maximum over an array of transformed input values. Pooling, convolution, and other input-transforming functions are usually not referred to as units.
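As an illustration of a more complex unit, the following is a minimal sketch of a maxout unit. It assumes the usual formulation in which the unit takes the maximum over several weighted sums of the same input. The names `weighted_sum`, `maxout_unit`, and `pieces`, as well as the weights, are hypothetical.

```python
def weighted_sum(inputs, weights, bias):
    """One affine transformation of the input."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def maxout_unit(inputs, pieces):
    """pieces is a list of (weights, bias); the output is the largest weighted sum."""
    return max(weighted_sum(inputs, w, b) for w, b in pieces)

# With these two pieces the unit reproduces the absolute value of inputs[0].
pieces = [([1.0, 0.0], 0.0), ([-1.0, 0.0], 0.0)]
print(maxout_unit([-2.0, 5.0], pieces))  # 2.0
```

Unlike a sigmoid unit, the shape of the nonlinearity here is determined by the learned weights of the individual pieces.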

The terms neuron and artificial neuron are equivalent to unit, but imply a close connection to a biological neuron. However, deep learning does not have much to do with neurobiology and the human brain. On a micro level, the term neuron is used to explain deep learning as a mimicry of the human brain. On a macro level, artificial intelligence can be thought of as the simulation of human-level intelligence using machines. Biological neurons, however, are now believed to be more similar to entire multilayer perceptrons than to a single unit, or artificial neuron, in a neural network. Connectionist models of human perception and cognition use artificial neural networks. These connectionist models, which treat the brain as a neural net formed of neurons and their synapses, differ from the classical view (computationalism) that human cognition is more similar to symbolic computation in digital computers. Relational Networks and Neural Turing Machines have been offered as evidence that the connectionist and computationalist models of cognition need not be at odds and can coexist.

An activation function, or transfer function, applies a transformation to weighted input data (the result of a matrix multiplication between input data and weights). The function can be either linear or nonlinear. Units differ from transfer functions in their greater complexity: a unit can have multiple transfer functions (LSTM units) or a more complex structure (maxout units).
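A few common activation functions, applied to the same weighted input, might look as follows. This is a sketch with hypothetical input and weight values, and the helper `activate` is an assumption introduced for the example.

```python
import math

def linear(z):
    """Identity: a linear transfer function."""
    return z

def logistic_sigmoid(z):
    """Squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes any real value into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)

def activate(inputs, weights, fn):
    """Compute the weighted input, then apply the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return fn(z)

inputs, weights = [1.0, 2.0], [0.5, -1.0]  # weighted input is -1.5
for fn in (linear, logistic_sigmoid, tanh, relu):
    print(fn.__name__, activate(inputs, weights, fn))
```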

The features of 1000 layers of pure linear transformations can be reproduced by a single layer, because a chain of matrix multiplications can always be represented by a single matrix multiplication. A nonlinear transformation, however, can create new, increasingly complex relationships. Nonlinear functions are therefore very important in deep learning, where they create increasingly complex features with every layer. Examples of nonlinear activation functions include the logistic sigmoid, tanh, and ReLU functions.
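The collapse of stacked linear layers into one can be checked numerically. The sketch below uses two hypothetical 2x2 weight matrices, `W1` and `W2`, without bias terms. It shows that two linear layers give the same result as the single matrix `W2 * W1`, and that inserting a ReLU between them breaks that equivalence.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def relu_vec(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, x) for x in v]

W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 1.0]

two_linear_layers = matvec(W2, matvec(W1, x))
single_layer = matvec(matmul(W2, W1), x)         # same result from one matrix
with_relu = matvec(W2, relu_vec(matvec(W1, x)))  # nonlinearity changes the result

print(two_linear_layers, single_layer, with_relu)
```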

A layer is the highest-level building block in deep learning. The first, middle, and last layers of a neural network are called the input layer, hidden layer, and output layer respectively. The term hidden layer comes from its output not being visible as a network output. A simple three-layer neural net has one hidden layer, while the term deep neural net implies multiple hidden layers. Each neural layer contains neurons, or nodes, and the nodes of one layer are connected to those of the next. The connections between nodes are associated with weights that depend on the relationship between the nodes. The weights are adjusted so as to minimize the cost function by back-propagating the errors through the layers. The cost function is a measure of how close the output of the neural network is to the expected output. Error backpropagation to minimize the cost is carried out with optimization algorithms such as stochastic gradient descent, batch gradient descent, or mini-batch gradient descent. Stochastic gradient descent estimates the gradient of the cost from individual training samples rather than from the whole data set, giving a statistical approximation of the weight change that leads toward a minimum of the cost. The size of the step taken by the weights against the direction of the gradient is referred to as the learning rate. A low learning rate corresponds to slower but more reliable training, while a high rate corresponds to quicker but less reliable training that might not converge on an optimal solution.
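The role of the cost function, the gradient, and the learning rate can be illustrated on the smallest possible model, a single weight `w` fitted to hypothetical data generated from y = 3x. This is a sketch under those assumptions, not a full backpropagation implementation: with one weight the gradient can be written out directly. The `batch_size` parameter switches between stochastic (1), mini-batch, and batch (all samples) gradient descent.

```python
import random

def cost(w, data):
    """Mean squared error between predictions w*x and targets y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, batch):
    """Derivative of the mean squared error with respect to w over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, learning_rate, batch_size, epochs=50, seed=0):
    """Gradient descent; batch_size selects the stochastic, mini-batch, or batch variant."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            w -= learning_rate * gradient(w, batch)  # step against the gradient
    return w

data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]  # targets from y = 3x

w_sgd = train(data, 0.05, batch_size=1)             # stochastic
w_mini = train(data, 0.05, batch_size=2)            # mini-batch
w_batch = train(data, 0.05, batch_size=len(data))   # batch
w_diverged = train(data, 1.5, batch_size=len(data), epochs=10)  # rate too high

print(w_sgd, w_mini, w_batch, w_diverged)
```

With the small learning rate all three variants approach the true weight of 3. With the large rate each step overshoots the minimum by more than the previous error, so the cost grows instead of shrinking.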

A layer is a container that usually receives weighted input, transforms it with a set of mostly nonlinear functions, and then passes these values as output to the next layer in the neural net. A layer is usually uniform, that is, it contains only one type of activation function, pooling, convolution, and so on, so that it can be easily compared to other parts of the neural network.

State-of-the-art neural networks can have from millions to well over one billion parameters to adjust via back-propagation. They also require a large amount of training data to achieve high accuracy, meaning that hundreds of thousands to millions of input samples have to be run through both a forward and a backward pass. Because neural nets are built from large numbers of identical neurons, they are highly parallel by nature. This parallelism maps naturally to GPUs, which provide a significant computation speed-up over CPU-only training.

GPUs have become the platform of choice for training large, complex neural network-based systems because of their ability to accelerate that training. Because of the increasing importance of neural networks in both industry and academia and the key role of GPUs, NVIDIA provides a library of primitives called cuDNN that makes it easy to obtain state-of-the-art performance with deep neural networks.

The parallel nature of inference operations also lends itself well to execution on GPUs. To optimize, validate, and deploy networks for inference, NVIDIA provides an inference platform accelerator and runtime called TensorRT. TensorRT delivers low-latency, high-throughput inference and tunes the runtime application to run optimally across different families of GPUs.
