GTC Silicon Valley 2019, ID S9647: Untangling Information for Training Strategies in Machine and Deep Learning
Daniel Wilke (University of Pretoria)
Training machine and deep learning architectures remains challenging and resource-demanding. Much of the difficulty arises from stochastic mini-batch sampling, which yields noisy cost functions on which well-known deterministic optimization approaches fail. Strategies proposed to overcome these challenges range from fixed learning rates (or sub-gradient approaches), a priori selected learning-rate schedules, learning-rate cycling, and grid search to probabilistic line searches. Our talk untangles the information available for training, namely noisy function values and noisy gradients. We'll demonstrate the training benefits of exploiting noisy gradient information in more advanced training strategies for both TensorFlow and PyTorch on NVIDIA GeForce GTX 1080 Ti and RTX 2080 Ti GPUs. We'll also show a simulated dataset of the comminution process of an industrial ball mill, which we generated using a multi-GPU discrete element code.
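The contrast between a fixed learning rate and a strategy that exploits noisy gradient information can be illustrated with a toy experiment. The sketch below is our own illustration, not the method presented in the talk: `noisy_grad`, `train_fixed_lr`, and `train_sign_adaptive` are hypothetical names, and the sign-based step adaptation is a deliberately simplified stand-in for gradient-only line-search ideas, reacting only to sign changes in a noisy directional derivative.

```python
import numpy as np

def noisy_grad(x, rng, sigma=0.5):
    # Gradient of f(x) = 0.5 * x**2 with additive noise, mimicking the
    # stochastic gradients produced by mini-batch sampling.
    return x + rng.normal(0.0, sigma)

def train_fixed_lr(x0=5.0, lr=0.1, steps=300, seed=0):
    # Baseline: plain SGD with a constant, a priori chosen learning rate.
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        x -= lr * noisy_grad(x, rng)
    return x

def train_sign_adaptive(x0=5.0, lr=1.0, steps=300, seed=0):
    # Toy stand-in for gradient-only strategies: adapt the step size using
    # only the SIGN of a noisy directional derivative at the trial point,
    # information that is available even when function values are too noisy.
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        d = -noisy_grad(x, rng)   # descent direction from the current gradient
        x = x + lr * d            # take the trial step
        if noisy_grad(x, rng) * d > 0:
            lr *= 0.5             # sign change: likely overshot, shrink the step
        else:
            lr *= 1.1             # still descending: grow the step cautiously
    return x

if __name__ == "__main__":
    print("fixed lr:     ", train_fixed_lr())
    print("sign adaptive:", train_sign_adaptive())
```

Both runs drive the iterate toward the minimizer at zero, but the sign-adaptive rule needs no hand-tuned schedule: near the minimum the noisy directional derivative changes sign roughly half the time, so the step size shrinks automatically.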