Jetson TK1: Mobile Embedded Supercomputer Takes CUDA Everywhere

Today, cars are learning to see pedestrians and road hazards; robots are becoming higher functioning; complex medical diagnostic devices are becoming more portable; and unmanned aircraft are learning to navigate autonomously. As a result, the computational requirements for these devices are increasing exponentially, while their size, weight, and power limits continue to decrease. Aimed at … Continued

Using NVIDIA Nsight Systems in Containers and the Cloud

Gone are the days when it was expected that a programmer would “own” all the systems that they needed. Modern computational work frequently happens in shared systems, in the cloud, or otherwise on hardware not owned by the user or even their employer. This is good for developers. It can save time and money by … Continued

Validating Distributed Multi-Node Autonomous Vehicle AI Training with NVIDIA DGX Systems on OpenShift with DXC Robotic Drive

Deep neural network (DNN) development for self-driving cars is a demanding workload. In this post, we validate DGX multi-node, multi-GPU, distributed training running on Red Hat OpenShift in the DXC Robotic Drive environment. We used OpenShift 3.11, also a part of the Robotic Drive containerized compute platform, to orchestrate and execute the deep learning (DL) workloads. … Continued

CUDA Pro Tip: Minimize the Tail Effect

When I work on the optimization of CUDA kernels, I sometimes see a discrepancy between Achieved and Theoretical Occupancies. The Theoretical Occupancy is the ratio between the number of threads which may run on each multiprocessor (SM) and the maximum number of executable threads per SM (2048 on the Kepler architecture). This value is estimated … Continued

Improving INT8 Accuracy Using Quantization Aware Training and the NVIDIA TAO Toolkit

Deep neural network (DNN) models are routinely used in applications requiring analysis of video stream content. These may include object detection, classification, and segmentation. Typically, these models are trained on servers with high-end GPUs, either in stand-alone servers, such as NVIDIA DGX-1, or on servers available in data centers or private or public clouds. Such … Continued

NVIDIA Nsight Systems Adds Vulkan Support

Vulkan is a low-overhead, cross-platform 3D graphics and compute API targeting a wide variety of devices, from cloud gaming servers to PCs and embedded platforms. The Khronos Group manages and defines the Vulkan API. Introduction to NVIDIA Nsight Systems: NVIDIA Nsight™ Systems provides developers with a unified timeline view which displays how applications use computer resources. This low-overhead performance … Continued

Accelerating Recommender Systems Training with NVIDIA Merlin Open Beta

NVIDIA Merlin is an open beta application framework and ecosystem that enables the end-to-end development of recommender systems, from data preprocessing to model training and inference, all accelerated on NVIDIA GPUs. We announced Merlin in a previous post and have been continuously making updates to the open beta. In this post, we detail the new … Continued

Creating Visualizations of Large Molecular Systems using NVIDIA Omniverse

Wouldn’t it be amazing if you could create beautiful and immersive scientific visualizations of large and dynamic simulations like Folding@Home’s simulation of COVID-19 spikes? In this post, we share our recipe to show that you can use NVIDIA Omniverse to create powerful cinematic visualizations from scientific data. Figure 1. Bring in large dynamic simulation data … Continued

Create Realistic Synthetic Faces That Look Older With Deep Learning

Developers from Orange Labs in France developed a deep learning system that can quickly make young faces look older, and older faces look younger. A number of techniques already exist, but they are expensive and time consuming. Using CUDA, Tesla K40 GPUs and cuDNN for the deep learning work, they trained their neural network on … Continued

Creating Robust Neural Speech Synthesis with ForwardTacotron

Photo by Thomas Le: https://unsplash.com/@thomasble The artificial production of human speech, also known as speech synthesis, has always been a fascinating field for researchers, including our AI team at Axel Springer SE. For a long time, people have worked on creating text-to-speech (TTS) systems that reach human level. Following the field’s transition to deep learning … Continued

VRWorks 360 Video SDK 2.0 Adds Features, Turing Support

An ecosystem of camera systems and video processing applications surrounds us today for professional and consumer use, be it for film or home video. The ability to enhance and optimize this omnipresent stream of videos and photos has become an important focus in consumer and prosumer circles. The real-world use cases associated with 3DoF 360 video have … Continued